Machine learning-based clinical prediction model and multi-omics integration for assessing pancreatic cancer risk in new-onset diabetes
Journal of Translational Medicine, 2026
Yang J., Cao B., Yuemaierabola A., Gong Y., Guo Y., Zhong H., Zhang K., Wang S., Huang Q., Li J., Ye T., Luo J., Zhou Y., Chen R.
| Disease area | Application area | Sample type | Products |
|---|---|---|---|
| Oncology, Metabolic Diseases | Pathophysiology | Plasma | Olink Explore 3072/384 |
Abstract
Background
Pancreatic cancer (PC) is typically diagnosed at an advanced stage but is often preceded by new-onset diabetes mellitus (NODM), which provides a window for early detection. We sought to develop and validate an interpretable machine-learning model, integrated with multi-omics profiling, to identify early biomarkers of NODM-associated PC.
Methods
In a population-based cohort, individuals with NODM-associated PC and NODM without PC were identified and, after feature selection, randomly divided (70:30) into training and validation sets. Eight machine learning (ML) classifiers were compared using fivefold cross-validation, and model performance was evaluated in terms of discrimination, calibration, and decision curve–based clinical utility. Interpretability was evaluated using Shapley additive explanations (SHAP) analysis. To explore mechanisms, Olink proteomic and metabolomic profiles were analyzed under both clinical classifications and model-defined risk strata.
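As a rough illustration of two of the evaluation metrics named above, the following minimal sketch computes discrimination (AUROC) and decision-curve net benefit in plain Python. It is not the authors' pipeline: the function names `auroc` and `net_benefit`, the 0.5 risk threshold, and the toy labels and risks are all assumptions made for illustration, and none of the numbers are study data.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random case outscores a random
    control, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit at one risk threshold pt:
    TP/N - (FP/N) * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy labels (1 = PC, 0 = no PC) and predicted risks, invented for
# illustration only; they are not taken from the study.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
risks = [0.9, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1, 0.05]

toy_auc = auroc(labels, risks)
toy_nb = net_benefit(labels, risks, threshold=0.5)
print(f"AUROC = {toy_auc:.3f}, net benefit at 0.5 = {toy_nb:.3f}")
```

In practice, a decision curve repeats the net-benefit calculation over a range of thresholds and compares the model against the "treat all" and "treat none" strategies.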
Results
Categorical boosting achieved the best performance in the independent validation set (AUROC = 0.844). The NODM cohort was stratified into high-risk (n = 2,362) and low-risk (n = 5,030) groups, and internal validation together with SHAP analyses demonstrated consistent model performance and identified clinically interpretable predictors. Proteomic and metabolomic analyses under clinical and risk-based grouping identified 39 overlapping differentially expressed proteins and 145 overlapping metabolites, with enrichment across 11 shared KEGG pathways. Cross-platform validation highlighted PLTP, CRTAC1, and ITGAV as serum biomarkers with strong potential for early NODM-PC detection.
Conclusions
We developed an interpretable ML framework centered on NODM that enables practical risk stratification for early PC detection, informed by multi-omics, and provides a pathway of ML-based triage followed by biomarker confirmation for earlier detection and diagnosis.