Machine learning model based on plasma proteomics for the identification of Parkinson’s disease

Brain, 2026

Adewale B., Chia R., Moaddel R., Landeck N., Rasheed M., Alba C., Reho P., Vasta R., Calvo A., Moglia C., Canosa A., Manera U., Snyder A., Lee Y., Grassano M., Gao C., Zhu M., Brunetti M., Casale F., Arvind K., Soltis A., Viollet C., Sukumar G., Alba C., Lott N., Martinez E., Tuck M., Singh J., Bacikova D., Zhang X., Hupalo D., Adeleye A., Wilkerson M., Pollard H., Dalgard C., Dawson T., Rosenthal L., Hall A., Pantelyat A., Ding J., Gibbs J., Egan J., Candia J., Tanaka T., Ferrucci L., Chiò A., Narendra D., Kwan J., Ehrlich D., Dalgard C., Traynor B., Scholz S.

Disease area	Application area	Sample type	Products
Neurology	Pathophysiology, Patient Stratification	Plasma	Olink Explore 3072/384

Abstract

Developing reliable biomarkers capable of differentiating Parkinson’s disease from other neurological conditions is crucial for both patient care and research. In this study, we leveraged recent advances in high-throughput proteomic technology and machine learning to develop candidate biomarkers for Parkinson’s disease.

Using the Olink Explore 3072 assay, we obtained plasma proteomic profiles from 698 study participants, comprising Parkinson’s disease cases (n = 149), neurologically healthy controls (n = 230), and participants with other neurological conditions (n = 319). The study cohort was split into a Training Set (n = 560) and a Test Set (n = 138). We conducted differential protein abundance analysis and pathway enrichment analysis, and then applied the Boruta algorithm to identify which differentially abundant proteins are predictive of Parkinson’s disease. To create a diagnostic biomarker panel, we trained a stacking ensemble machine learning (ML) model on the Training Set (n = 118 Parkinson’s patients, n = 184 healthy controls, and n = 258 individuals with other neurological disorders), using eleven proteins (APOH, ARG1, CCN1, CXCL1, CXCL8, DDC, GRAP2, IL1RAP, OSM, PRL, and SPRY2) as model features. We used the Shapley Additive Explanations (SHAP) framework and network analysis to evaluate the predictive importance and biological relevance of each protein in the ML model.
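SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory. The sketch below computes exact Shapley values by enumerating every feature coalition, under the assumption that "absent" features take their value from a single reference (baseline) vector. It is an illustration of the underlying quantity, not the `shap` package's API (which approximates these values for larger models) and not the authors' analysis; `toy_model`, its weights, and the feature values are hypothetical and stand in for the published eleven-protein ensemble. Exhaustive enumeration needs on the order of 2^n model calls, so it is only practical for a handful of features.

```python
from __future__ import annotations

import math
from itertools import combinations
from typing import Callable, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value (a single-reference
    value function). Runtime grows as 2**n, so keep n small.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Features in the coalition come from x; all others from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition S of this size: |S|! (n-|S|-1)! / n!
            weight = (
                math.factorial(size)
                * math.factorial(n - size - 1)
                / math.factorial(n)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when it joins coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical risk score over three standardized protein levels;
    # NOT the published model. Logistic of a weighted sum plus one interaction.
    s = 0.8 * z[0] - 0.5 * z[1] + 0.3 * z[0] * z[2]
    return 1.0 / (1.0 + math.exp(-s))


x = [1.2, 0.4, 2.0]         # hypothetical participant
baseline = [0.0, 0.0, 0.0]  # hypothetical reference profile
phi = shapley_values(toy_model, x, baseline)
print([round(v, 4) for v in phi])
# Efficiency property: contributions sum to model(x) - model(baseline).
print(abs(sum(phi) - (toy_model(x) - toy_model(baseline))) < 1e-9)
```

The final check reflects the efficiency property: per-feature contributions always add up to the gap between the participant's prediction and the baseline prediction, which is what lets SHAP values be read as a breakdown of one model output.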

The model performed well in the held-out Test Set (n = 138) and in three external cohorts (the UK Biobank, n = 43,969; the Parkinson’s Disease Biomarkers Program, n = 138; and the Parkinson’s Progression Markers Initiative, n = 385), with areas under the receiver operating characteristic curve (AUCs) of 0.939, 0.789, 0.909, and 0.816, respectively. Network and pathway analyses aided interpretation of the model, implicating inflammatory mediators, ErbB signaling, T-cell receptor signaling, and lipid metabolism. Our findings highlight the potential of plasma protein biomarkers to improve Parkinson’s disease diagnosis and to deepen biological understanding of this complex neurological disorder.
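An AUC summarizes how well a model's score separates cases from non-cases across all possible decision thresholds: it equals the probability that a randomly chosen positive participant receives a higher score than a randomly chosen negative one, with ties counted as one half. As a minimal sketch (not the authors' evaluation code), the AUC can be computed from ranks through its equivalence with the Mann-Whitney U statistic. The labels and scores below are hypothetical toy values; in practice, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
from __future__ import annotations

from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties get averaged ranks).

    Equals the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative (label 0), counting ties as 1/2.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos + n_neg != len(labels) or n_pos == 0 or n_neg == 0:
        raise ValueError("labels must be 0/1 with both classes present")

    # Assign 1-based ranks by ascending score, averaging ranks within ties.
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U = (sum of positive ranks) - n_pos (n_pos + 1) / 2; AUC = U / (n_pos n_neg).
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data: 1 = Parkinson's disease, 0 = comparison participant.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
print(roc_auc(labels, scores))
```

Because it sorts once instead of comparing every case-control pair, this version runs in O(n log n) time and stays fast at cohort sizes such as the UK Biobank's n = 43,969.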

Our model demonstrates high specificity and reliability across multiple independent cohorts, indicating the significant potential of proteomics-based biomarkers and the clinical utility of ML-supported diagnosis in Parkinson’s disease care. The model also helps to elucidate potential novel risk factors and pathways associated with Parkinson’s disease.
