Background
A comprehensive characterization of blood proteome profiles in cancer patients could provide a better understanding of disease biology, enabling earlier diagnosis, risk stratification and better monitoring of the different cancer subtypes. Professor Mathias Uhlén’s team from the Royal Institute of Technology (KTH) in Stockholm used Olink® Explore 1536 to measure 1436 plasma proteins in over 1400 patients presenting with 12 different types of common cancers, using samples taken at the time of diagnosis, before the initiation of treatment. The team then applied machine learning to the differentially expressed proteins to derive models that discriminate among the different cancer types, achieving AUCs of 0.82 to 1 on held-out samples.
Outcome
Differentially expressed proteins were identified for each of the 12 cancer types (two forms of leukemia, lymphoma, myeloma, colorectal cancer, lung cancer, glioma, breast cancer, cervical cancer, endometrial cancer, ovarian cancer and prostate cancer). The most significant discriminatory proteins across the cancer types were PRDX5, CEACAM5, PRTG, GLO1, DNER, PLAT, GFAP, CXCL9, CD244, PAEP, TCL1A, and CNTN5. A separate diagnostic model was then derived for each cancer type using glmnet (regularized generalized linear models), with 70% of samples used as the training set and patients with all other cancer types serving as the controls for each specific cancer. The number of proteins contributing to the classification varied considerably between cancer types, from 473 for colorectal cancer to just 9 for myeloma.
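The data setup described above (a 70% training split, with every other cancer type acting as the control) can be sketched as follows. This is a minimal illustration in Python rather than the study's own pipeline: the helper names `stratified_split` and `one_vs_rest`, the per-class stratification, the random seed and the toy labels are all assumptions, and the model-fitting step itself (glmnet in the study) is left out.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices into train/test, shuffling within each class.

    Stratifying by class is an assumption made here so that rare cancer
    types appear in both sets; the post only states a 70/30 split.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for lab in sorted(by_class):
        idx = by_class[lab][:]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)

def one_vs_rest(labels, target):
    """Label the target cancer 1 and every other cancer type 0."""
    return [1 if lab == target else 0 for lab in labels]

# Toy cohort with made-up sizes (not the study's data).
labels = ["myeloma"] * 10 + ["glioma"] * 10 + ["lung"] * 20
train_idx, test_idx = stratified_split(labels)

# Binary training labels for the hypothetical "myeloma vs. all others" model.
y_train = one_vs_rest([labels[i] for i in train_idx], "myeloma")
```

The same split would be reused for each of the 12 one-vs-rest models, so that every model is later evaluated on samples none of them saw during training.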
The classification models were then evaluated on the remaining 30% of samples, which had been held out from model training. As shown in the figure below (taken from the original article), the models distinguished each cancer from all the other types with high accuracy, with AUCs ranging from 0.82 to 1 (0.95 or above for six of the twelve cancer types). Reducing the number of proteins used to derive the models, especially to fewer than the top 50, noticeably lowered the AUCs in most cases, demonstrating the value of including many proteins in the classification model to gain higher confidence.
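For readers less familiar with the metric, the AUC reported for each model can be read as the probability that a randomly chosen patient with the target cancer receives a higher model score than a randomly chosen patient with any other cancer. A minimal sketch of that calculation follows; the function name `roc_auc` and the toy scores are illustrative assumptions, not values from the study.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positives and negatives.

    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 on a tie and 0 otherwise; the AUC is the mean over all pairs.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up held-out labels (1 = target cancer) and model scores.
y_test = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(y_test, scores)  # 11 of 12 pairs ranked correctly
```

An AUC of 1 means every patient with the target cancer outranks every control, while 0.5 corresponds to chance-level ranking.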
Further data analysis identified a single panel of 83 proteins designed to cover all 12 cancer types, based on the highest contributing proteins from the individual models. This pan-cancer panel identified the correct cancer types with AUCs ranging from 0.93 to 1, a performance only marginally inferior to that of the initial models based on all ~1400 proteins. Preliminary analysis also indicated that the protein panel was able to discriminate all cancers from healthy controls and showed promising performance both in staging some of the cancer types and in detecting very early-stage cancer. Further investigations in larger cohorts will be needed to confirm these potentially important findings.
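One simple way to assemble a shared panel from per-cancer models is to take the top few contributors from each model and pool them. The sketch below illustrates that idea only: the post does not state the exact selection rule used in the study, so ranking by absolute coefficient, the cutoff `k`, the function names and the placeholder proteins (`PROT_A` and so on) are all assumptions.

```python
def top_k_proteins(coefs, k):
    """Return the k proteins with the largest absolute non-zero coefficients."""
    nonzero = {p: c for p, c in coefs.items() if c != 0}
    # Sort by descending magnitude, breaking ties alphabetically.
    ranked = sorted(nonzero, key=lambda p: (-abs(nonzero[p]), p))
    return ranked[:k]

def build_panel(models, k):
    """Pool the top-k proteins of every per-cancer model into one panel."""
    panel = set()
    for coefs in models.values():
        panel.update(top_k_proteins(coefs, k))
    return sorted(panel)

# Hypothetical coefficients with placeholder protein names.
models = {
    "glioma": {"PROT_A": 2.1, "PROT_B": 0.0, "PROT_C": -0.3, "PROT_D": 0.1},
    "myeloma": {"PROT_E": -1.8, "PROT_A": 0.2, "PROT_F": 0.0, "PROT_D": 0.4},
    "lung": {"PROT_G": 1.5, "PROT_C": 0.9, "PROT_A": 0.0, "PROT_B": 0.6},
}
panel = build_panel(models, k=2)
```

Because the same protein can rank highly in several models, the pooled panel is usually smaller than `k` times the number of cancer types.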