Olink®
Part of Thermo Fisher Scientific

Leveraging population-scale proteomic data with deep learning for head and neck cancer detection in saliva

npj Digital Medicine, 2026

Shakeel A., Merriel S., Smith J., McGough A., Suderman M., Abdallah Z., Yousefi P.

Disease area / Application area / Sample type / Products
Oncology
Wider Bioinformatics Studies
Technical Evaluation
Data Science
Plasma
Saliva
Olink Target 96
Olink Explore 3072/384

Abstract

Identifying robust biomarkers for early cancer detection remains challenging, particularly when working with limited or heterogeneous datasets. Here, we present a proof-of-concept deep learning framework for cancer classification using blood-based proteomic profiles. Our approach leverages sample type transfer and synthetic data augmentation to improve performance and generalization across sample types. Models were trained on plasma proteome data from 13,208 pan-cancer cases and 39,806 controls in the UK Biobank. To address class imbalance and enrich the feature space, a convolutional neural network (CNN-Synth) was trained to detect cancer cases using data augmented with synthetic pan-cancer samples generated via a variational autoencoder. Performance was evaluated in an independent saliva-based dataset from a head and neck cancer case-control study (n = 156). CNN-Synth (AUC = 0.88) surpassed models trained without synthetic data (AUC ≤ 0.77). SHapley Additive exPlanations (SHAP) identified well-known cancer markers as key features. These results highlight the use of sample type transfer and synthetic data augmentation, with further validation needed.
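
For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how an AUC value and the training-set class balance can be computed. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical, and only the case and control counts come from the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen control (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = cancer case, 0 = control.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)

# Class balance of the training data, using the counts from the abstract.
n_cases, n_controls = 13_208, 39_806
case_fraction = n_cases / (n_cases + n_controls)

print(f"toy AUC = {auc:.3f}, case fraction = {case_fraction:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The case fraction shows that cases make up roughly a quarter of the training samples, which is the imbalance the synthetic samples are meant to address.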
