Plasma proteomic and machine learning models for differentiating idiopathic pulmonary fibrosis and connective tissue disease–associated interstitial lung disease: findings from a prospective cohort
Respiratory Research, 2026
Wu C., Lai D., Chen Y., Yu Y., Yang L., Fu P.
| Disease area | Application area | Sample type | Products |
|---|---|---|---|
| Respiratory Diseases | Patient Stratification | Plasma | Olink Target 96 |
Abstract
Background
Idiopathic pulmonary fibrosis (IPF) is a progressive fibrotic interstitial lung disease (ILD) with limited treatment options and poor prognosis. Differentiating IPF from connective tissue disease–associated ILD (CTD-ILD) is clinically challenging due to overlapping features, and reliable circulating biomarkers are lacking. Recent studies suggest that multi-marker proteomic models combined with machine learning may enhance diagnostic precision and prognostic assessment in fibrotic ILDs.
Methods
We prospectively analyzed plasma samples from Taiwanese patients with fibrotic ILDs (IPF, n = 22; CTD-ILD, n = 66) using the Olink inflammation panel (92 proteins). Differentially expressed proteins were identified and subjected to integrative network analyses. Predictive classification models were developed using generalized linear modeling (GLM), decision tree, and random forest approaches. Prognostic relevance was evaluated with Kaplan–Meier and Cox regression analyses, and findings were validated in public transcriptomic datasets.
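The classification step described above can be illustrated with a minimal sketch. It assumes a binomial GLM with a logit link (logistic regression) fitted on three protein features, then scored by AUC, sensitivity, and specificity. The data are synthetic and every name here (`fit_logistic`, `predict_proba`, `auc`, `sens_spec`) is hypothetical. Coding IPF as the positive class, the 0.5 threshold, and the gradient-descent fit are illustrative assumptions, not the authors' pipeline, and the metrics are computed on the training data only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit a binomial GLM (logit link) by batch gradient descent.

    Returns the weights [intercept, w1, ..., wp].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, X):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def auc(scores, labels):
    """Rank-based AUC: chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic stand-in for three protein measurements (NOT study data).
# Assumption: label 1 = IPF (n = 22), label 0 = CTD-ILD (n = 66).
random.seed(0)
y = [1] * 22 + [0] * 66
X = [[random.gauss(1.0 if yi else 0.0, 1.0) for _ in range(3)] for yi in y]

w = fit_logistic(X, y)
scores = predict_proba(w, X)
train_auc = auc(scores, y)
sensitivity, specificity = sens_spec(scores, y, threshold=0.5)
print(f"AUC={train_auc:.3f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

With only 22 cases in the smaller group, in-sample metrics like these tend to be optimistic. A real analysis would normally use cross-validation or an independent cohort, and would choose the classification threshold deliberately, given the class imbalance.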
Results
Among 92 proteins profiled, 23 showed significant differences between IPF and CTD-ILD. Four candidates consistently emerged as key discriminatory markers: MMP-10, FGF-19, ADA, and TWEAK (TNFSF12). The GLM model incorporating FGF-19, ADA, and TWEAK achieved the highest diagnostic accuracy (AUC 0.870; sensitivity 0.97; specificity 0.82), outperforming decision tree and random forest models. Transcriptomic validation confirmed TWEAK downregulation in ILD lung tissues and in TGF-β1–stimulated fibroblasts, linking it to canonical profibrotic signaling. Survival analysis showed significantly worse outcomes in IPF versus CTD-ILD (log-rank p < 0.001), with MMP-10 associated with poor prognosis (HR 2.08, p = 0.007) and TWEAK with favorable prognosis (HR 0.04, p < 0.001).
Conclusions
This study identifies distinct plasma proteomic signatures that differentiate IPF from CTD-ILD and highlights TWEAK as both a diagnostic and prognostic biomarker. A multi-marker GLM model demonstrated excellent diagnostic performance, supporting the clinical utility of plasma proteomics combined with machine learning to improve disease classification and risk stratification in fibrotic ILDs.