ProteoNexus: an integrative database to characterize genetic architecture, estimate mediation effects, and construct and evaluate prediction models of the plasma proteome
Nucleic Acids Research, 2025
Shao K., Luo Z., Huang P., Yang S.
| Disease area | Application area | Sample type | Products |
|---|---|---|---|
| Wider Proteomics Studies | Data Science | Plasma | Olink Explore 3072/384 |
Abstract
Proteins are biological effectors that mediate the effects of exposures on diseases and serve as predictors for building high-performance disease prediction models. However, no integrative, sex-specific proteomic resource based on a biobank-scale dataset is currently available. Here, we introduce ProteoNexus, a database built on a standardized best-practice pipeline that integrates protein quantitative trait locus (pQTL) mapping, mediation analysis, and risk prediction. After stringent quality control, ProteoNexus comprises three categories of exposures (129 measurement-based variables, 54 environmental variables, and 1 251 123 single-nucleotide polymorphisms [SNPs]) together with 57 incident diseases among 33 325 European participants. ProteoNexus identifies 16 998 putative causal pQTLs in the combined-sex dataset, of which 5 979 are cis-pQTLs and 11 019 are trans-pQTLs, and 9 464 and 7 832 pQTLs in the female and male datasets, respectively. Using a two-step screening strategy, ProteoNexus identifies 308 325, 144 975, and 1 336 significant mediation pathways originating from measurement-based variables, environmental variables, and SNPs, respectively, and performs enrichment analysis on the proteins associated with these exposures. With 21 optimized parameters across four machine learning algorithms, ProteoNexus also provides an online analysis module that lets users analyze their own proteomic data. Users can search results by protein, reported disease, ICD-10 code, or exposure, and each query returns accompanying summary statistics. ProteoNexus is freely accessible at https://www.proteonexus.com/.
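The abstract does not spell out how the two-step screen works. The sketch below illustrates one common form of two-step mediation screening, a joint-significance test, and is not the ProteoNexus implementation. Step 1 tests the exposure → protein association (path a). Step 2 tests the protein → outcome association adjusted for the exposure (path b). Proteins that pass both steps are reported together with the product-of-coefficients indirect effect a·b. Several details here are assumptions for illustration: the function names `ols_coef_p` and `two_step_screen` are hypothetical, the Bonferroni threshold `alpha / m` is one possible choice, and the continuous outcome fitted by ordinary least squares with a normal-approximation p-value is a simplification. ProteoNexus models incident diseases, which would typically call for a logistic or survival model in step 2.

```python
import math

import numpy as np


def ols_coef_p(y, X, j):
    """Fit y ~ X by OLS (X must include an intercept column) and return
    (beta_j, two-sided p-value) using a normal approximation to the t statistic."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)              # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the coefficients
    z = beta[j] / math.sqrt(cov[j, j])
    p = math.erfc(abs(z) / math.sqrt(2.0))        # equals 2 * (1 - Phi(|z|))
    return float(beta[j]), p


def two_step_screen(exposure, proteins, outcome, alpha=0.05):
    """Hypothetical joint-significance screen over the columns of `proteins`.

    Returns a list of (protein_index, indirect_effect) for proteins that pass
    both steps at a Bonferroni threshold of alpha / (number of proteins)."""
    n, m = proteins.shape
    ones = np.ones(n)
    threshold = alpha / m                         # assumed multiple-testing threshold
    hits = []
    for j in range(m):
        # Step 1: exposure -> protein (path a).
        a, p_a = ols_coef_p(proteins[:, j], np.column_stack([ones, exposure]), 1)
        if p_a >= threshold:
            continue
        # Step 2: protein -> outcome, adjusted for the exposure (path b).
        X2 = np.column_stack([ones, exposure, proteins[:, j]])
        b, p_b = ols_coef_p(outcome, X2, 2)
        if p_b < threshold:
            hits.append((j, a * b))               # product-of-coefficients indirect effect
    return hits


if __name__ == "__main__":
    # Simulated data: only protein 0 lies on the exposure -> outcome path.
    rng = np.random.default_rng(0)
    n = 2000
    exposure = rng.normal(size=n)
    proteins = rng.normal(size=(n, 3))
    proteins[:, 0] += 0.8 * exposure
    outcome = 0.5 * proteins[:, 0] + 0.3 * exposure + rng.normal(size=n)
    print(two_step_screen(exposure, proteins, outcome))
```

In the simulation, the true indirect effect through protein 0 is 0.8 × 0.5 = 0.4, so its estimate should land close to that value. Screening on step 1 first means step 2 is fitted only for proteins already associated with the exposure. This ordering keeps the screen cheap when there are many exposure-protein pairs.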