Background
In the UK Biobank Pharma Proteomics Project (UKB-PPP), a consortium of 13 biopharmaceutical companies generated new proteomic data from UK Biobank (UKB) samples. Using the Olink® Explore platform, the consortium measured around 3,000 proteins covering all major biological pathways in more than 54,000 UKB participant samples.
A Nature article from the UKB-PPP consortium provides a first detailed summary of the data obtained from a GWAS-based proteogenomic analysis and protein quantitative trait loci (pQTL) mapping, identifying over 14,000 primarily novel genetic associations with protein expression levels. These findings were analyzed across a range of biological contexts and diseases, illustrating how the unprecedented breadth and depth of this dataset can help elucidate biological mechanisms, identify actionable new biomarkers and accelerate drug development.
Outcome
Overall, the analysis identified 14,287 pQTLs, 81% of which are genetic associations with protein levels that have not been previously reported. Among the ~3,000 proteins measured, ~83% had at least one pQTL and ~67% had cis associations, where the genetic variant was located in or close to the gene encoding the protein being measured. These cis-pQTLs can be used in Mendelian randomization analyses to assess causal associations with phenotypes, which can be invaluable in the identification of new drug targets. They also offer genetic corroboration that the correct protein is being measured.
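To make the Mendelian randomization use of cis-pQTLs more concrete, the following is a minimal sketch of a single-instrument Wald ratio, one common estimator for this kind of analysis. It is not necessarily the method used by the consortium. The function name `wald_ratio` and all effect sizes are hypothetical and chosen only for illustration.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument Wald ratio estimate of a causal effect.

    beta_exposure: effect of the variant on the protein level (the pQTL effect)
    beta_outcome:  effect of the same variant on the phenotype
    se_outcome:    standard error of beta_outcome

    Returns (estimate, standard_error). The standard error uses the common
    first-order approximation that ignores uncertainty in beta_exposure.
    """
    if beta_exposure == 0:
        raise ValueError("The instrument must have a non-zero effect on the exposure")
    estimate = beta_outcome / beta_exposure
    standard_error = abs(se_outcome / beta_exposure)
    return estimate, standard_error


# Hypothetical numbers, not taken from the study.
estimate, standard_error = wald_ratio(
    beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.03
)
print(f"causal estimate per unit protein level: {estimate:.2f} (SE {standard_error:.2f})")
```

The estimate is only interpretable as causal if the variant affects the phenotype solely through the protein. That assumption is generally considered more plausible for cis-pQTLs than for trans-pQTLs.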
Analysis of trans-pQTLs, where the genetic variant is located at a significant distance from the protein-encoding gene, highlighted their value in identifying biological pathways and protein-protein interactions, with interacting partners found for >860 trans loci. The utility of these data for drug target discovery was demonstrated in several deep-dive analyses, including the effects of ABO blood group on gastrointestinal (GI) protein expression that may be perturbed in GI diseases, and the disentanglement of shared and distinct protein pathways associated with COVID-19 susceptibility loci.
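The cis/trans distinction described above can be sketched as a simple distance rule. The text does not state the distance threshold used, so the 1 Mb window below is an assumption. The `Gene` class, the `classify_pqtl` function and the coordinates are likewise hypothetical.

```python
from dataclasses import dataclass

# Assumed cis window in base pairs; the threshold is not given in the text above.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int
    end: int


def classify_pqtl(variant_chrom, variant_pos, gene, window=CIS_WINDOW_BP):
    """Label a variant 'cis' if it lies within `window` bp of the gene encoding
    the measured protein, and 'trans' otherwise (including other chromosomes)."""
    if variant_chrom != gene.chrom:
        return "trans"
    if gene.start - window <= variant_pos <= gene.end + window:
        return "cis"
    return "trans"


# Hypothetical gene coordinates, for illustration only.
example_gene = Gene(chrom="9", start=2_000_000, end=2_050_000)

print(classify_pqtl("9", 1_500_000, example_gene))  # within the window of the gene
print(classify_pqtl("9", 500_000, example_gene))    # same chromosome, outside the window
print(classify_pqtl("1", 2_010_000, example_gene))  # different chromosome
```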
More broadly, this study showed how larger sample sizes increase the power and availability of genetic instruments for Mendelian randomization in causal inference, where such instruments can mimic drug target effects observed in clinical trials. This open-access, population-scale proteomics resource for the wider scientific community is likely to generate a plethora of important new discoveries in the months and years to come.