The proteome as a net with which to catch environmental influences on phenotype
The gap between genotype and phenotype
After a brief introduction to Olink Proteomics by David Harley, Dr. Stefansson began his presentation with a photograph of the Stjórn, an Icelandic manuscript in Old Norse dating back to the 14th century. The image was from the book of Genesis and depicted God as King on a throne, with angels on one side and demons and fallen angels on the other. (This image was used repeatedly throughout the presentation.)
He began with the statement, “The key to the understanding of human disease lies in the study of the biologic foundation of human diversity”.
On the left side of his slide, a list of DNA bases stood for the ‘fixed’ genotype we all have at birth. On the right side, an image of Da Vinci’s Vitruvian Man served as shorthand for our existence throughout life: exposed to environmental influences (diet, education, exercise, smoking) and shaped by our own health and disease status. Above both panels were question marks and arrows pointing to each panel.
Dr. Stefansson pointed out that we are accustomed to the idea of human variation at birth, and that the older we get, the more influence our environment has on each of us. He illustrated this point with a photograph of a newborn grandchild of his, only a few weeks old, contrasted with a photograph of himself (in his early 70s).
He then ran through all the intermediate measurements from genotypes and genomes to clinical phenotypes: tying genotypes plus environmental factors to RNA levels and expression Quantitative Trait Loci (eQTLs), and now genotypes plus environmental factors to circulating protein levels. The speaker said most proteins, whether in the nuclei or in the cytoplasm, end up in the blood, and the level of each of them exists in some kind of dynamic equilibrium. Thus, proteomics helps to bridge this gap between genome and phenotype: there is a temporal quality to proteins as they rise and fall as a function of an event in the body. By adding variants and RNA measurements on top of protein measurements, the causes versus the consequences of disease can be separated.
A history of genomics goes through linkage analyses
The presenter claimed that the proteome can be examined in a hypothesis-free manner, and used the history of genetic analysis as an example. Before single nucleotide polymorphism (SNP) microarrays were available, geneticists relied on linkage analysis, the only method available at the time. “It was a weak method, with candidate genes, that could lead us astray,” Dr. Stefansson declared. Once SNP microarrays became available genome-wide, the ensuing genome-wide association studies (GWAS) led to an avalanche of discovery.
deCODE Genetics’ list of complex traits that have been associated with common variants includes major diseases like Type 2 Diabetes, Atrial Fibrillation, and Lung Cancer; conditions such as Osteoarthritis, Restless Leg Syndrome, and Asthma; and characteristic behaviors such as smell, coffee consumption and even love of crossword puzzles.
The importance of going hypothesis-free
In the middle of his presentation, Dr. Stefansson described his method of interrogating the proteome, using Olink Explore 3072 with a laboratory throughput of 1,408 samples per week. Returning to his findings, he presented data from Olink Explore 1536 (measuring 1,472 proteins) in which plasma protein levels were correlated with the age and Body Mass Index (BMI) of individuals: 64% of proteins increase in level with age, and 70% increase in level with increasing BMI. Only 5% do not change with either increasing age or BMI.
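The age and BMI tallies above amount to counting the direction of each protein's association with a covariate. The following is a minimal sketch of that bookkeeping on synthetic data. The talk did not describe the statistical model actually used, so the function name `direction_fractions`, the use of Pearson correlation, and the `min_abs_r` threshold are all illustrative assumptions.

```python
import numpy as np

def direction_fractions(levels, covariate, min_abs_r=0.0):
    """Fraction of proteins whose level rises or falls with a covariate.

    levels: (n_samples, n_proteins) array of protein measurements.
    covariate: (n_samples,) array, e.g. age or BMI.
    Pearson correlation is an assumption; the talk did not name the model.
    """
    levels = np.asarray(levels, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    x = covariate - covariate.mean()
    y = levels - levels.mean(axis=0)
    # Pearson r of the covariate against every protein column at once.
    r = (x @ y) / np.sqrt((x ** 2).sum() * (y ** 2).sum(axis=0))
    up = float(np.mean(r > min_abs_r))
    down = float(np.mean(r < -min_abs_r))
    return {"up": up, "down": down, "flat": 1.0 - up - down}

# Synthetic data only: three made-up proteins measured in 500 people.
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=500)
noise = rng.normal(size=(500, 3))
levels = np.column_stack([
    0.05 * age + noise[:, 0],   # constructed to rise with age
    -0.05 * age + noise[:, 1],  # constructed to fall with age
    noise[:, 2],                # constructed to be unrelated to age
])
fractions = direction_fractions(levels, age, min_abs_r=0.3)
```

A real analysis would also adjust for sex and other covariates and test significance per protein; this sketch only shows how the "percentage of proteins that increase" summary is formed.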
Next he discussed a few examples where sequence variants were associated with both disease and levels of circulating proteins. Such protein-level associations, he said, help us “understand the biological effect of the variant”.
His first example was a variant at the CHRDL2 locus that has been known for some time to associate with colorectal cancer (odds ratio of 1.32, p-value of 1e-20), although this locus shows no correlation with an RNA transcript level as an eQTL. However, the colorectal cancer risk allele correlates with the level of the protein CHRDL2; this protein is a BMP antagonist and thus interferes with BMP's interaction with its receptors. It has been known that loss-of-function mutations in one of the BMP receptors lead to juvenile polyposis coli, a precancerous condition in children.
Next he gave a non-associating example, where a protein called beta-defensin 4A (DEFB4A) had higher levels in cases of psoriasis, although variants in the DEFB4A gene did not associate with psoriasis risk; thus the elevated levels of this protein were a consequence of psoriasis rather than the cause of it.
Other examples where sequence variation and measurement of circulating protein levels led to a deeper understanding of the cause of disease included a complex association between psoriasis and Crohn’s Disease, osteoarthritis and a gene called CRTAC1, and autoimmune thyroid disease involving a risk gene called FLT3, better known as a driver gene for Acute Myeloid Leukemia (AML).
A novel mortality prediction model
In the last section of his presentation, Dr. Stefansson presented data from a cohort of 23,154 individuals followed up for 13 years, with 7,380 deaths in that timeframe. From the protein-level information his team derived a kind of “death calculator”: the top 5% of individuals in the cohort had an 88% risk of death in the following 10 years, while the lowest 5% had essentially zero risk of death. This calculation was based upon measurement of 100 proteins, plus age and sex. He compared three mortality prediction models (plotted with survival probability on the Y axis and time in years on the X axis): one with only age and sex; one with age, sex and GDF15 (GDF15 was the most significantly up-regulated protein in the collection of proteins associated with risk of death); and a third with age, sex and 100 protein measurements.
Using age and sex alone, the top 5% had a 10-year risk of death of 56%; using age, sex and GDF15, the top 5% had a 10-year risk of death of 74%; lastly, with age, sex and 100 protein measurements, the 10-year risk rises to the aforementioned 88%.
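The “top 5%” figures compare models by asking how many people in each model's highest-scoring slice actually died. Below is a minimal sketch of that comparison. The function name `top_fraction_event_rate` and the toy data are hypothetical, and the sketch ignores censoring, which a real survival analysis of a 13-year follow-up would have to handle.

```python
def top_fraction_event_rate(scores, died, fraction=0.05):
    """Observed death rate among the highest-scoring `fraction` of people.

    scores: predicted risk per person (any score where higher means riskier).
    died: 1 if the person died within the follow-up window, else 0.
    Illustrative only: no censoring, no confidence intervals.
    """
    if len(scores) != len(died) or len(scores) == 0:
        raise ValueError("scores and died must be non-empty and equal length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_top = max(1, int(len(scores) * fraction))
    # Rank people from highest to lowest predicted risk and keep the top slice.
    ranked = sorted(zip(scores, died), key=lambda pair: pair[0], reverse=True)
    top = ranked[:n_top]
    return sum(outcome for _, outcome in top) / n_top

# Toy data: 20 people, outcomes fixed, scored by two hypothetical models.
died = [0] * 15 + [1, 1, 1, 0, 1]
informative_scores = list(range(20))          # ranks most deaths near the top
uninformative_scores = list(range(20, 0, -1)) # ranks them near the bottom

rate_informative = top_fraction_event_rate(informative_scores, died, 0.25)
rate_uninformative = top_fraction_event_rate(uninformative_scores, died, 0.25)
```

A model that concentrates more of the observed deaths in its top slice, as the 100-protein model did relative to age and sex alone, is separating high-risk individuals more sharply.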
Spectacular resource
Dr. Stefansson concluded his presentation with a fascinating topic: first walking through the calculation of a polygenic risk score (PRS) for cardiovascular disease, and then a proteomics risk score for the same disease (and the same samples). For the genetic risk score, the numbers were huge: 224,000 cases and 1.1 million controls, across 5 population cohorts including UK Biobank and FinnGen. For the proteomics risk score, the data came from 9,500 Icelanders with proteomics measurements and 1,000 coronary-related events (myocardial infarction, stroke, coronary heart disease death).
Per this analysis, the polygenic risk score and the protein score were “essentially uncorrelated”. These two scores contain “complementary information for predicting risk of cardiovascular events”. (Specifically, the correlation was 0.028 with a p-value of 0.015.)
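To see why a correlation of 0.028 counts as “essentially uncorrelated”, square it: the two scores share less than 0.1% of their variance. The sketch below computes a Pearson correlation and that shared-variance figure. The function name `pearson_r` and the toy vectors are illustrative, not taken from the talk.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length score vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two vectors of equal length, at least 2 items")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Reported correlation between the polygenic and protein scores.
reported_r = 0.028
shared_variance = reported_r ** 2  # about 0.00078, i.e. under 0.1%

# Toy vectors (made up) showing the two extremes.
r_identical = pearson_r([1, 2, 3], [2, 4, 6])           # perfectly correlated
r_unrelated = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])   # no linear relation
```

Because the scores carry almost entirely separate information, combining them can improve prediction beyond what either achieves alone, which is the sense in which the speaker called them complementary.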
At the end of his presentation, he pointed to the UK Biobank’s Pharma Proteomics Project, a consortium led by Dr. Christopher Whelan from the pharmaceutical company Biogen. They measured 54,306 individuals with the Olink Explore 3072, with the first tranche of data delivered from the Olink Explore 1536 in January of 2022, and the dataset to be released in October of 2022. He called the UK Biobank a “spectacular resource”. He finished by saying “industry has contributed to whole exome sequencing of the UKB, whole genome sequencing of the same and is now contributing to proteomics”. “This allows for association between protein levels and genotypes or phenotypes.”