Quantitative Proteomics: Aptamer-Based Protein Quantification

5 minute read

Published:

Quantitative proteomics is a cutting-edge approach for measuring protein levels in complex biological samples. One innovative method in this field is aptamer-based protein quantification. Aptamers, which are short, single-stranded DNA or RNA molecules, are engineered to specifically bind to target proteins with high precision.

The SomaScan® assay is a widely used aptamer-based method for measuring protein abundances. However, there is limited information on how SomaScan correlates with mass spectrometry (MS)-based proteomics and Olink assays, another high-throughput antibody-based platform. Some studies have noted measurement variations between these platforms. Aptamers or SOMAmers in the SomaScan assay are typically selected to bind target proteins in their native conformation or with known post-translational modifications (PTMs). However, novel PTMs induced by pathogens or diseases can alter protein structure, electrophilicity, and interactions. A key drawback of the SomaScan assay is that its quantification relies on DNA microarray chips, which can introduce background noise. On the other hand, the advantages include lower cost and efficient data analysis.

Table 1. Overview of common proteomic platforms (Jiang W et al, Cancers, 2022).

Analytical TechniqueCategoryProtein Sample ValuesAccepted Biospecimen Types
Proximity Extension Assay (Olink)Antibody1 µLPlasma, tissue/cell, synovial fluid, CSF, plaque extract, and saliva
Reverse Phase Protein ArraysAntibody5 µg (1.0- 1.5 mg/mL protein)Tissue/cell, plasma, serum, biopsies, body fluids
Bio-PlexAntibody (bead)12.5 µL (serum/plasma)50 µL (cell culture) Plasma, serum, tissue/cell
SimoaAntibody (bead)25 µLPlasma, serum, urine, tissue/cell, CSF, saliva
Aptamer Group (Optmer)Aptamer38 µLPlasma (diagnostics and therapeutics), urine, tissue/cell, liquid matrices
Base Pair TechnologiesAptamer5–100 µLPlasma, serum, tissue/cell
SOMAscanAptamer55–100 µLPlasma, serum, CSF, urine, cell/tissue, synovial fluid, exosomes
Electrochemiluminescence ImmunoassayECLIA50 µLPlasma, serum, tissue/cell, CSF, urine, blood spots, tears, synovial fluid, tissue extracts
Multiplex ELISAELISA25–50 µLPlasma, serum, tissue/cell, urine, saliva, CSF
Singleplex ELISAELISA100 µLPlasma, serum, tissue/cell, urine, saliva, CSF
2D-PAGEGel electrophoresis~100 µg (15–50 µL)Plasma, serum, tissue/cell, urine
DDA-MSMS10 µLPlasma, serum, tissue/cell
SWATH-MSMS (DIA)5–10 µgPlasma, serum tissue/cell, platelets, monocytes/neutrophils
iTRAQMS (labeling in LC–MS–MS)12 µgPlasma, serum, tissue/cells, saliva
SRM/MRMMS (LC–MS–MS)15 µLPlasma, tissue/cell, dried blood spots

The SomaScan Assay v4.1 enables the simultaneous measurement of approximately 6,600 unique human proteins in a single sample (see Table 2). This is achieved using around 7,300 aptamers known as SOMAmers (Slow Off-rate Modified Aptamers). SOMAmers are short, chemically modified single-stranded DNA molecules designed to specifically bind to protein targets. The assay quantifies native proteins in complex samples by converting each protein’s concentration into the corresponding SOMAmer reagent concentration, which is then measured using DNA microarrays. SOMAmers are selected to bind proteins in their native, folded states, typically requiring an intact tertiary protein structure for effective binding.

Table 2. The 7k SomaScan Assay v4.1 panel (7,596 aptamers mapping to 6,414 unique human protein targets).

OrganismSOMAmersUniProt IDs (all)UniProt IDs (Unique)Protein TargetsGene IDsGene Symbols
Human733573016414661064086398
Mouse2362364433
African clawed frog331211
Gila monster331100
Hornet331101
Jellyfish331101
Thermus thermophilus331101
Common eastern firefly221100
Bacillus stearothermophilus111101
Ensifer meliloti111101
European elder221100
HIV-1111101
HIV-2111111
Red alga111111
strain K12111111
Total759675626431662864156411

SomaDataIO

SomaDataIO v5.3.1 is an R package for working with the SomaLogic ADAT file format. ADAT is a tab-delimited text file format that contains various data elements, including SOMAmer reagent intensities, sample data, sequence information, and experimental metadata. For each SOMAmer reagent sequence, the ADAT file usually includes the corresponding protein name, UniProt ID, Entrez Gene ID, and Entrez Gene symbol.

library(SomaDataIO)
library(purrr)
library(tidyr)
library(dplyr)
library(ggplot2)  

The read_adat() function imports data from ADAT files.

base.dir = "/Users/adinasa/Documents/"
adat_file <- "example.adat"
my_adat <- read_adat(paste0(base.dir,adat_file))  

Update the ADAT file with sample group information (adding sample group details from external file). Save the updated ADAT file (optional).

meta_file = paste(base.dir, "MAPPING.csv", sep="/")  
meta <- read.csv(meta_file, header = T, stringsAsFactors = FALSE)  
meta$SampleId <- as.character(meta$SampleId)  
my_adat <- dplyr::left_join(my_adat,meta, by="SampleId", keep=FALSE)
write_adat(my_adat, file = paste(base.dir, "example_updated.adat", sep="/"))  

Utility functions.

regex for analytes

is_seq <- function(.x) grepl("^seq\\.[0-9]{4}", .x)  

center/scale vector (z-scores).

cs <- function(.x) {      out <- .x - mean(.x)  
  out / sd(out)       
}

Data were log2-transformed within each sample. Control and Disease are two groups (this may vary in your data).

cleanData <- my_adat %>% 
  filter(SampleType == "Sample") %>% drop_na(Group) %>% 
  log2() %>% 
  mutate(SampleGroup = as.numeric(factor(Group, levels=c("Control", " Disease"))) - 1) %>% 
  modify_if(is_seq(names(.)), cs)

Human proteins with Uniprot ID and QC=PASS selected.

t_tests <- getAnalyteInfo(cleanData2) %>% 
  filter(ColCheck == "PASS") %>% 
  filter(Organism == "Human") %>%
  filter(UniProt != "") %>%
  select(AptName, SeqId, Target = TargetFullName,Organism, EntrezGeneID, EntrezGeneSymbol, UniProt, ColCheck)

Performed a Student’s t-test.

t_tests <- t_tests %>% 
  mutate(
    formula = map(AptName, ~ as.formula(paste(.x, "~ SampleGroup"))), 
    t_test  = map(formula, ~ stats::t.test(.x, data = cleanData,var.equal = TRUE)),  
    t_stat  = map_dbl(t_test, "statistic"),            
    p.value = map_dbl(t_test, "p.value"),              
    fdr     = p.adjust(p.value, method = "BH")         
  ) %>% arrange(p.value)

The results were used to identify proteins significantly associated with disease using a Benjamini-Hochberg false discovery rate (FDR) threshold of 1%.

t_tests [t_tests$fdr <= 0.1,]

Further reading …
Tandem Mass Tag (TMT)-based quantitation of proteins
Label-Free Quantitation (LFQ) of proteins