Quantitative proteomics: aptamer-based quantitation of proteins

The aptamer-based SomaScan® assay is a popular method for measuring the abundance of protein targets. Little information is available on how well it correlates with mass spectrometry (MS)-based proteomics or with Olink, another popular high-throughput platform that is antibody-based, and some studies have reported measurement variation between these platforms. Aptamers (SOMAmers) are generally selected against target proteins in their native conformation and, in some cases, against a functional protein with “known” post-translational modifications (PTMs). This is one main disadvantage: novel PTMs (pathogen- or disease-induced) are well known to alter protein structure, electrophilicity and protein-protein interactions, and so may affect aptamer binding. The other main disadvantage is that quantification relies on DNA microarray chips, which are prone to background noise. The main advantages are lower cost and more straightforward data analysis.

Table 1. Overview of common proteomic platforms (Jiang W et al, Cancers, 2022).

| Analytical Technique | Category | Protein Sample Values | Accepted Biospecimen Types |
|---|---|---|---|
| Proximity Extension Assay (Olink) | Antibody | 1 µL | Plasma, tissue/cell, synovial fluid, CSF, plaque extract, and saliva |
| Reverse Phase Protein Arrays | Antibody | 5 µg (1.0–1.5 mg/mL protein) | Tissue/cell, plasma, serum, biopsies, body fluids |
| Bio-Plex | Antibody (bead) | 12.5 µL (serum/plasma); 50 µL (cell culture) | Plasma, serum, tissue/cell |
| Simoa | Antibody (bead) | 25 µL | Plasma, serum, urine, tissue/cell, CSF, saliva |
| Aptamer Group (Optimer) | Aptamer | 38 µL | Plasma (diagnostics and therapeutics), urine, tissue/cell, liquid matrices |
| Base Pair Technologies | Aptamer | 5–100 µL | Plasma, serum, tissue/cell |
| SomaScan | Aptamer | 55–100 µL | Plasma, serum, CSF, urine, cell/tissue, synovial fluid, exosomes |
| Electrochemiluminescence Immunoassay | ECLIA | 50 µL | Plasma, serum, tissue/cell, CSF, urine, blood spots, tears, synovial fluid, tissue extracts |
| Multiplex ELISA | ELISA | 25–50 µL | Plasma, serum, tissue/cell, urine, saliva, CSF |
| Singleplex ELISA | ELISA | 100 µL | Plasma, serum, tissue/cell, urine, saliva, CSF |
| 2D-PAGE | Gel electrophoresis | ~100 µg (15–50 µL) | Plasma, serum, tissue/cell, urine |
| DDA-MS | MS | 10 µL | Plasma, serum, tissue/cell |
| SWATH-MS | MS (DIA) | 5–10 µg | Plasma, serum, tissue/cell, platelets, monocytes/neutrophils |
| iTRAQ | MS (labeling in LC–MS–MS) | 12 µg | Plasma, serum, tissue/cells, saliva |
| SRM/MRM | MS (LC–MS–MS) | 15 µL | Plasma, tissue/cell, dried blood spots |

The SomaScan Assay v4.1 simultaneously measures ~6,600 unique human proteins in a single sample (see Table 2). These protein targets are measured with ~7,300 aptamers called SOMAmers (Slow Off-rate Modified Aptamers). SOMAmers are short single-stranded DNA molecules that are chemically modified to bind specifically to their protein targets. The SomaScan assay measures native proteins in complex matrices by transforming each individual protein concentration into a corresponding SOMAmer reagent concentration, which is then quantified using DNA microarrays. Because SOMAmer reagents are selected against proteins in their native folded conformations, they generally require an intact tertiary protein structure for binding.

Table 2. The 7k SomaScan Assay v4.1 panel (7,596 SOMAmers in total; the 7,335 human SOMAmers map to 6,414 unique UniProt IDs).

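| Organism | SOMAmers | UniProt IDs (all) | UniProt IDs (Unique) | Protein Targets | Gene IDs | Gene Symbols |
|---|---|---|---|---|---|---|
| Human | 7335 | 7301 | 6414 | 6610 | 6408 | 6398 |
| Mouse | 236 | 236 | 4 | 4 | 3 | 3 |
| African clawed frog | 3 | 3 | 1 | 2 | 1 | 1 |
| Gila monster | 3 | 3 | 1 | 1 | 0 | 0 |
| Hornet | 3 | 3 | 1 | 1 | 0 | 1 |
| Jellyfish | 3 | 3 | 1 | 1 | 0 | 1 |
| Thermus thermophilus | 3 | 3 | 1 | 1 | 0 | 1 |
| Common eastern firefly | 2 | 2 | 1 | 1 | 0 | 0 |
| Bacillus stearothermophilus | 1 | 1 | 1 | 1 | 0 | 1 |
| Ensifer meliloti | 1 | 1 | 1 | 1 | 0 | 1 |
| European elder | 2 | 2 | 1 | 1 | 0 | 0 |
| HIV-1 | 1 | 1 | 1 | 1 | 0 | 1 |
| HIV-2 | 1 | 1 | 1 | 1 | 1 | 1 |
| Red alga | 1 | 1 | 1 | 1 | 1 | 1 |
| strain K12 | 1 | 1 | 1 | 1 | 1 | 1 |
| Total | 7596 | 7562 | 6431 | 6628 | 6415 | 6411 |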

ADAT file

ADAT is a tab-delimited text file format. Its contents include SOMAmer reagent intensities, sample data, sequence data and experimental metadata. For each SOMAmer reagent sequence, the ADAT file typically contains the corresponding protein name, UniProt ID, Entrez Gene ID and Entrez Gene symbol.

SomaDataIO

SomaDataIO v5.3.1 is an R package for working with the SomaLogic ADAT file format.

library(SomaDataIO)
library(purrr)
library(tidyr)
library(dplyr)
library(ggplot2)  

The read_adat() function imports data from ADAT files.

base.dir  <- "/Users/adinasa/Documents"   # folder that holds the ADAT file
adat_file <- "example.adat"
my_adat   <- read_adat(file.path(base.dir, adat_file))

Add sample group information to the ADAT data by joining an external mapping file on SampleId. Optionally, save the updated ADAT file.

meta_file <- file.path(base.dir, "MAPPING.csv")
meta <- read.csv(meta_file, header = TRUE, stringsAsFactors = FALSE)
meta$SampleId <- as.character(meta$SampleId)   # key must have the same type as in the ADAT
my_adat <- dplyr::left_join(my_adat, meta, by = "SampleId", keep = FALSE)
write_adat(my_adat, file = file.path(base.dir, "example_updated.adat"))   # optional
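
Before joining, it can help to confirm that the keys line up. The following is a minimal base-R sketch with made-up IDs and group labels (nothing here comes from a real ADAT file). It assumes SampleId is stored as character in the ADAT, as the as.character() call above suggests.

```r
# Toy stand-ins for the ADAT sample IDs and the mapping file (hypothetical values)
adat_ids <- c("1", "2", "3")
meta <- data.frame(SampleId = c(1L, 2L),
                   Group = c("Control", "Disease"),
                   stringsAsFactors = FALSE)

# Align key types before joining
meta$SampleId <- as.character(meta$SampleId)

# Samples with no row in the mapping file; a left join leaves their Group as NA
unmatched <- setdiff(adat_ids, meta$SampleId)

# Base-R equivalent of a left join on SampleId
group <- meta$Group[match(adat_ids, meta$SampleId)]

print(unmatched)
print(group)
```

Samples listed in unmatched are the ones that drop_na(Group) would later discard.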

Utility functions.

Regular expression that identifies analyte columns (names that start with "seq." followed by digits).

is_seq <- function(.x) grepl("^seq\\.[0-9]{4}", .x)  
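
As a quick illustration of what this pattern selects, here is a small sketch applied to a vector of column names. The names are hypothetical and only mimic the "seq." prefix that the regex expects.

```r
# Same pattern as above: starts with "seq." followed by at least four digits
is_seq <- function(.x) grepl("^seq\\.[0-9]{4}", .x)

# Hypothetical column names, for illustration only
cols <- c("SampleId", "SampleType", "Group", "seq.10000.28", "seq.2182.54")

is_analyte <- is_seq(cols)
print(is_analyte)        # FALSE FALSE FALSE TRUE TRUE
print(cols[is_analyte])  # only the "seq." columns remain
```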

center/scale vector (z-scores).

cs <- function(.x) {    # .x = numeric vector
  out <- .x - mean(.x)  # center
  out / sd(out)         # scale
}
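
A short check, using made-up numbers, that this helper returns z-scores: the result should have mean 0 and standard deviation 1, and should agree with base R's scale().

```r
cs <- function(.x) {
  out <- .x - mean(.x)  # center
  out / sd(out)         # scale
}

x <- c(2, 4, 6, 8, 10)  # arbitrary example values
z <- cs(x)

print(z)
print(all.equal(mean(z), 0))                  # centered
print(all.equal(sd(z), 1))                    # unit standard deviation
print(all.equal(z, as.numeric(scale(x))))     # matches base R scale()
```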

Analyte values are log2-transformed, and each analyte is then centered and scaled (z-scored) across samples. Control and Disease are the two groups here (this may vary in your data).

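cleanData <- my_adat %>%
  filter(SampleType == "Sample") %>%   # keep study samples only
  drop_na(Group) %>%                   # drop samples without a group label
  log2() %>%                           # log2-transform analyte values
  mutate(SampleGroup = as.numeric(factor(Group, levels = c("Control", "Disease"))) - 1) %>%   # 0 = Control, 1 = Disease
  modify_if(is_seq(names(.)), cs)      # z-score each analyte column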
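
The group encoding relies on the level labels matching the values in Group exactly. The sketch below, with made-up labels, shows the intended 0/1 coding and what happens when a level carries stray whitespace.

```r
group <- c("Control", "Disease", "Disease", "Control")  # made-up labels

# Control -> 0, Disease -> 1
ok <- as.numeric(factor(group, levels = c("Control", "Disease"))) - 1

# A level with a leading space matches nothing, so those samples become NA
bad <- as.numeric(factor(group, levels = c("Control", " Disease"))) - 1

print(ok)   # 0 1 1 0
print(bad)  # 0 NA NA 0
```

Checking table(SampleGroup, useNA = "ifany") on the real data is a cheap way to catch this before running any tests.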

Select human analytes that have a UniProt ID and passed QC (ColCheck == "PASS").

t_tests <- getAnalyteInfo(cleanData) %>%
  filter(ColCheck == "PASS") %>%
  filter(Organism == "Human") %>%
  filter(UniProt != "") %>%
  select(AptName, SeqId, Target = TargetFullName, Organism, EntrezGeneID, EntrezGeneSymbol, UniProt, ColCheck)

Perform a two-sample Student’s t-test (equal variances assumed) for each analyte.

t_tests <- t_tests %>%
  mutate(
    formula = map(AptName, ~ as.formula(paste(.x, "~ SampleGroup"))),                # one formula per analyte
    t_test  = map(formula, ~ stats::t.test(.x, data = cleanData, var.equal = TRUE)), # fit t-tests
    t_stat  = map_dbl(t_test, "statistic"),                                          # extract t-statistic
    p.value = map_dbl(t_test, "p.value"),                                            # extract p-value
    fdr     = p.adjust(p.value, method = "BH")                                       # Benjamini-Hochberg adjustment
  ) %>% arrange(p.value)
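
The same per-analyte logic can be sketched in base R on simulated data. Everything in this block (the toy data frame, the analyte names, the group sizes and the effect size) is invented for illustration and is not SomaScan data.

```r
set.seed(1)
n <- 10  # samples per group (arbitrary)

# Simulated data: three made-up analytes, one shifted in group 1
toy <- data.frame(
  SampleGroup = rep(c(0, 1), each = n),
  seq.0001.01 = rnorm(2 * n),                           # no group difference
  seq.0002.02 = rnorm(2 * n) + rep(c(0, 3), each = n),  # shifted in group 1
  seq.0003.03 = rnorm(2 * n)                            # no group difference
)
analytes <- grep("^seq\\.", names(toy), value = TRUE)

# One equal-variance two-sample t-test per analyte
p <- vapply(analytes, function(a) {
  f <- as.formula(paste(a, "~ SampleGroup"))
  stats::t.test(f, data = toy, var.equal = TRUE)$p.value
}, numeric(1))

# Benjamini-Hochberg adjustment across all tested analytes
res <- data.frame(AptName = analytes, p.value = p, fdr = p.adjust(p, method = "BH"))
print(res[order(res$p.value), ])
```

BH-adjusted values are never smaller than the raw p-values, so the fdr column is the more conservative one to threshold.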

The results are used to identify proteins significantly associated with disease at a Benjamini-Hochberg false discovery rate (FDR) threshold of 1%.

t_tests[t_tests$fdr <= 0.01, ]

Further reading …
Tandem Mass Tag (TMT)-based quantitation of proteins
Label-Free Quantitation (LFQ) of proteins