Quantitative proteomics: label-free quantitation of proteins

2 minute read

Published:

Liquid chromatography (LC) coupled with mass spectrometry (MS) has been widely used for protein expression quantification. Protein quantification by tandem-MS (MS/MS) uses integrated peak intensity from the parent-ion mass (MS1) or features from fragment-ions (MS2). MS1 methods use the iBAQ (intensity Based Absolute Quantification) algorithm (a protein’s total non-normalised intensities are divided by the number of measurable tryptic peptides). Untargeted label-free quantitation (LFQ) of proteins, aims to determine the relative amount of proteins in two or more biological samples.

Mass spectrometer generated raw files are used for label-free quantitation of proteins. Base peak chromatograms are inspected visually using RawMeat1, which is a data quality assessment tool designed for Thermo instruments. All raw files are processed together in a single run by MaxQuant (version 1.6.0.16)2 with default parameters. Database searches are performed using the Andromeda search engine (a peptide search engine based on probabilistic scoring) with the UniProt-SwissProt human canonical database as a reference and a contaminants database of common laboratory contaminants. MaxQuant reports summed intensity for each protein, as well as its iBAQ value. Proteins that share all identified peptides are combined into a single protein group. Peptides that match multiple protein groups (“razor” peptides) are assigned to the protein group with the most unique peptides. MaxQuant employs the MaxLFQ algorithm for label-free quantitation (LFQ). Quantification will be performed using razor and unique peptides, including those modified by acetylation (protein N-terminal), oxidation (Met) and deamidation (NQ). PTXQC3 is used for general quality control of proteomics data, which takes MaxQuant result files.

Data processing is performed using Perseus (version 1.5.0.31)4. In brief, protein group LFQ intensities are log2-transformed to reduce the effect of outliers. To overcome the obstacle of missing LFQ values, missing values are imputed before fit the models. Hierarchical clustering is performed on Z-score normalized, log2-transformed LFQ intensities. Log ratios are calculated as the difference in average log2 LFQ intensity values between experimental and control groups. Two-tailed, Student’s t test calculations are used in statistical tests. A protein is considered statistically significant if its fold change is ≥ 2 and FDR ≤ 0.01. All the identified differentially expressed proteins are used in protein network or pathway analysis. In addition to the above analytical considerations, good experimental design helps effectively identify true differences in the presence of variability from various sources and also avoids bias during data acquisition.

References

  1. RawMeat is a nice Thermo raw file diagnostic tool developed by the now defunct Vast Scientific 

  2. MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets 

  3. PTXQC, an R-based quality control pipeline called Proteomics Quality Control 

  4. Perseus is software package for shotgun proteomics data analyses