RNA-Seq eQTL Analysis Pipeline: Uncovering Genetic Influences on Gene Expression

Understanding how genetic variation affects gene expression is crucial for uncovering the mechanisms underlying complex traits and diseases. Expression quantitative trait locus (eQTL) analysis is a powerful tool for this investigation: it tests for associations between genetic variants and the expression levels of genes across the genome.

Our eQTL analysis pipeline is built on the hg38 reference genome and employs several key steps:

  1. Mapping Reads with STAR: Paired-end reads from RNA-Seq data are aligned using the STAR aligner. STAR is known for its speed and accuracy in mapping reads to the reference genome.
  2. Quantifying Gene Expression with HTSeq: Following alignment, we quantify gene expression by counting the number of reads mapping to exons using HTSeq. This step utilizes RefSeq gene annotations to ensure precise measurement of expression levels without the need for transcript assembly.
  3. Normalizing Data with edgeR: To account for systematic technical differences between samples, such as sequencing depth (library size) and RNA composition bias, we normalize the raw counts using the trimmed mean of M-values (TMM) approach in edgeR. This normalization makes expression levels comparable across samples.
  4. Genotyping Data: For eQTL analysis, the RNA-Seq expression data are combined with genotyping data. We use Infinium CytoSNP-850K v1.2 arrays to detect genetic and structural variation.
  5. Analyzing Array Data: The array data are processed using GenomeStudio or BlueFuse Multi software against the hg38/GRCh38 reference genome. After importing the raw array data along with the SNP manifest file (.bpm) and standard cluster file (.egt) into GenomeStudio, we cluster the SNP intensities. Genotype calls are made using the GenCall algorithm, which is informed by the GenTrain clustering algorithm.
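
To make the normalization in step 3 more concrete, here is a minimal Python sketch of the idea behind TMM scaling factors. It is not edgeR's implementation: it assumes the first sample as the reference, trims only on the M-values (log fold changes), and skips the A-value trimming and precision weights that edgeR applies. The function names (`trimmed_mean`, `tmm_like_factors`, `normalized_cpm`) and the toy counts are hypothetical and chosen for illustration only.

```python
import math


def trimmed_mean(values, trim_fraction):
    """Mean of values after dropping trim_fraction of them from each tail."""
    if not values:
        return 0.0  # no usable genes: fall back to a neutral log-factor
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if len(ordered) > 2 * k else ordered
    return sum(kept) / len(kept)


def tmm_like_factors(counts, ref_index=0, trim_fraction=0.3):
    """Simplified TMM-style scaling factors.

    counts: one list of raw gene counts per sample, same gene order in each.
    Assumption: the reference is simply sample `ref_index`, and the trimmed
    mean is unweighted (edgeR uses weights and also trims on abundance).
    """
    lib_sizes = [sum(sample) for sample in counts]
    ref, ref_size = counts[ref_index], lib_sizes[ref_index]

    log_factors = []
    for sample, size in zip(counts, lib_sizes):
        # M-value: log2 ratio of library-size-scaled counts, sample vs. reference
        m_values = [
            math.log2((y / size) / (r / ref_size))
            for y, r in zip(sample, ref)
            if y > 0 and r > 0
        ]
        log_factors.append(trimmed_mean(m_values, trim_fraction))

    # Rescale so the factors multiply to 1 across samples
    mean_log = sum(log_factors) / len(log_factors)
    return [2 ** (lf - mean_log) for lf in log_factors]


def normalized_cpm(counts, factors):
    """Counts per million using effective library size = library size * factor."""
    result = []
    for sample, factor in zip(counts, factors):
        effective_size = sum(sample) * factor
        result.append([1e6 * y / effective_size for y in sample])
    return result


# Toy data: sample B matches sample A except for one highly expressed gene
counts = [
    [100, 100, 100, 100, 100],   # sample A
    [100, 100, 100, 100, 1000],  # sample B
]
factors = tmm_like_factors(counts)
cpm = normalized_cpm(counts, factors)
```

In this toy example, plain library-size scaling would make the first four genes look lower in sample B simply because one gene takes up most of its reads. The trimmed mean ignores that outlier, so the four unchanged genes end up with matching normalized values in both samples.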

Matrix eQTL is used to test variant-gene associations efficiently, modeling the effect of genotype with an additive linear model. Here’s a streamlined description of the data analysis steps for RNA-Seq-based eQTL mapping: