Genomic variants from RNA-Seq data

1 minute read

Published: January 05, 2018

RNA-Seq detects and quantifies both known and rare RNA transcripts in a sample. Beyond differential expression analysis and the discovery of novel transcripts, it also supports the detection of genomic variants in expressed regions.

Only a few workflows currently exist for detecting SNPs in RNA-Seq data, including eSNV-detect, SNPiR and Opossum. Here, I have used the GATK workflow for SNP and indel calling on RNA-Seq data, which consists of the following steps:

Reference-based (hg38) read mapping with the STAR aligner, using the 2-pass approach with the suggested parameters. In 2-pass mode, splice junctions detected in a first alignment run are used to guide the final alignment. Reads mapped across splice junctions must later be split to remove the intronic parts.
Adding read group information, sorting, marking duplicates and indexing with Picard (picard.jar).
Splitting the reads into exon segments with GATK’s SplitNCigarReads (removing Ns but maintaining grouping information) and reassigning mapping qualities.
Indel realignment and base quality recalibration.
Variant calling with GATK’s HaplotypeCaller and, finally, filtering the variants with GATK’s VariantFiltration.
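
To make the read-splitting step more concrete, here is a toy Python illustration of what cutting a spliced read at its N CIGAR operations means. It is a simplified sketch, not SplitNCigarReads itself: the function name split_on_n is hypothetical, and the real tool also deals with clipping of overhangs, mapping quality reassignment and read flags, none of which is modelled here.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def split_on_n(pos, cigar):
    """Split an alignment at N (skipped region) operations.

    pos is the 1-based leftmost reference position of the read.
    Returns a list of (reference_start, cigar_of_segment) tuples,
    one per exon segment. Toy illustration only.
    """
    segments = []
    seg_start = pos   # reference start of the segment being built
    ref_pos = pos     # current position on the reference
    ops = []          # CIGAR operations of the segment being built
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # An intron: close the current segment and jump over the gap.
            if ops:
                segments.append((seg_start, "".join(ops)))
            ref_pos += length
            seg_start = ref_pos
            ops = []
        else:
            ops.append(f"{length}{op}")
            # Only these operations consume reference bases.
            if op in "MD=X":
                ref_pos += length
    if ops:
        segments.append((seg_start, "".join(ops)))
    return segments


if __name__ == "__main__":
    # A 100 bp read spanning a 1000 bp intron.
    print(split_on_n(100, "50M1000N50M"))
```

A read without any N operation comes back as a single segment, so unspliced reads are left untouched by this toy function.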

My qsub-based pipeline is available at bitbucket.org
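
That pipeline is not reproduced here. As a rough sketch of how the steps listed earlier could be chained, the Python below only builds the command lines and executes nothing. Several things in it are assumptions rather than details from this post: GATK 3.x-style invocations (GenomeAnalysisTK.jar with -T), uncompressed paired-end FASTQ input, a single known-sites VCF used for both realignment and recalibration, all file names and read group values, and the filter thresholds, which follow the GATK RNA-Seq recommendations as I recall them. The function name build_commands is hypothetical.

```python
def build_commands(sample, fastq1, fastq2, star_index, reference, known_sites,
                   picard_jar="picard.jar", gatk_jar="GenomeAnalysisTK.jar",
                   threads=8):
    """Return the pipeline as a list of argument lists; nothing is executed."""
    picard = ["java", "-jar", picard_jar]

    def gatk(tool, *args):
        # Assumption: GATK 3.x command-line style.
        return ["java", "-jar", gatk_jar, "-T", tool, "-R", reference, *args]

    # Intermediate file names are placeholders chosen for this sketch.
    aligned = f"{sample}.Aligned.out.bam"  # STAR's name for unsorted BAM output
    rg = f"{sample}.rg.bam"
    dedup = f"{sample}.dedup.bam"
    split = f"{sample}.split.bam"
    targets = f"{sample}.intervals"
    realigned = f"{sample}.realigned.bam"
    table = f"{sample}.recal.table"
    recal = f"{sample}.recal.bam"
    raw_vcf = f"{sample}.raw.vcf"
    filtered_vcf = f"{sample}.filtered.vcf"

    return [
        # 1. STAR 2-pass mapping against the reference index
        ["STAR", "--runThreadN", str(threads), "--genomeDir", star_index,
         "--readFilesIn", fastq1, fastq2, "--twopassMode", "Basic",
         "--outSAMtype", "BAM", "Unsorted", "--outFileNamePrefix", f"{sample}."],
        # 2. Read groups + coordinate sort, then duplicate marking + index
        picard + ["AddOrReplaceReadGroups", f"I={aligned}", f"O={rg}",
                  "SO=coordinate", f"RGID={sample}", "RGLB=lib1",
                  "RGPL=illumina", "RGPU=unit1", f"RGSM={sample}"],
        picard + ["MarkDuplicates", f"I={rg}", f"O={dedup}",
                  f"M={sample}.dup_metrics.txt", "CREATE_INDEX=true",
                  "VALIDATION_STRINGENCY=SILENT"],
        # 3. Split reads at N CIGAR operations, reassign STAR's MAPQ 255 to 60
        gatk("SplitNCigarReads", "-I", dedup, "-o", split,
             "-rf", "ReassignOneMappingQuality", "-RMQF", "255", "-RMQT", "60",
             "-U", "ALLOW_N_CIGAR_READS"),
        # 4. Indel realignment and base quality recalibration
        gatk("RealignerTargetCreator", "-I", split, "-known", known_sites,
             "-o", targets),
        gatk("IndelRealigner", "-I", split, "-targetIntervals", targets,
             "-known", known_sites, "-o", realigned),
        gatk("BaseRecalibrator", "-I", realigned, "-knownSites", known_sites,
             "-o", table),
        gatk("PrintReads", "-I", realigned, "-BQSR", table, "-o", recal),
        # 5. Variant calling and hard filtering
        gatk("HaplotypeCaller", "-I", recal, "-dontUseSoftClippedBases",
             "-stand_call_conf", "20.0", "-o", raw_vcf),
        gatk("VariantFiltration", "-V", raw_vcf, "-window", "35",
             "-cluster", "3", "-filterName", "FS", "-filter", "FS > 30.0",
             "-filterName", "QD", "-filter", "QD < 2.0", "-o", filtered_vcf),
    ]


if __name__ == "__main__":
    for cmd in build_commands("sample1", "sample1_R1.fastq", "sample1_R2.fastq",
                              "star_index_hg38", "hg38.fa", "known_sites.vcf"):
        print(" ".join(cmd))
```

Each argument list can be passed to subprocess.run(cmd, check=True), or joined with shlex.join and written into a qsub job script. Keeping the commands as data makes it easy to inspect or log them before anything is submitted.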


Quantitative Proteomics: Aptamer-Based Protein Quantification

6 minute read

Published: February 20, 2023

Quantitative proteomics is a cutting-edge approach for measuring protein levels in complex biological samples. One innovative method in this field is aptamer-based protein quantification. Aptamers, which are short, single-stranded DNA or RNA molecules, are engineered to specifically bind to target proteins with high precision.

Kaplan-Meier Curve using R

3 minute read

Published: October 19, 2022

The Kaplan-Meier curve is a powerful tool in survival analysis, commonly used to estimate the probability of an event—such as survival—at different time intervals. It provides a visual representation of the time it takes for an event to occur across a patient population. This method is especially useful in medical studies where understanding survival rates is key.

Scientist, Bioinformatics

