Taxonomic and diversity profiling of the microbiome - 16S rRNA gene amplicon sequence data

1 minute read


The 16S ribosomal RNA (rRNA) gene of Bacteria codes for the RNA component of the 30S subunit. Different bacterial species have one to multiple copies of the 16S rRNA gene, and each with 9 hypervariable regions, V1-V9. High-throughput sequencing of 16S rRNA gene (a “marker gene”) amplicons has become a widely used method to study bacterial phylogeny and species classification.

Quantitative Insights Into Microbial Ecology “QIIME” 2 (release 2018.11)1 is a widely used package to identity abundance of microbes using 16s rRNA. Briefly, feature table containing counts of each unique sequence in the samples will be constructed using qiime dada2 denoise-paired method. A feature is essentially any unit of observation, e.g., an OTU (Operational Taxonomic Unit), a sequence variant, a gene or a metabolite. In QIIME2 (currently), most features will be OTUs or sequence variants (alternatively, for OTUs, use QIIME2 plugin q2-vsearch).

Data produced by QIIME 2 exist as QIIME 2 artifacts. A QIIME 2 artifact typically has the .qza file extension when output data stored in a file. Visualizations are another type of data (.qzv file extension) generated by QIIME 2, which can be viewed using a web interface (at Firefox web browser) without requiring a QIIME installation. Since QIIME 2 works with artifacts instead of data files (e.g. FASTA files), we must create a QIIME 2 artifact by importing our fastq.gz data files.

Scripts are available for AWS Cloud and HPC Cluster.