Bioinformatic processing of microbiome sequencing data

QIIME 2 (2017.10) is a widely used package for identifying microbes and estimating their abundance from 16S ribosomal RNA (a “marker gene”) sequences. The 16S rRNA gene has 9 hypervariable regions (V1-V9), which can be used to distinguish different species.

Data produced by QIIME 2 exist as QIIME 2 artifacts. An artifact typically has the .qza file extension when it is stored in a file. Visualizations (.qzv files) are another type of data generated by QIIME 2; they can be viewed through a web interface (for example, in the Firefox web browser) without requiring a QIIME installation. Since QIIME 2 works with artifacts instead of data files (e.g. FASTA files), we must first create a QIIME 2 artifact by importing our fastq.gz data files.

Here are some bash scripts and qsub commands that submit QIIME jobs to the cluster. qsub is the job submission command of a Sun Grid Engine (SGE) cluster.
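
As a rough sketch of what such a submission might look like, the snippet below writes a small SGE job script and hands it to qsub. The file name, job name, log name and the placeholder command (qiime info) are assumptions made for illustration; they are not taken from the scripts linked in this post.

```shell
#!/usr/bin/env bash
set -eu

JOB_SCRIPT="${JOB_SCRIPT:-qiime-job.sh}"   # hypothetical file name

# Lines starting with "#$" are read by SGE as qsub options:
# -S shell, -N job name, -cwd run in the current directory,
# -j y merge stderr into stdout, -o log file.
cat > "${JOB_SCRIPT}" <<'EOF'
#!/bin/bash
#$ -S /bin/bash
#$ -N qiime-job
#$ -cwd
#$ -j y
#$ -o qiime-job.log
qiime info
EOF

# Submit only where qsub exists, so the sketch is harmless elsewhere.
if command -v qsub >/dev/null 2>&1; then
    qsub "${JOB_SCRIPT}"
else
    echo "qsub not found; wrote ${JOB_SCRIPT} without submitting"
fi
```

On a real cluster the job would usually also need the QIIME 2 environment activated inside the job script, which depends on how QIIME was installed there.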

To find the PHRED offset used for the positional quality scores, execute the following script (BitBucket).

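# checks-fastq-quality-score-format
# Prints the distinct ASCII codes used in the quality lines (every 4th line).
# printf "%s" avoids treating '%' in a quality string as a format specifier
# and already omits the newlines; sort -nu orders the codes numerically.
zcat "${FASTQ_FILE}" \
	| awk 'NR%4==0 {printf "%s", $0}' \
	| hexdump -v -e '/1 "%u\n"' \
	| head -n100000 \
	| sort -nu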
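
To turn that list of ASCII codes into an answer, a small helper along these lines may be useful. It is only a heuristic sketch: it assumes that Phred+64 encodings never use codes below 59 and that Illumina 1.8+ Phred+33 data usually tops out at 74. The function name guess_phred_offset is made up for this example.

```shell
#!/usr/bin/env bash
set -eu

# Read ASCII codes (one per line) on stdin and print a guess:
# phred33, phred64, ambiguous, or no-data for empty input.
guess_phred_offset() {
    awk '
        NR == 1 || $1 + 0 < min { min = $1 + 0 }
        NR == 1 || $1 + 0 > max { max = $1 + 0 }
        END {
            if (NR == 0)       print "no-data"
            else if (min < 59) print "phred33"
            else if (max > 74) print "phred64"
            else               print "ambiguous"
        }'
}
```

Piping the output of the script above into guess_phred_offset prints one of these labels; an "ambiguous" result means the observed codes fit both encodings and more reads should be inspected.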

To import FASTQ files as a QIIME 2 artifact and plot positional sequence quality scores, download the script from BitBucket and execute it.
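
For orientation, an import step of this kind might look roughly like the sketch below. It is not the BitBucket script. It assumes paired-end reads named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz, Phred+33 quality scores, and the 2017.10 command-line syntax (later releases renamed --source-format to --input-format); the function and file names are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Print a paired-end manifest (CSV) for every <sample>_R1.fastq.gz in a directory.
# Assumption: the matching reverse read is <sample>_R2.fastq.gz in the same directory.
make_manifest() {
    local fastq_dir r1 sample
    fastq_dir="$(cd "$1" && pwd)"   # the manifest needs absolute paths
    echo "sample-id,absolute-filepath,direction"
    for r1 in "${fastq_dir}"/*_R1.fastq.gz; do
        [ -e "${r1}" ] || continue
        sample="$(basename "${r1}" _R1.fastq.gz)"
        echo "${sample},${r1},forward"
        echo "${sample},${fastq_dir}/${sample}_R2.fastq.gz,reverse"
    done
}

# Import the reads listed in the manifest and plot per-position quality scores.
import_and_summarize() {
    make_manifest "$1" > manifest.csv
    qiime tools import \
        --type 'SampleData[PairedEndSequencesWithQuality]' \
        --input-path manifest.csv \
        --source-format PairedEndFastqManifestPhred33 \
        --output-path demux.qza
    qiime demux summarize \
        --i-data demux.qza \
        --o-visualization demux.qzv
}
```

Calling import_and_summarize with the directory of fastq.gz files would leave demux.qza and demux.qzv in the working directory; if the PHRED check reports an offset of 64, the Phred64 variant of the manifest format would be the one to use.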


To denoise the imported paired-end sequence data and filter chimeras, QIIME 2 uses DADA2, an open-source software package that corrects sequencing errors in Illumina amplicon sequence data. Download the script from BitBucket and execute it.
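
A denoising call could be sketched as follows. This is an illustration rather than the linked script: the truncation lengths are placeholders that should be chosen from the quality plots, the file names are hypothetical, and --output-dir is used so that the sketch does not depend on which individual outputs a given QIIME 2 release expects.

```shell
#!/usr/bin/env bash
set -eu

QIIME="${QIIME:-qiime}"   # set QIIME=echo to print the command instead of running it

# Denoise paired-end reads with DADA2.
# Arguments: input artifact, output directory, forward and reverse truncation lengths.
denoise_paired() {
    local demux="$1" out_dir="$2" trunc_f="$3" trunc_r="$4"
    "${QIIME}" dada2 denoise-paired \
        --i-demultiplexed-seqs "${demux}" \
        --p-trim-left-f 0 \
        --p-trim-left-r 0 \
        --p-trunc-len-f "${trunc_f}" \
        --p-trunc-len-r "${trunc_r}" \
        --p-n-threads "${NSLOTS:-1}" \
        --output-dir "${out_dir}"
}
```

For example, denoise_paired demux.qza dada2-results 240 200 would write the feature table and representative sequences into dada2-results. Inside an SGE job, NSLOTS holds the number of slots granted to the job, so the thread count follows the qsub request.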


For feature filtering (i.e., removal of samples and features from a feature table), download the script from BitBucket and execute it.
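
As an illustration of what filtering can involve, the sketch below wraps two feature-table commands in small functions. The thresholds and file names passed to them are up to the reader; the function names are invented for this example and the linked script may filter differently.

```shell
#!/usr/bin/env bash
set -eu

QIIME="${QIIME:-qiime}"   # set QIIME=echo to print the commands instead of running them

# Drop features with a total count below min_freq
# or observed in fewer than min_samples samples.
filter_rare_features() {
    local in_table="$1" out_table="$2" min_freq="$3" min_samples="$4"
    "${QIIME}" feature-table filter-features \
        --i-table "${in_table}" \
        --p-min-frequency "${min_freq}" \
        --p-min-samples "${min_samples}" \
        --o-filtered-table "${out_table}"
}

# Drop samples whose total feature count is below min_freq.
filter_shallow_samples() {
    local in_table="$1" out_table="$2" min_freq="$3"
    "${QIIME}" feature-table filter-samples \
        --i-table "${in_table}" \
        --p-min-frequency "${min_freq}" \
        --o-filtered-table "${out_table}"
}
```

A call such as filter_rare_features table.qza filtered-table.qza 10 2 would keep only features counted at least 10 times overall and present in at least 2 samples.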


For diversity (alpha and beta), taxonomic and comparative analyses, download the script from BitBucket and execute it.
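
The sketch below strings together one possible set of commands for these analyses: rarefying, a Shannon alpha-diversity estimate, a Bray-Curtis beta-diversity matrix, a group comparison, and taxonomic classification with a bar plot. The choice of metrics, the pre-trained classifier and all file names are assumptions for illustration, not the contents of the linked script.

```shell
#!/usr/bin/env bash
set -eu

QIIME="${QIIME:-qiime}"   # set QIIME=echo to print the commands instead of running them

# Arguments: feature table, representative sequences, sample metadata (TSV),
# rarefaction depth, pre-trained taxonomy classifier.
diversity_and_taxonomy() {
    local table="$1" rep_seqs="$2" metadata="$3" depth="$4" classifier="$5"
    # Subsample every sample to the same depth before computing diversity.
    "${QIIME}" feature-table rarefy --i-table "${table}" \
        --p-sampling-depth "${depth}" --o-rarefied-table rarefied-table.qza
    # Alpha diversity (within-sample).
    "${QIIME}" diversity alpha --i-table rarefied-table.qza \
        --p-metric shannon --o-alpha-diversity shannon.qza
    # Beta diversity (between-sample).
    "${QIIME}" diversity beta --i-table rarefied-table.qza \
        --p-metric braycurtis --o-distance-matrix bray-curtis.qza
    # Comparative analysis: does alpha diversity differ between metadata groups?
    "${QIIME}" diversity alpha-group-significance --i-alpha-diversity shannon.qza \
        --m-metadata-file "${metadata}" --o-visualization shannon-significance.qzv
    # Taxonomic assignment and bar plot.
    "${QIIME}" feature-classifier classify-sklearn --i-classifier "${classifier}" \
        --i-reads "${rep_seqs}" --o-classification taxonomy.qza
    "${QIIME}" taxa barplot --i-table "${table}" --i-taxonomy taxonomy.qza \
        --m-metadata-file "${metadata}" --o-visualization taxa-bar-plots.qzv
}
```

The rarefaction depth is a judgment call: a higher value gives better estimates but discards every sample with fewer reads than that.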



The above analyses also produce a key summary table, the BIOM (Biological Observation Matrix) file, which contains feature (Operational Taxonomic Unit, OTU) abundances across samples, along with various annotations and sample metadata. As an alternative to the command-line analyses, you can upload the BIOM file to MicrobiomeAnalyst, a web-based tool for comprehensive statistical, visual and meta-analysis of microbiome data. It supports taxonomic profiling, which characterizes community composition using methods developed in ecology such as alpha-diversity (within-sample diversity) and beta-diversity (between-sample diversity), and comparative analysis, which identifies features that differ significantly among the conditions under study.
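
If the feature table is only available as a .qza artifact, the BIOM file can be unpacked from it. The sketch below assumes the export syntax of QIIME 2 2017.10 (later releases take --input-path and --output-path instead) and that the biom command-line tool is installed; the function name is made up for this example.

```shell
#!/usr/bin/env bash
set -eu

QIIME="${QIIME:-qiime}"   # set QIIME=echo and BIOM=echo to print the commands
BIOM="${BIOM:-biom}"

# Unpack a feature table artifact into a directory holding feature-table.biom,
# then write a tab-separated copy that is easy to inspect by eye.
export_feature_table() {
    local table="$1" out_dir="$2"
    # Assumed 2017.10 syntax; newer releases use --input-path/--output-path.
    "${QIIME}" tools export "${table}" --output-dir "${out_dir}"
    "${BIOM}" convert \
        -i "${out_dir}/feature-table.biom" \
        -o "${out_dir}/feature-table.tsv" \
        --to-tsv
}
```

For example, export_feature_table table.qza exported-table would leave exported-table/feature-table.biom ready for upload, next to a plain-text copy.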

After analyzing your data, it’s finally time to interpret your results!