Bioinformatic processing of microbiome sequencing data

QIIME 2 (2017.10) is a widely used, community-developed package for analyzing 16S ribosomal RNA sequencing data.

Data produced by QIIME 2 exist as QIIME 2 artifacts. An artifact typically has the .qza file extension when it is stored in a file. Visualizations (.qzv files) are another type of output generated by QIIME 2; they can be viewed in a web browser (for example, Firefox) at https://view.qiime2.org without requiring a QIIME installation. Since QIIME 2 works with artifacts instead of raw data files (e.g. FASTA files), we must first create a QIIME 2 artifact by importing our fastq.gz data files.

Here are some bash scripts and qsub commands that submit QIIME jobs to the cluster. qsub is the job submission command for a Sun Grid Engine (SGE) cluster.

To find the PHRED offset used for the positional quality scores, execute the following script (BitBucket).

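sh fastq_manifest_phred.sh

#!/bin/sh
# fastq_manifest_phred.sh
# Checks the FASTQ quality score format by listing the distinct ASCII codes
# found in the quality lines. A minimum near 33 indicates a PHRED offset of 33;
# a minimum of 59 or above suggests an offset of 64.
FASTQ_FILE=$HOME/path/to/fastq.gz
# Every 4th line of a FASTQ record holds the quality string. It is printed with
# an explicit "%s" format so that '%' characters in the scores stay literal.
zcat "${FASTQ_FILE}" \
	| awk 'NR%4==0 {printf "%s", $0}' \
	| tr -d '\n' \
	| hexdump -v -e'/1 "%u\n"' \
	| head -n100000 \
	| sort -n -u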
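The script above only lists the ASCII codes it finds; turning that list into an offset is left to the reader. A minimal sketch of that last step follows. The function name phred_offset and the thresholds (59 and 74) are assumptions based on the conventional ranges of Illumina encodings, not something taken from the original script.

```sh
#!/bin/sh
# phred_offset: read ASCII codes (one per line) on stdin and guess the
# quality-score offset from their range.
# Assumed conventions (not from the original post):
#   - Phred+33 codes start at 33 and rarely exceed 74 ('J') on Illumina data
#   - Phred+64 codes start at 64 (the older Solexa+64 variant can reach 59)
phred_offset() {
    awk '
        NR == 1 || $1 + 0 < min { min = $1 + 0 }
        NR == 1 || $1 + 0 > max { max = $1 + 0 }
        END {
            if (NR == 0)        print "unknown"     # no input at all
            else if (min < 59)  print "33"          # too low for any +64 encoding
            else if (max > 74)  print "64"          # too high for typical +33 data
            else                print "ambiguous"   # range fits both encodings
        }
    '
}
```

With the function loaded in the current shell, the output of the script can be piped straight into it, e.g. `sh fastq_manifest_phred.sh | phred_offset`. An "ambiguous" result means the sampled scores fall in the range both encodings share, so more reads should be inspected.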

To import the FASTQ files as a QIIME 2 artifact and plot the per-position sequence quality scores, download qiime2.2017.10_step1.sh from BitBucket and submit it with

qsub qiime2.2017.10_step1.sh
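The contents of qiime2.2017.10_step1.sh are not reproduced here, so the following is only a sketch of what an import of this kind might look like, not the script itself. It assumes paired-end files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz (a hypothetical naming scheme), a PHRED offset of 33, and the `--source-format` option as it was spelled in the 2017.x releases (later releases are understood to use `--input-format` instead). The function names and output file names are illustrative.

```sh
#!/bin/sh
# write_manifest DIR: print a QIIME 2 FASTQ manifest (CSV) for the paired-end
# files in DIR. Assumes the hypothetical naming scheme
# <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
write_manifest() {
    dir=$(cd "$1" && pwd) || return 1      # the manifest needs absolute paths
    echo "sample-id,absolute-filepath,direction"
    for fwd in "$dir"/*_R1.fastq.gz; do
        [ -e "$fwd" ] || continue          # unmatched glob stays literal
        sample=$(basename "$fwd" _R1.fastq.gz)
        rev="$dir/${sample}_R2.fastq.gz"
        if [ ! -e "$rev" ]; then
            echo "missing reverse reads for $sample, skipping" >&2
            continue
        fi
        echo "$sample,$fwd,forward"
        echo "$sample,$rev,reverse"
    done
}

# import_and_summarize MANIFEST: import the reads listed in MANIFEST and
# plot per-position quality. Needs an activated QIIME 2 environment.
import_and_summarize() {
    qiime tools import \
        --type 'SampleData[PairedEndSequencesWithQuality]' \
        --input-path "$1" \
        --output-path paired-end-demux.qza \
        --source-format PairedEndFastqManifestPhred33 &&
    qiime demux summarize \
        --i-data paired-end-demux.qza \
        --o-visualization paired-end-demux.qzv
}
```

A typical use would be `write_manifest /path/to/reads > manifest.csv` followed by `import_and_summarize manifest.csv`; the resulting paired-end-demux.qzv can then be opened at https://view.qiime2.org. If the offset check reports 64, the Phred64 variant of the manifest format would be the one to use.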

To denoise the imported paired-end sequence data and filter chimeras, QIIME 2 uses DADA2, an open-source software package that corrects sequencing errors in Illumina amplicon sequence data. Download qiime2.2017.10_step2.sh from BitBucket and submit it with

qsub qiime2.2017.10_step2.sh
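As a rough illustration of the denoising step (the actual step 2 script is not shown here), the sketch below wraps a `qiime dada2 denoise-paired` call. The file names, the wrapper name denoise_paired and the DRY_RUN switch are assumptions made for this example; the truncation lengths should be chosen from the quality plot produced in step 1. Newer QIIME 2 releases are understood to expect an additional `--o-denoising-stats` output.

```sh
#!/bin/sh
# denoise_paired TRUNC_F TRUNC_R: denoise paired-end reads with DADA2,
# truncating forward reads at TRUNC_F and reverse reads at TRUNC_R bases.
# With DRY_RUN=1 the command is printed instead of executed.
denoise_paired() {
    trunc_f=$1
    trunc_r=$2
    # Build the command line in the function's positional parameters.
    set -- qiime dada2 denoise-paired \
        --i-demultiplexed-seqs paired-end-demux.qza \
        --p-trim-left-f 0 \
        --p-trim-left-r 0 \
        --p-trunc-len-f "$trunc_f" \
        --p-trunc-len-r "$trunc_r" \
        --o-table table.qza \
        --o-representative-sequences rep-seqs.qza
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running `DRY_RUN=1 denoise_paired 240 200` prints the full command for review before it is submitted to the cluster; the values 240 and 200 are placeholders, not recommendations.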

For feature filtering (i.e., removal of samples and features from a feature table), download qiime2.2017.10_step3.sh from BitBucket and submit it with

qsub qiime2.2017.10_step3.sh
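One common form of this filtering is by total frequency: first drop samples with too few reads, then drop features that are too rare. The sketch below assumes that approach and uses the `filter-samples` and `filter-features` actions of the feature-table plugin; the thresholds, file names and helper names (run, filter_table) are illustrative and may differ from what the step 3 script does.

```sh
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# filter_table MIN_SAMPLE_FREQ MIN_FEATURE_FREQ
#   1) drop samples whose total count is below MIN_SAMPLE_FREQ
#   2) drop features whose total count is below MIN_FEATURE_FREQ
filter_table() {
    run qiime feature-table filter-samples \
        --i-table table.qza \
        --p-min-frequency "$1" \
        --o-filtered-table sample-filtered-table.qza &&
    run qiime feature-table filter-features \
        --i-table sample-filtered-table.qza \
        --p-min-frequency "$2" \
        --o-filtered-table filtered-table.qza
}
```

For example, `DRY_RUN=1 filter_table 1000 10` prints both commands without touching any files. Filtering samples before features means the feature totals are computed only over the samples that are kept.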

For diversity (alpha and beta), taxonomic and comparative analyses, download qiime2.2017.10_step4.sh from BitBucket and submit it with

qsub qiime2.2017.10_step4.sh

[Figure: PC1 vs. PC2 plot]

The above analyses also produce a key summary table, the BIOM (Biological Observation Matrix) file, which contains feature (Operational Taxonomic Unit, OTU) abundances across samples along with various annotations and sample metadata. As an alternative to the QIIME 2 analyses, the BIOM file can be uploaded to MicrobiomeAnalyst, a web-based tool for comprehensive statistical, visual and meta-analysis of microbiome data. It supports taxonomic profiling, which characterizes community composition using methods developed in ecology such as alpha-diversity (within-sample diversity) and beta-diversity (between-sample diversity), and comparative analysis, which identifies features that differ significantly among the conditions under study.
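To make "within-sample diversity" concrete, here is a small worked sketch of one widely used alpha-diversity measure, the Shannon index H = -sum(p_i * ln p_i), computed from the feature counts of a single sample. The choice of Shannon (with the natural logarithm) and the function name shannon are assumptions for illustration; QIIME 2 and MicrobiomeAnalyst offer several metrics and may use a different logarithm base.

```sh
#!/bin/sh
# shannon: read one feature count per line on stdin and print the Shannon
# index H = -sum(p_i * ln p_i), where p_i is the feature's share of the
# sample total. Zero counts are skipped because they contribute nothing.
shannon() {
    awk '
        $1 > 0 { count[++n] = $1; total += $1 }
        END {
            for (i = 1; i <= n; i++) {
                p = count[i] / total
                h -= p * log(p)
            }
            printf "%.4f\n", h
        }
    '
}
```

A sample with four equally abundant features, `printf '10\n10\n10\n10\n' | shannon`, gives ln(4), printed as 1.3863, while a sample dominated by a single feature gives 0.0000. Higher values indicate a richer and more even community.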

After analyzing your data, it’s finally time to interpret your results!