Overview

The included functions are broken up into 6 general sections:

Importing
DADA2-based denoising
Unrooted/rooted tree generation
Core-metrics generation
Taxonomic labelling
Optional analyses

In addition, waterway will automatically terminate after certain steps if certain conditions are met:

truncF=() or truncR=() in the sourcefile
- Automatically stops after importing data
sampling_depth=0 in the sourcefile
- Automatically stops after DADA2 denoising

Import step

qiime tools import

qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path $filepath \
    --input-format CasavaOneEightSingleLanePerSampleDirFmt \
    --output-path ${qzaoutput}imported_seqs.qza

The normal qiime tools import is invoked when the -m option for waterway isn’t specified. The import assumes that the files are named according to the Cassava 1.8 naming conventions.

Qiime 2 Documentation

Alternatively, if the -m option is specified, the input path instead points to the manifest file path supplied in the sourcefile, and the function run instead changes to the one below:

qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path $manifest \
    --input-format PairedEndFastqManifestPhred33V2 \
    --output-path ${qzaoutput}imported_seqs.qza

The manifest file should be constructed according to the Qiime 2 documentation. Alternatively, you can use the make_manifest tool I made for easier construction, available here. Take note that this function assumes that the files use Phred33 quality score encoding.

Qiime 2 Documentation

qiime demux summarize

qiime demux summarize \
    --i-data ${qzaoutput}imported_seqs.qza \
    --o-visualization ${qzaoutput}imported_seqs.qzv

This function uses the imported_seqs.qza from the original import function to output a qzv file that shows base quality for both forward and backward reads. As a general rule, Phred quality scores above a 20-22 are considered usable, while anything below should not be taken in the next DADA2 denoising step. If the import step has been previously done and has outputted the imported_seqs.qza, waterway will start from this visualization step.

Qiime 2 Documentation

DADA2

For each combination of forward and backwards truncF and truncR in the sourcefile, waterway will run the following DADA2-based analyses.

qiime dada2 denoise-paired

qiime dada2 denoise-paired \
    --i-demultiplexed-seqs $demuxpairedendpath \
    --p-trim-left-f $trimF \
    --p-trim-left-r $trimR \
    --p-trunc-len-f $truncF \
    --p-trunc-len-r $truncR \
    --o-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --o-representative-sequences "${qzaoutput}${truncF}-${truncR}/rep-seqs.qza" \
    --o-denoising-stats "${qzaoutput}${truncF}-${truncR}/denoising-stats.qza"

The DADA2 denoising and pairing algorithm does both filtering and QC checking on the imported reads. The important steps are highlighted below:

Filtering
Denoising
Merging
Chimera Detection

Filtering and denoising both involve QC checking the reads and tossing any low-quality reads (low Phred quality scores or reads with Ns in the sequence). Merging involved matching the forward and reverse reads. It is important to note that the DADA2 algorithm requires at least a 20 base pair overlap between the forward and reverse reads, else it will not pair the reads and will toss them out. Lastly, chimera detection involves the detection and filtering of chimeric reads.

If no reads are outputted by the function, a “NoOutput.txt” file will be deposited in the directory instead, and the directory will be ignored by all following functions. The most common cause of this output is insufficient truncF and truncR length (not enough overlap between reads), followed by low-quality reads (all reads are low Phred quality, above max-accepted-EE, or contain 1 or more Ns).

The function outputs three files: table.qza, rep-seqs.qza, and denoising-stats.qza

Qiime 2 Documentation

Summarization functions for DADA2

qiime feature-table summarize \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/table.qzv"
    
qiime feature-table tabulate-seqs \
    --i-data "${qzaoutput}${truncF}-${truncR}/rep-seqs.qza" \
  --o-visualization "${qzaoutput}${truncF}-${truncR}/rep-seqs.qzv"

qiime metadata tabulate \
    --m-input-file "${qzaoutput}${truncF}-${truncR}/denoising-stats.qza" \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/denoising-stats.qzv"

These three functions are grouped together due to their outputs of turning the three qza files (created previously in the DADA2 step) into viewable qzv files. The most relevant to immediately review is denoising-stats.qzv, which will show the number of reads that passed through each step of the DADA2 filtering process.

These functions produce table.qzv, rep-seqs.qzv, and denoising-stats.qzv

feature-table summarize documentation

feature-table tabulate-seqs documentation

metadata tabulate documentation

Tree Generation

For each combination of forward and backwards truncF and truncR in the sourcefile that have a DADA2 output (table.qza), waterway will run the following function.

qiime phylogeny align-to-tree-mafft-fasttree

qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences "${qzaoutput}${truncF}-${truncR}/rep-seqs.qza" \
    --o-alignment "${qzaoutput}${truncF}-${truncR}/aligned-rep-seqs.qza" \
    --o-masked-alignment "${qzaoutput}${truncF}-${truncR}/masked-aligned-rep-seqs.qza" \
    --o-tree "${qzaoutput}${truncF}-${truncR}/unrooted-tree.qza" \
    --o-rooted-tree "${qzaoutput}${truncF}-${truncR}/rooted-tree.qza"

The phylogeny function creates both the unrooted-tree.qza and rooted-tree.qza that is used later in the analyses quite extensively.

Qiime 2 Documentation

Core-metrics generation

For each combination of forward and backwards truncF and truncR in the sourcefile that have a DADA2 output (table.qza), waterway will run the following functions.

core-metrics-phylogenetic

qiime diversity core-metrics-phylogenetic \
            --i-phylogeny "${qzaoutput}${truncF}-${truncR}/rooted-tree.qza" \
            --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
            --p-sampling-depth $sampling_depth \
            --m-metadata-file $metadata_filepath \
            --output-dir "${qzaoutput}${truncF}-${truncR}/core-metrics-results"

Using the rooted-tree.qza created during tree generation and table.qza created from DADA2 filtering, this function outputs a directory containing many different measures of sample diversity and distance (see the documentation for the full output).

Qiime 2 Documentation

qiime diversity alpha-group-significance

qiime diversity alpha-group-significance \
    --i-alpha-diversity "${qzaoutput}${truncF}-${truncR}/core_metrics_results/faith_pd_vector.qza" \
    --m-metadata-file $metadata_filepath \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/core_metrics_results/faith_pd_group_significance.qzv"

Uses the faith_pd_vector.qza created from core-metrics-phylogenetic to allow comparisons of alpha diversity values.

Qiime 2 Documentation

qiime diversity alpha-rarefaction

qiime diversity alpha-rarefaction \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --i-phylogeny "${qzaoutput}${truncF}-${truncR}/rooted-tree.qza" \
    --p-max-depth $sampling_depth \
    --m-metadata-file $metadata_filepath \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/alpha-rarefaction.qzv"

Generates alpha rarefaction curves that can be interacted with. Data is additionally grouped according to metadata, and can be explored according to group.

Qiime 2 Documentation

qiime diversity beta-group-significance

qiime diversity beta-group-significance \
    --i-distance-matrix "${qzaoutput2}core_metrics_results/unweighted_unifrac_distance_matrix.qza" \
    --m-metadata-file $metadata_filepath \
    --m-metadata-column $beta_diversity_group \
    --o-visualization "${qzaoutput2}core_metrics_results/unweighted-unifrac-beta-significance.qzv" \
    --p-pairwise
            
qiime diversity beta-group-significance \
    --i-distance-matrix "${qzaoutput2}core_metrics_results/weighted_unifrac_distance_matrix.qza" \
    --m-metadata-file $metadata_filepath \
    --m-metadata-column $beta_diversity_group \
    --o-visualization "${qzaoutput2}core_metrics_results/weighted-unifrac-beta-significance.qzv" \
    --p-pairwise

Determines whether groups of samples are significantly different from one another, spanning the previously produced unweighted and weighted unifrac distance matrices.

Qiime 2 Documentation

Taxonomic labelling

For each combination of forward and backwards truncF and truncR in the sourcefile that have a DADA2 output (table.qza), waterway will run the following functions.

IMPORTANT NOTE: sk-learn requires a classifier trained on the version of of qiime being ran. Classifiers for two versions of qiime2 are available on the useful links page, both trained on 99% coverage Greengenes database files. Otherwise, you can use the -c option with waterway to generate a classifier file for your version of Qiime 2, described here.

qiime feature-classifier classify-sklearn

qiime feature-classifier classify-sklearn \
    --i-classifier $classifierpath \
    --i-reads "${qzaoutput}${truncF}-${truncR}/rep-seqs.qza" \
    --o-classification "${qzaoutput}${truncF}-${truncR}/taxonomy.qza"

Using the classifier given, this function creates taxonomy.qza, containing reads grouped into taxons.

Qiime 2 Documentation

qiime metadata tabulate

qiime metadata tabulate \
    --m-input-file "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/taxonomy.qzv"

This turns the taxonomy.qza created in sk-learn into a Qiime 2 View readable qzv file.

Qiime 2 Documentation

qiime taxa barplot

qiime taxa barplot \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --i-taxonomy "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
    --m-metadata-file $metadata_filepath \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/taxa-bar-plots.qzv"

This outputs taxa-bar-plots.qzv, a file containing viewable bar graphs showing percentages of prominent taxa per sample. The samples can be grouped by metadata as well.

Qiime 2 Documentation

Optional Analyses

Whether optional analyses will run is dictated by variables set in optional_analyses.txt. If any are set to true, they will run for each combination of truncF and truncR. Optional analyses must be run only after taxonomic analysis has been completed for the dataset.

All optional analyses will output their files into their (automatically created) own folder, located in the truncF-truncR folder containing all other analyses.

Both Ancom and PCoA analyses follow the procedure outlined here, outlined by Rachael Lappan.

Ancom Analyses

qiime feature-table filter-features \
    --i-table "${qzaoutput2}table.qza" \
    --p-min-samples 2 \
    --o-filtered-table "${qzaoutput2}ancom_outputs/temp.qza"
            
qiime feature-table filter-features \
    --i-table "${qzaoutput2}ancom_outputs/temp.qza" \
    --p-min-frequency 10 \
    --o-filtered-table "${qzaoutput2}ancom_outputs/filtered_table.qza"

qiime taxa collapse \
    --i-table "${qzaoutput}${truncF}-${truncR}/filtered_table.qza" \
    --i-taxonomy "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
    --p-level $collapse_taxa_to_level \
    --o-collapsed-table "${qzaoutput}${truncF}-${truncR}/ancom_outputs/genus.qza"
            
qiime composition add-pseudocount \
    --i-table "${qzaoutput}${truncF}-${truncR}/ancom_outputs/genus.qza" \
    --o-composition-table "${qzaoutput}${truncF}-${truncR}/ancom_outputs/added_pseudo.qza"
        
qiime composition ancom \
    --i-table "${qzaoutput}${truncF}-${truncR}/ancom_outputs/added_pseudo.qza" \
    --m-metadata-file $metadata_filepath \
    --m-metadata-column $group_to_compare \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/ancom_outputs/ancom_group.qzv"

The first two functions filter the original feature table to exclude OTUs that appear in only one sample, or have less than 10 reads across all samples (as detailed here).

The taxa collapse function collapses all taxa information down to the level specified in the collapse_taxa_to_level, then outputs a featuretable called genus.qza. Pseudocounts are added to take out any 0s in the data, as 0 can’t be log transformed. Finally, the actual ANCOM analysis is run.

PCoA Biplot

qiime feature-table rarefy \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --p-sampling-depth $sampling_depth \
    --o-rarefied-table "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table.qza"

qiime diversity beta 
    --i-table "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table.qza"
    --p-metric braycurtis 
    --o-distance-matrix "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_div.qza"
    
qiime diversity pcoa \
    --i-distance-matrix "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_div.qza" \
    --p-number-of-dimensions $number_of_dimensions \
    --o-pcoa "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_pcoa.qza"

qiime feature-table relative-frequency \
    --i-table "${qzaoutput}${truncF}-${truncR}/alpha-rarefaction.qzv" \
    --o-relative-frequency-table "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table_relative.qza"

qiime diversity pcoa-biplot \
    --i-pcoa "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_pcoa.qza" \
    --i-features "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table_relative.qza" \
    --o-biplot "${qzaoutput}${truncF}-${truncR}/biplot_outputs/biplot_matrix_unweighted_unifrac.qza"

qiime emperor biplot \
    --i-biplot "${qzaoutput}${truncF}-${truncR}/biplot_outputs/biplot_matrix_unweighted_unifrac.qza" \
    --m-sample-metadata-file $metadata_filepath \
    --m-feature-metadata-file "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/biplot_outputs/unweighted_unifrac_emperor_biplot.qzv"

The PCoA biplot produces an emperor plot showing the taxa that are most strongly contributing to the variance on a PCoA plot. The first three functions create a feature table with the relative frequency of each OTU, while the fourth actually produces the biplot in .qza form. The last function is used to create a visualization.

Picrust2 Analysis

The q2-picrust2 plugin has to be manually installed into the Qiime2 environment.

qiime picrust2 full-pipeline \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --i-seq "${qzaoutput}${truncF}-${truncR}/rep-seqs.qza" \
    --output-dir "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output" \
    --p-hsp-method $hsp_method \
    --p-max-nsti $max_nsti \
    --verbose

qiime feature-table summarize \
    --i-table "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/pathway_abundance.qza" \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/pathway_abundance.qzv"

qiime feature-table summarize \
    --i-table "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/ko_metagenome.qza" \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/ko_metagenome.qzv"

qiime feature-table summarize \
  --i-table "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/ec_metagenome.qza" \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/}q2-picrust2_output/ec_metagenome.qzv"

qiime diversity core-metrics \
    --i-table "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/pathway_abundance.qza" \
    --p-sampling-depth $sampling_depth \
  --m-metadata-file $metadata_filepath \
  --output-dir "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/pathabun_core_metrics"

The Picrust2 pipeline is a popular method to analyze bacterial pathway abudances in 16S microbiome data. For more information on these outputs, look at the tutorial here.

DEICODE Analysis

The DEICODE plugin has to be manually installed into the Qiime2 environment.

qiime deicode rpca \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --p-min-feature-count $min_feature_count \
    --p-min-sample-count $min_sample_count \
    --o-biplot "${qzaoutput}${truncF}-${truncR}/deicode_outputs/ordination.qza" \
    --o-distance-matrix "${qzaoutput}${truncF}-${truncR}/deicode_outputs/distance.qza"

qiime emperor biplot \
    --i-biplot "${qzaoutput}${truncF}-${truncR}/deicode_outputs/ordination.qza" \
    --m-sample-metadata-file $metadata_filepath \
    --m-feature-metadata-file "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/deicode_outputs/biplot.qzv" \
    --p-number-of-features $num_of_features

qiime diversity beta-group-significance \
    --i-distance-matrix "${qzaoutput}${truncF}-${truncR}/deicode_outputs/distance.qza" \
    --m-metadata-file $metadata_filepath \
    --m-metadata-column $group \
    --p-method permanova \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/deicode_outputs/PERMANOVAs/${group}-permanova.qzv"

DEICODE is a PCA analysis of sample clustering with regards to different taxa in the samples, and is very robust in regards to high levels of sparcity. The final PERMANOVA is a statistical measure of whether the metadata column (group) chosen explains the sample clustering in the biplot produced.

Gneiss Analysis

NOTE: As of Qiime2 version 2020.2, the Gneiss functions ols_regression, balance_taxonomy, and lme_regression have been removed in favor of songbird/aldex2, which has not been incorporated into Qiime2. I’ve left this option in waterway in case these functions are reincorporated in future versions of gneiss in Qiime2, or if users want to run gneiss analysis using an older version of Qiime2.

qiime gneiss gradient-clustering \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --m-gradient-file $metadata_filepath \
    --m-gradient-column $gradient_column \
    --o-clustering "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza"

qiime gneiss ilr-hierarchical \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
    --o-balances "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/balances.qza"

qiime gneiss ols-regression \
    --p-formula $gradient_column \
    --i-table "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/balances.qza" \
    --i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
    --m-metadata-file $metadata_filepath \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/regression_summary_pCG.qzv"

qiime gneiss dendrogram-heatmap \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
    --m-metadata-file $metadata_filepath \
    --m-metadata-column $gradient_column_categorical \
    --p-color-map 'seismic' \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/heatmap_pCG.qzv"

qiime gneiss balance-taxonomy \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
    --i-taxonomy "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
    --p-taxa-level $taxa_level \
    --p-balance-name $balance_name \
    --m-metadata-file $metadata_filepath \
    --m-metadata-column $gradient_column_categorical \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/${balance_name}_taxa_summary_${gradient_column_categorical}_level_${taxa_level}.qzv"

Functions used in waterway

Overview

Import step

qiime tools import

qiime demux summarize

DADA2

qiime dada2 denoise-paired

Summarization functions for DADA2

Tree Generation

qiime phylogeny align-to-tree-mafft-fasttree

Core-metrics generation

core-metrics-phylogenetic

qiime diversity alpha-group-significance

qiime diversity alpha-rarefaction

qiime diversity beta-group-significance

Taxonomic labelling

qiime feature-classifier classify-sklearn

qiime metadata tabulate

qiime taxa barplot

Optional Analyses

Ancom Analyses

PCoA Biplot

Picrust2 Analysis

DEICODE Analysis

Gneiss Analysis