Overview

The included functions are broken up into 6 general sections:

  1. Importing
  2. DADA2-based denoising
  3. Unrooted/rooted tree generation
  4. Core-metrics generation
  5. Taxonomic labelling
  6. Optional analyses

In addition, waterway will automatically terminate after certain steps if certain conditions are met:



Import step

qiime tools import

The normal qiime tools import is invoked when the -m option for waterway isn’t specified. The import assumes that the files are named according to the Cassava 1.8 naming conventions.

Qiime 2 Documentation

Alternatively, if the -m option is specified, the input path instead points to the manifest file path supplied in the sourcefile, and the function run instead changes to the one below:

The manifest file should be constructed according to the Qiime 2 documentation. Alternatively, you can use the make_manifest tool I made for easier construction, available here. Take note that this function assumes that the files use Phred33 quality score encoding.

Qiime 2 Documentation


qiime demux summarize

This function uses the imported_seqs.qza from the original import function to output a qzv file that shows base quality for both forward and backward reads. As a general rule, Phred quality scores above a 20-22 are considered usable, while anything below should not be taken in the next DADA2 denoising step. If the import step has been previously done and has outputted the imported_seqs.qza, waterway will start from this visualization step.

Qiime 2 Documentation



DADA2

For each combination of forward and backwards truncF and truncR in the sourcefile, waterway will run the following DADA2-based analyses.


qiime dada2 denoise-paired

The DADA2 denoising and pairing algorithm does both filtering and QC checking on the imported reads. The important steps are highlighted below:

  1. Filtering
  2. Denoising
  3. Merging
  4. Chimera Detection

Filtering and denoising both involve QC checking the reads and tossing any low-quality reads (low Phred quality scores or reads with Ns in the sequence). Merging involved matching the forward and reverse reads. It is important to note that the DADA2 algorithm requires at least a 20 base pair overlap between the forward and reverse reads, else it will not pair the reads and will toss them out. Lastly, chimera detection involves the detection and filtering of chimeric reads.

If no reads are outputted by the function, a “NoOutput.txt” file will be deposited in the directory instead, and the directory will be ignored by all following functions. The most common cause of this output is insufficient truncF and truncR length (not enough overlap between reads), followed by low-quality reads (all reads are low Phred quality, above max-accepted-EE, or contain 1 or more Ns).

The function outputs three files: table.qza, rep-seqs.qza, and denoising-stats.qza

Qiime 2 Documentation


Summarization functions for DADA2

These three functions are grouped together due to their outputs of turning the three qza files (created previously in the DADA2 step) into viewable qzv files. The most relevant to immediately review is denoising-stats.qzv, which will show the number of reads that passed through each step of the DADA2 filtering process.

These functions produce table.qzv, rep-seqs.qzv, and denoising-stats.qzv

feature-table summarize documentation

feature-table tabulate-seqs documentation

metadata tabulate documentation



Tree Generation

For each combination of forward and backwards truncF and truncR in the sourcefile that have a DADA2 output (table.qza), waterway will run the following function.


Core-metrics generation

For each combination of forward and backwards truncF and truncR in the sourcefile that have a DADA2 output (table.qza), waterway will run the following functions.


core-metrics-phylogenetic

Using the rooted-tree.qza created during tree generation and table.qza created from DADA2 filtering, this function outputs a directory containing many different measures of sample diversity and distance (see the documentation for the full output).

Qiime 2 Documentation


Taxonomic labelling

For each combination of forward and backwards truncF and truncR in the sourcefile that have a DADA2 output (table.qza), waterway will run the following functions.

IMPORTANT NOTE: sk-learn requires a classifier trained on the version of of qiime being ran. Classifiers for two versions of qiime2 are available on the useful links page, both trained on 99% coverage Greengenes database files. Otherwise, you can use the -c option with waterway to generate a classifier file for your version of Qiime 2, described here.


qiime taxa barplot

This outputs taxa-bar-plots.qzv, a file containing viewable bar graphs showing percentages of prominent taxa per sample. The samples can be grouped by metadata as well.

Qiime 2 Documentation



Optional Analyses

Whether optional analyses will run is dictated by variables set in optional_analyses.txt. If any are set to true, they will run for each combination of truncF and truncR. Optional analyses must be run only after taxonomic analysis has been completed for the dataset.

All optional analyses will output their files into their (automatically created) own folder, located in the truncF-truncR folder containing all other analyses.

Both Ancom and PCoA analyses follow the procedure outlined here, outlined by Rachael Lappan.


Ancom Analyses

The first two functions filter the original feature table to exclude OTUs that appear in only one sample, or have less than 10 reads across all samples (as detailed here).

The taxa collapse function collapses all taxa information down to the level specified in the collapse_taxa_to_level, then outputs a featuretable called genus.qza. Pseudocounts are added to take out any 0s in the data, as 0 can’t be log transformed. Finally, the actual ANCOM analysis is run.


PCoA Biplot

qiime feature-table rarefy \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --p-sampling-depth $sampling_depth \
    --o-rarefied-table "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table.qza"

qiime diversity beta 
    --i-table "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table.qza"
    --p-metric braycurtis 
    --o-distance-matrix "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_div.qza"
    
qiime diversity pcoa \
    --i-distance-matrix "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_div.qza" \
    --p-number-of-dimensions $number_of_dimensions \
    --o-pcoa "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_pcoa.qza"

qiime feature-table relative-frequency \
    --i-table "${qzaoutput}${truncF}-${truncR}/alpha-rarefaction.qzv" \
    --o-relative-frequency-table "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table_relative.qza"

qiime diversity pcoa-biplot \
    --i-pcoa "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_pcoa.qza" \
    --i-features "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table_relative.qza" \
    --o-biplot "${qzaoutput}${truncF}-${truncR}/biplot_outputs/biplot_matrix_unweighted_unifrac.qza"

qiime emperor biplot \
    --i-biplot "${qzaoutput}${truncF}-${truncR}/biplot_outputs/biplot_matrix_unweighted_unifrac.qza" \
    --m-sample-metadata-file $metadata_filepath \
    --m-feature-metadata-file "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/biplot_outputs/unweighted_unifrac_emperor_biplot.qzv"

The PCoA biplot produces an emperor plot showing the taxa that are most strongly contributing to the variance on a PCoA plot. The first three functions create a feature table with the relative frequency of each OTU, while the fourth actually produces the biplot in .qza form. The last function is used to create a visualization.


Gneiss Analysis

NOTE: As of Qiime2 version 2020.2, the Gneiss functions ols_regression, balance_taxonomy, and lme_regression have been removed in favor of songbird/aldex2, which has not been incorporated into Qiime2. I’ve left this option in waterway in case these functions are reincorporated in future versions of gneiss in Qiime2, or if users want to run gneiss analysis using an older version of Qiime2.

qiime gneiss gradient-clustering \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --m-gradient-file $metadata_filepath \
    --m-gradient-column $gradient_column \
    --o-clustering "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza"

qiime gneiss ilr-hierarchical \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
    --o-balances "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/balances.qza"

qiime gneiss ols-regression \
    --p-formula $gradient_column \
    --i-table "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/balances.qza" \
    --i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
    --m-metadata-file $metadata_filepath \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/regression_summary_pCG.qzv"

qiime gneiss dendrogram-heatmap \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
    --m-metadata-file $metadata_filepath \
    --m-metadata-column $gradient_column_categorical \
    --p-color-map 'seismic' \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/heatmap_pCG.qzv"

qiime gneiss balance-taxonomy \
    --i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
    --i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
    --i-taxonomy "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
    --p-taxa-level $taxa_level \
    --p-balance-name $balance_name \
    --m-metadata-file $metadata_filepath \
    --m-metadata-column $gradient_column_categorical \
    --o-visualization "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/${balance_name}_taxa_summary_${gradient_column_categorical}_level_${taxa_level}.qzv"