The included functions are broken up into 6 general sections:
In addition, waterway
will automatically terminate after certain steps if certain conditions are met:
truncF=()
or truncR=()
in the sourcefile
sampling_depth=0
in the sourcefile
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path $filepath \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path ${qzaoutput}imported_seqs.qza
The normal qiime tools import
is invoked when the -m option for waterway isn’t specified. The import assumes that the files are named according to the Cassava 1.8 naming conventions.
Alternatively, if the -m option is specified, the input path instead points to the manifest file path supplied in the sourcefile, and the function run instead changes to the one below:
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path $manifest \
--input-format PairedEndFastqManifestPhred33V2 \
--output-path ${qzaoutput}imported_seqs.qza
The manifest file should be constructed according to the Qiime 2 documentation. Alternatively, you can use the make_manifest
tool I made for easier construction, available here. Take note that this function assumes that the files use Phred33 quality score encoding.
qiime demux summarize \
--i-data ${qzaoutput}imported_seqs.qza \
--o-visualization ${qzaoutput}imported_seqs.qzv
This function uses the imported_seqs.qza from the original import function to output a qzv file that shows base quality for both forward and backward reads. As a general rule, Phred quality scores above a 20-22 are considered usable, while anything below should not be taken in the next DADA2 denoising step. If the import step has been previously done and has outputted the imported_seqs.qza, waterway
will start from this visualization step.
For each combination of forward and backwards truncF
and truncR
in the sourcefile, waterway
will run the following DADA2-based analyses.
qiime dada2 denoise-paired \
--i-demultiplexed-seqs $demuxpairedendpath \
--p-trim-left-f $trimF \
--p-trim-left-r $trimR \
--p-trunc-len-f $truncF \
--p-trunc-len-r $truncR \
--o-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--o-representative-sequences "${qzaoutput}${truncF}-${truncR}/rep-seqs.qza" \
--o-denoising-stats "${qzaoutput}${truncF}-${truncR}/denoising-stats.qza"
The DADA2 denoising and pairing algorithm does both filtering and QC checking on the imported reads. The important steps are highlighted below:
Filtering and denoising both involve QC checking the reads and tossing any low-quality reads (low Phred quality scores or reads with Ns in the sequence). Merging involved matching the forward and reverse reads. It is important to note that the DADA2 algorithm requires at least a 20 base pair overlap between the forward and reverse reads, else it will not pair the reads and will toss them out. Lastly, chimera detection involves the detection and filtering of chimeric reads.
If no reads are outputted by the function, a “NoOutput.txt” file will be deposited in the directory instead, and the directory will be ignored by all following functions. The most common cause of this output is insufficient truncF
and truncR
length (not enough overlap between reads), followed by low-quality reads (all reads are low Phred quality, above max-accepted-EE, or contain 1 or more Ns).
The function outputs three files: table.qza, rep-seqs.qza, and denoising-stats.qza
qiime feature-table summarize \
--i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--o-visualization "${qzaoutput}${truncF}-${truncR}/table.qzv"
qiime feature-table tabulate-seqs \
--i-data "${qzaoutput}${truncF}-${truncR}/rep-seqs.qza" \
--o-visualization "${qzaoutput}${truncF}-${truncR}/rep-seqs.qzv"
qiime metadata tabulate \
--m-input-file "${qzaoutput}${truncF}-${truncR}/denoising-stats.qza" \
--o-visualization "${qzaoutput}${truncF}-${truncR}/denoising-stats.qzv"
These three functions are grouped together due to their outputs of turning the three qza files (created previously in the DADA2 step) into viewable qzv files. The most relevant to immediately review is denoising-stats.qzv, which will show the number of reads that passed through each step of the DADA2 filtering process.
These functions produce table.qzv, rep-seqs.qzv, and denoising-stats.qzv
feature-table summarize documentation
feature-table tabulate-seqs documentation
metadata tabulate documentation
For each combination of forward and backwards truncF
and truncR
in the sourcefile that have a DADA2 output (table.qza), waterway
will run the following function.
qiime phylogeny align-to-tree-mafft-fasttree \
--i-sequences "${qzaoutput}${truncF}-${truncR}/rep-seqs.qza" \
--o-alignment "${qzaoutput}${truncF}-${truncR}/aligned-rep-seqs.qza" \
--o-masked-alignment "${qzaoutput}${truncF}-${truncR}/masked-aligned-rep-seqs.qza" \
--o-tree "${qzaoutput}${truncF}-${truncR}/unrooted-tree.qza" \
--o-rooted-tree "${qzaoutput}${truncF}-${truncR}/rooted-tree.qza"
The phylogeny function creates both the unrooted-tree.qza and rooted-tree.qza that is used later in the analyses quite extensively.
For each combination of forward and backwards truncF
and truncR
in the sourcefile that have a DADA2 output (table.qza), waterway
will run the following functions.
qiime diversity core-metrics-phylogenetic \
--i-phylogeny "${qzaoutput}${truncF}-${truncR}/rooted-tree.qza" \
--i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--p-sampling-depth $sampling_depth \
--m-metadata-file $metadata_filepath \
--output-dir "${qzaoutput}${truncF}-${truncR}/core-metrics-results"
Using the rooted-tree.qza created during tree generation and table.qza created from DADA2 filtering, this function outputs a directory containing many different measures of sample diversity and distance (see the documentation for the full output).
qiime diversity alpha-group-significance \
--i-alpha-diversity "${qzaoutput}${truncF}-${truncR}/core_metrics_results/faith_pd_vector.qza" \
--m-metadata-file $metadata_filepath \
--o-visualization "${qzaoutput}${truncF}-${truncR}/core_metrics_results/faith_pd_group_significance.qzv"
Uses the faith_pd_vector.qza created from core-metrics-phylogenetic to allow comparisons of alpha diversity values.
qiime diversity alpha-rarefaction \
--i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--i-phylogeny "${qzaoutput}${truncF}-${truncR}/rooted-tree.qza" \
--p-max-depth $sampling_depth \
--m-metadata-file $metadata_filepath \
--o-visualization "${qzaoutput}${truncF}-${truncR}/alpha-rarefaction.qzv"
Generates alpha rarefaction curves that can be interacted with. Data is additionally grouped according to metadata, and can be explored according to group.
qiime diversity beta-group-significance \
--i-distance-matrix "${qzaoutput2}core_metrics_results/unweighted_unifrac_distance_matrix.qza" \
--m-metadata-file $metadata_filepath \
--m-metadata-column $beta_diversity_group \
--o-visualization "${qzaoutput2}core_metrics_results/unweighted-unifrac-beta-significance.qzv" \
--p-pairwise
qiime diversity beta-group-significance \
--i-distance-matrix "${qzaoutput2}core_metrics_results/weighted_unifrac_distance_matrix.qza" \
--m-metadata-file $metadata_filepath \
--m-metadata-column $beta_diversity_group \
--o-visualization "${qzaoutput2}core_metrics_results/weighted-unifrac-beta-significance.qzv" \
--p-pairwise
Determines whether groups of samples are significantly different from one another, spanning the previously produced unweighted and weighted unifrac distance matrices.
For each combination of forward and backwards truncF
and truncR
in the sourcefile that have a DADA2 output (table.qza), waterway
will run the following functions.
IMPORTANT NOTE: sk-learn requires a classifier trained on the version of of qiime being ran. Classifiers for two versions of qiime2 are available on the useful links page, both trained on 99% coverage Greengenes database files. Otherwise, you can use the -c
option with waterway to generate a classifier file for your version of Qiime 2, described here.
qiime feature-classifier classify-sklearn \
--i-classifier $classifierpath \
--i-reads "${qzaoutput}${truncF}-${truncR}/rep-seqs.qza" \
--o-classification "${qzaoutput}${truncF}-${truncR}/taxonomy.qza"
Using the classifier given, this function creates taxonomy.qza, containing reads grouped into taxons.
qiime metadata tabulate \
--m-input-file "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
--o-visualization "${qzaoutput}${truncF}-${truncR}/taxonomy.qzv"
This turns the taxonomy.qza created in sk-learn
into a Qiime 2 View readable qzv file.
qiime taxa barplot \
--i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--i-taxonomy "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
--m-metadata-file $metadata_filepath \
--o-visualization "${qzaoutput}${truncF}-${truncR}/taxa-bar-plots.qzv"
This outputs taxa-bar-plots.qzv, a file containing viewable bar graphs showing percentages of prominent taxa per sample. The samples can be grouped by metadata as well.
Whether optional analyses will run is dictated by variables set in optional_analyses.txt. If any are set to true, they will run for each combination of truncF
and truncR
. Optional analyses must be run only after taxonomic analysis has been completed for the dataset.
All optional analyses will output their files into their (automatically created) own folder, located in the truncF-truncR folder containing all other analyses.
Both Ancom and PCoA analyses follow the procedure outlined here, outlined by Rachael Lappan.
qiime feature-table filter-features \
--i-table "${qzaoutput2}table.qza" \
--p-min-samples 2 \
--o-filtered-table "${qzaoutput2}ancom_outputs/temp.qza"
qiime feature-table filter-features \
--i-table "${qzaoutput2}ancom_outputs/temp.qza" \
--p-min-frequency 10 \
--o-filtered-table "${qzaoutput2}ancom_outputs/filtered_table.qza"
qiime taxa collapse \
--i-table "${qzaoutput}${truncF}-${truncR}/filtered_table.qza" \
--i-taxonomy "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
--p-level $collapse_taxa_to_level \
--o-collapsed-table "${qzaoutput}${truncF}-${truncR}/ancom_outputs/genus.qza"
qiime composition add-pseudocount \
--i-table "${qzaoutput}${truncF}-${truncR}/ancom_outputs/genus.qza" \
--o-composition-table "${qzaoutput}${truncF}-${truncR}/ancom_outputs/added_pseudo.qza"
qiime composition ancom \
--i-table "${qzaoutput}${truncF}-${truncR}/ancom_outputs/added_pseudo.qza" \
--m-metadata-file $metadata_filepath \
--m-metadata-column $group_to_compare \
--o-visualization "${qzaoutput}${truncF}-${truncR}/ancom_outputs/ancom_group.qzv"
The first two functions filter the original feature table to exclude OTUs that appear in only one sample, or have less than 10 reads across all samples (as detailed here).
The taxa collapse function collapses all taxa information down to the level specified in the collapse_taxa_to_level
, then outputs a featuretable called genus.qza. Pseudocounts are added to take out any 0s in the data, as 0 can’t be log transformed. Finally, the actual ANCOM analysis is run.
qiime feature-table rarefy \
--i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--p-sampling-depth $sampling_depth \
--o-rarefied-table "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table.qza"
qiime diversity beta
--i-table "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table.qza"
--p-metric braycurtis
--o-distance-matrix "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_div.qza"
qiime diversity pcoa \
--i-distance-matrix "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_div.qza" \
--p-number-of-dimensions $number_of_dimensions \
--o-pcoa "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_pcoa.qza"
qiime feature-table relative-frequency \
--i-table "${qzaoutput}${truncF}-${truncR}/alpha-rarefaction.qzv" \
--o-relative-frequency-table "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table_relative.qza"
qiime diversity pcoa-biplot \
--i-pcoa "${qzaoutput}${truncF}-${truncR}/biplot_outputs/braycurtis_pcoa.qza" \
--i-features "${qzaoutput}${truncF}-${truncR}/biplot_outputs/rarefied_table_relative.qza" \
--o-biplot "${qzaoutput}${truncF}-${truncR}/biplot_outputs/biplot_matrix_unweighted_unifrac.qza"
qiime emperor biplot \
--i-biplot "${qzaoutput}${truncF}-${truncR}/biplot_outputs/biplot_matrix_unweighted_unifrac.qza" \
--m-sample-metadata-file $metadata_filepath \
--m-feature-metadata-file "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
--o-visualization "${qzaoutput}${truncF}-${truncR}/biplot_outputs/unweighted_unifrac_emperor_biplot.qzv"
The PCoA biplot produces an emperor plot showing the taxa that are most strongly contributing to the variance on a PCoA plot. The first three functions create a feature table with the relative frequency of each OTU, while the fourth actually produces the biplot in .qza form. The last function is used to create a visualization.
The q2-picrust2 plugin has to be manually installed into the Qiime2 environment.
qiime picrust2 full-pipeline \
--i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--i-seq "${qzaoutput}${truncF}-${truncR}/rep-seqs.qza" \
--output-dir "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output" \
--p-hsp-method $hsp_method \
--p-max-nsti $max_nsti \
--verbose
qiime feature-table summarize \
--i-table "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/pathway_abundance.qza" \
--o-visualization "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/pathway_abundance.qzv"
qiime feature-table summarize \
--i-table "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/ko_metagenome.qza" \
--o-visualization "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/ko_metagenome.qzv"
qiime feature-table summarize \
--i-table "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/ec_metagenome.qza" \
--o-visualization "${qzaoutput}${truncF}-${truncR}/}q2-picrust2_output/ec_metagenome.qzv"
qiime diversity core-metrics \
--i-table "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/pathway_abundance.qza" \
--p-sampling-depth $sampling_depth \
--m-metadata-file $metadata_filepath \
--output-dir "${qzaoutput}${truncF}-${truncR}/q2-picrust2_output/pathabun_core_metrics"
The Picrust2 pipeline is a popular method to analyze bacterial pathway abudances in 16S microbiome data. For more information on these outputs, look at the tutorial here.
The DEICODE plugin has to be manually installed into the Qiime2 environment.
qiime deicode rpca \
--i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--p-min-feature-count $min_feature_count \
--p-min-sample-count $min_sample_count \
--o-biplot "${qzaoutput}${truncF}-${truncR}/deicode_outputs/ordination.qza" \
--o-distance-matrix "${qzaoutput}${truncF}-${truncR}/deicode_outputs/distance.qza"
qiime emperor biplot \
--i-biplot "${qzaoutput}${truncF}-${truncR}/deicode_outputs/ordination.qza" \
--m-sample-metadata-file $metadata_filepath \
--m-feature-metadata-file "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
--o-visualization "${qzaoutput}${truncF}-${truncR}/deicode_outputs/biplot.qzv" \
--p-number-of-features $num_of_features
qiime diversity beta-group-significance \
--i-distance-matrix "${qzaoutput}${truncF}-${truncR}/deicode_outputs/distance.qza" \
--m-metadata-file $metadata_filepath \
--m-metadata-column $group \
--p-method permanova \
--o-visualization "${qzaoutput}${truncF}-${truncR}/deicode_outputs/PERMANOVAs/${group}-permanova.qzv"
DEICODE is a PCA analysis of sample clustering with regards to different taxa in the samples, and is very robust in regards to high levels of sparcity. The final PERMANOVA is a statistical measure of whether the metadata column (group) chosen explains the sample clustering in the biplot produced.
NOTE: As of Qiime2 version 2020.2, the Gneiss functions ols_regression
, balance_taxonomy
, and lme_regression
have been removed in favor of songbird/aldex2, which has not been incorporated into Qiime2. I’ve left this option in waterway
in case these functions are reincorporated in future versions of gneiss in Qiime2, or if users want to run gneiss analysis using an older version of Qiime2.
qiime gneiss gradient-clustering \
--i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--m-gradient-file $metadata_filepath \
--m-gradient-column $gradient_column \
--o-clustering "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza"
qiime gneiss ilr-hierarchical \
--i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
--o-balances "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/balances.qza"
qiime gneiss ols-regression \
--p-formula $gradient_column \
--i-table "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/balances.qza" \
--i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
--m-metadata-file $metadata_filepath \
--o-visualization "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/regression_summary_pCG.qzv"
qiime gneiss dendrogram-heatmap \
--i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
--m-metadata-file $metadata_filepath \
--m-metadata-column $gradient_column_categorical \
--p-color-map 'seismic' \
--o-visualization "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/heatmap_pCG.qzv"
qiime gneiss balance-taxonomy \
--i-table "${qzaoutput}${truncF}-${truncR}/table.qza" \
--i-tree "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/gradient-hierarchy.qza" \
--i-taxonomy "${qzaoutput}${truncF}-${truncR}/taxonomy.qza" \
--p-taxa-level $taxa_level \
--p-balance-name $balance_name \
--m-metadata-file $metadata_filepath \
--m-metadata-column $gradient_column_categorical \
--o-visualization "${qzaoutput}${truncF}-${truncR}/gneiss_outputs/${balance_name}_taxa_summary_${gradient_column_categorical}_level_${taxa_level}.qzv"