Version history

3.3.3 – (2021-07-14) Yep, you guessed it. More bugfixes.

3.3.2 – (2021-07-14) More bugfixes to the LEfSe table generation for subsets.

3.3.1 – (2021-07-13) Multiple bugfixes for the LEfSe table generation, and increased verbosity of the LEfSe table generation codeblock.

3.3.0 – (2021-07-13) LEfSe table generation has been added, allowing automatic generation of tabulated data for use with LEfSe across multiple categorical variables. Tables for taxa levels 1 (kingdom) through 7 (species) are generated for each variable supplied. To use the LEfSe table generator, an R installation with the tidyverse package is required.

3.2.1 – (2021-04-29) Picrust2 now exports ec_metagenome and ko_metagenome along with pathway_abundance. Various bugfixes, especially to the optional_analyses.txt configs.

3.1.1 – (2020-07-30) Bugfixes and increased verbosity for the LEfSe CSV generation code.

3.1.0 – (2020-07-29) Added automatic creation of phyloseq R objects for all datasets, including subsets if -s is specified. waterway can now also create .tsv files for use with LEfSe, which can be enabled in optional_analyses.txt. The level-x.csv file must first be downloaded from taxa-bar-plots.qza. Multiple bugfixes to the new R scripts, including filepath updates and better handling of corner cases.

3.0.0 – (2020-07-21) A major update! The main showpiece is the newly added ability to filter down a main dataset into sub-datasets called subsets. Subsets can be created by adding filtered metadata files to the folder created with the option -s. waterway will automatically filter the total dataset down to the samples listed in each filtered metadata file, and will then put all subset outputs into their own folders in the directory called subsets in the output folder. All analyses described in both config.txt and optional_analyses.txt will be run on the main dataset and all subsets when the -s option is supplied. When the -s option is not used, waterway will only run analyses on the main dataset.

Along with subsetting, various improvements have been made to the verbose messages for use in bug detection and fixes. Yellow messages indicating that analyses in optional_analyses.txt have not been run are now only shown when the --verbose option is supplied, reducing screen clutter.

2.11.0 – (2020-07-20) Speed increases to the ANCOM block, especially when it is run multiple times. The make_collapsed_table option has been removed from the ANCOM block, since it was essentially never set to false and only served to confuse users. Messages echoed when an optional analysis is not run now only appear when the --verbose option is used with waterway. Further bugfixes to the ANCOM block.

2.10.0 – (2020-07-09) Quite a lot of bugfixes, including a fix for a newly broken ANCOM codeblock. Added the file add_to_bashrc.bash, which, if prompted, automatically sets the alias waterway as a shorthand for running waterway.bash. Also carried out a large code review, simplifying some code chunks (mostly by getting rid of excess loops and echo statements).

2.9.0 – (2020-07-03) Mostly a long-needed code cleanup and reorganization for me. The code has been broken up into smaller chunks instead of one huge script (which had become impossible for me to maintain). These chunks are located in the src folder and are called by the master file waterway.bash. Editing smaller chunks at a time is easier for me to maintain in the long run and can be more efficient, as the same chunk can be called multiple times from different decision trees. All variables are passed from waterway.bash to the code chunks and vice versa. Also moved some of the other help files into their own folder.
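Sharing variables between a master script and its chunks in both directions is what bash's `source` builtin provides, since a sourced file runs in the calling shell. The sketch below assumes that is the mechanism (the changelog does not say so explicitly) and uses a hypothetical chunk, `src/example_chunk.bash`, written to a temporary directory so the example is self-contained:

```bash
#!/usr/bin/env bash
# Sketch: a master script calling a code chunk with `source`.
# The chunk file and variable names are hypothetical.

scriptdir="$(mktemp -d)"   # stand-in for the folder holding waterway.bash
mkdir -p "$scriptdir/src"

# Write a hypothetical chunk that reads one variable and sets another.
cat > "$scriptdir/src/example_chunk.bash" <<'EOF'
# Sourced, so this runs in the caller's shell and shares its variables.
chunk_output="processed ${sample_count} samples"
EOF

sample_count=12                             # set in the master script
source "$scriptdir/src/example_chunk.bash"  # chunk reads sample_count
echo "$chunk_output"                        # chunk's variable is visible here

rm -rf "$scriptdir"
```

Running a chunk with `bash chunk.bash` instead would start a child process, and any variables it set would be lost when it exited.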

2.8.0 – (2020-06-11) Added Songbird as a possible analysis. Added metadata filtering to continuous beta diversity analyses and Songbird analyses.

2.7.0 – (2020-06-02) Removed the rerun-alpha-and-phyloseq block in optional_analyses.txt, and replaced it with an extended_alpha block that allows for all alpha diversity measures to be calculated and visualized. Vectors and visualizations are stored under alpha_diversities in the output folder.

2.6.1 – (2020-05-29) Removed heatmap generation when sample-classifier is applied to continuous data (because you can’t do that, duh!). Many bugfixes to the metadata filtering script, as well as the filtering block as a whole. The filtering script now supports more modular filepath generation, allowing for more flexibility in use.

2.6.0 – (2020-05-27) Added heatmap generation to sample-classifier regress-samples-ncv. Both normal and NCV sample-classifiers using continuous data now first filter the metadata files (similar to beta diversity metadata filtering) as well as table.qza files. Multiple bugfixes to the sample-classifier and beta-diversity blocks.

2.5.0 – (2020-05-27) waterway now first filters the metadata file by any missing values when computing distance matrices for beta diversity between numeric (continuous) groups. The placeholder for missing values should be denoted by missing_samples= in config.txt. In practice, this means that if missing data are labelled as “Absent” in the metadata file, the parameter in config.txt should be missing_samples='Absent'. The filtering is done via an R script, which adds two R package dependencies (tidyverse and data.table), although data.table should be installed by default. Future coding around conda commands has also been made slightly easier, as waterway can now access conda commands without throwing errors (by default, conda does not export functions other than the main conda function).
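The filtering step amounts to dropping every metadata row whose value in the chosen column equals the missing_samples placeholder. waterway does this in R; the sketch below swaps in awk purely to illustrate the logic, and the helper name filter_missing and the example columns are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: drop metadata rows whose value in one column equals the placeholder
# set by missing_samples in config.txt. waterway does this step in R; this
# awk version only illustrates the logic. filter_missing is hypothetical.

# Usage: filter_missing METADATA_TSV COLUMN_NAME PLACEHOLDER
filter_missing() {
  awk -F '\t' -v col="$2" -v na="$3" '
    NR == 1 {
      # Find the requested column in the header row and keep the header.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print
      next
    }
    $idx != na   # print only rows whose value is not the placeholder
  ' "$1"
}

missing_samples='Absent'
metadata="$(mktemp)"
printf 'sample-id\tph\tsite\nS1\t6.8\tA\nS2\tAbsent\tB\nS3\t7.1\tA\n' > "$metadata"
filter_missing "$metadata" ph "$missing_samples"   # keeps the header, S1 and S3
```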

2.4.0 – (2020-05-22) Added beta-rarefaction outputs to the main code block (diversity block) for the following metrics: 'euclidean' 'correlation' 'weighted_normalized_unifrac' 'seuclidean' 'braycurtis' 'unweighted_unifrac' 'sqeuclidean' 'generalized_unifrac' 'aitchison' 'matching' 'weighted_unifrac' 'jaccard'. The --p-iterations option is set to 20 to allow for greater certainty in the generated jackknifed plots, and --p-clustering-method is set to upgma. Also added diversity bioenv analysis on both unweighted_unifrac and weighted_unifrac distances in optional_analyses.txt. Some more bugfixes to the sample-classifier block.

2.3.1 – (2020-05-21) Additional summarization added to the sample-classifier block, along with bugfixes.

2.3.0 – (2020-05-20) Added multiple sample-classifier commands, primarily classify-samples, regress-samples, and the nested cross-validation (NCV) variants of both, as described in this tutorial. Also added visualizations of sample-classifier outputs, including confusion matrix, heatmap, and scatterplot visualizations. Added many options to optional_analyses.txt to allow tuning of the variables in each of these commands.

2.2.0 – (2020-05-18) Added the ability to correlate beta diversity measures (unweighted and weighted UniFrac distances) to continuous variables in the data using a Mantel test. Added table.qza and rep-seqs.qza filtering based on filtered metadata files when using the option -F or --filter-tables. A couple of small bugfixes regarding filter-table looping and the logging functions.

2.1.4 – (2020-05-18) The ANCOM analyses can now generate multiple analyses for different taxa levels in one run. The rerun block now executes after the main codeblock, so rerunning alpha or beta analyses no longer throws errors when run before sklearn has finished.

2.1.3 – (2020-05-11) Rewrote the logging system and cleaned up the code. Functions are now in place to direct outputs to stdout, stderr, logfiles, or stdout only when verbose is true. Combinations of these are also implemented, making the code much more transparent and less cluttered. Some echo messages have also been fixed.
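A minimal sketch of such routing helpers, assuming a logfile variable and a verbose flag; the names to_log, log_out, log_err, and log_verbose are hypothetical and not waterway's actual functions:

```bash
#!/usr/bin/env bash
# Sketch of output-routing helpers; all names here are hypothetical.

logfile=""      # path to a logfile; empty disables logging
verbose=false   # set to true to show verbose-only messages

# Append a message to the logfile when one is configured.
to_log() {
  if [[ -n "$logfile" ]]; then printf '%s\n' "$*" >> "$logfile"; fi
}

log_out()     { printf '%s\n' "$*";     to_log "$*"; }  # stdout + logfile
log_err()     { printf '%s\n' "$*" >&2; to_log "$*"; }  # stderr + logfile
log_verbose() {                                         # stdout, verbose only
  if [[ "$verbose" == true ]]; then printf '%s\n' "$*"; fi
}

# Example: send normal and error messages to a temporary logfile.
logfile="$(mktemp)"
log_out "starting pipeline"
log_err "warning: no subsets folder found"
log_verbose "sampling_depth=1000"   # not shown, since verbose is false
```

Keeping each destination in its own small function makes combinations cheap: a new helper only has to call the pieces it needs.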

2.1.2 – (2020-05-11) Rewrote the manifest file generation in bash, eliminating any reliance on python within waterway itself. Manifest file generation is also more flexible: the patterns that distinguish filenames containing forward reads from those containing reverse reads can now be specified in config.txt. Multiple bugs relating to output text color have been fixed, and more logging text has been included (as always).
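For reference, a pure-bash manifest builder could look like the sketch below. It writes the tab-separated paired-end layout QIIME 2 accepts for `PairedEndFastqManifestPhred33V2` imports; fwd_pattern and rev_pattern are hypothetical stand-ins for the config.txt settings, and taking the sample ID as everything before the first underscore is an assumption based on Casava 1.8 naming:

```bash
#!/usr/bin/env bash
# Sketch: write a paired-end QIIME 2 manifest (PairedEndFastqManifestPhred33V2
# layout) using only bash. fwd_pattern and rev_pattern are hypothetical
# stand-ins for the config.txt settings.

fwd_pattern='_R1_'   # substring found only in forward-read filenames
rev_pattern='_R2_'   # substring found only in reverse-read filenames

# Usage: make_manifest FASTQ_DIR > manifest.tsv
make_manifest() {
  local dir fwd name rev sample
  dir="$(cd "$1" && pwd)" || return 1   # absolute path, no trailing slash
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for fwd in "$dir"/*"$fwd_pattern"*.fastq.gz; do
    [[ -e "$fwd" ]] || continue         # glob matched nothing
    name="${fwd##*/}"
    # Swap the forward pattern for the reverse one to find the mate file.
    rev="$dir/${name/"$fwd_pattern"/$rev_pattern}"
    if [[ ! -e "$rev" ]]; then
      echo "no reverse-read file for $name" >&2
      continue
    fi
    sample="${name%%_*}"                # Casava 1.8: ID precedes first "_"
    printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
  done
}
```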

2.1.1 – (2020-05-07) ANCOM analysis filenames now include the level of taxa the feature table was collapsed to, allowing multiple ANCOM runs to be made without needing to manually rename files.

2.1.0 – (2020-05-06) Added color-coded output for all waterway messages (Qiime2 messages have been left with default colors). Filenames and filepaths are magenta, commands are cyan, success messages are green, errors are red, and warnings are yellow. Logfiles do not show color-coded output.
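Terminal colors like these are normally produced with ANSI escape sequences, and a logfile copy can be kept plain by stripping them. A sketch under that assumption (color_msg and strip_colors are hypothetical names; a `case` statement is used instead of an associative array so it also runs on bash 3.2):

```bash
#!/usr/bin/env bash
# Sketch: color-coded messages for the terminal, plain text for logfiles.
# color_msg and strip_colors are hypothetical helper names.

# Usage: color_msg KIND MESSAGE
color_msg() {
  local code
  case "$1" in
    file)    code=$'\e[35m' ;;  # magenta: filenames and filepaths
    command) code=$'\e[36m' ;;  # cyan: commands
    success) code=$'\e[32m' ;;  # green: success messages
    error)   code=$'\e[31m' ;;  # red: errors
    warning) code=$'\e[33m' ;;  # yellow: warnings
    *)       code='' ;;         # anything else: no color
  esac
  printf '%s%s%s\n' "$code" "$2" $'\e[0m'   # reset color afterwards
}

# Remove ANSI color sequences, e.g. before appending to a logfile.
strip_colors() { sed $'s/\e\\[[0-9;]*m//g'; }

color_msg success "manifest.tsv created"
color_msg file "/path/to/manifest.tsv" | strip_colors   # prints plain text
```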

2.0.0 – (2020-04-29) Major update to waterway, with two new added analyses (Picrust2 and DEICODE), changes and streamlining to the config.txt file, improvements to both the -l (logging) and -v (verbose) options, and a ton of bugfixes. The Picrust2 and DEICODE plugins to Qiime2 must be installed manually before the corresponding commands can be run from optional_analyses.txt. The Picrust2 workflow follows the one recommended in the plugin tutorial, as does the DEICODE workflow.

The config file has also had a few changes. Because the values entered for alpha_depth and sampling_depth were identical or similar by design, alpha_depth has been removed, and sampling_depth is now used as the alpha rarefaction depth instead. The last qzaoutput variable has been moved out of config.txt and into the main waterway codebase, removing the possibility of accidentally modifying it and breaking waterway. Variables needed for both Picrust2 and DEICODE analyses have been added to optional_analyses.txt.

Exit clauses have been removed from all optional analyses, which means that multiple optional analyses can be run at once. waterway can now recognize the Qiime2 environment’s version number, which is displayed alongside the waterway version number when the -n (version) option is supplied. Some code framework has been added to make implementing other options easier in the future. waterway exit codes have been reworked to actually make sense (different problems now have specific exit codes, instead of exit codes being randomly assigned based on my mood that day), and are described in waterway_exit_codes.txt, which has been added to the github folder. Many more outputs can be seen on stdout when the -l (logging) option is supplied, allowing for easier tracking of waterway pipeline progression. More data and variable states are displayed when the -v (verbose) option is supplied, allowing for easier debugging when combined with the -l (logging) option.

waterway now also checks that Qiime2 commands are available when it is executed, and if they are not, exits while advising the user to activate the Qiime2 environment first. Bugfixes were made to multiple loops in the rerun blocks, as well as some broken logging statements. The stability of manifest file generation using the -M option was improved, along with that of the hyphen-replacement -r option. Updated the documentation of the functions used to reflect all changes, and added the Picrust2 and DEICODE analysis functions.

1.5.0 – (2020-04-24) Added the ability to replace underscores with hyphens using the -r option (useful for converting filenames to Casava 1.8 format if Sample IDs were entered with underscores), restricted to the patterns listed in patterns_to_replace.txt (which is generated the first time the -r option is run). Major bugfixes to the manifest file generation -M option mean that it now outputs a proper manifest.tsv file with non-broken filenames. Added flexibility to the filepath inputs for the variables filepath, projpath, and qzaoutput, so it no longer matters whether they end with a slash. Updated the test output. A couple of other smaller bugfixes regarding the -l (logging) option, and a bit of code was inserted in preparation for single-end analysis support.
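Both behaviors reduce to bash parameter expansion. In the sketch below, normalize_path and replace_patterns are hypothetical helpers: the first strips one trailing slash, and the second rewrites underscores to hyphens only inside the listed patterns, mirroring the role patterns_to_replace.txt plays here (that file's real format is not described, so patterns are passed as arguments):

```bash
#!/usr/bin/env bash
# Sketch: trailing-slash-insensitive paths and targeted underscore replacement.
# normalize_path and replace_patterns are hypothetical helpers.

# Drop one trailing slash so "dir" and "dir/" are treated the same.
normalize_path() { printf '%s\n' "${1%/}"; }

# Usage: replace_patterns NAME PATTERN...
# Within NAME, rewrite each PATTERN with its underscores turned into hyphens;
# underscores outside the listed patterns are left alone.
replace_patterns() {
  local name="$1" pattern
  shift
  for pattern in "$@"; do
    name="${name//"$pattern"/${pattern//_/-}}"
  done
  printf '%s\n' "$name"
}

filepath="$(normalize_path "/data/run1/")"
echo "$filepath/raw"                                        # /data/run1/raw
replace_patterns "Gut_01_S1_L001_R1_001.fastq.gz" "Gut_01"  # Gut-01_S1_L001_R1_001.fastq.gz
```

Restricting the replacement to listed patterns matters because the Casava 1.8 suffix (`_S1_L001_R1_001`) must keep its underscores.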

1.4.3 – (2020-04-22) Added the ability to run FastQC and MultiQC through the use of the -F option. FastQC and MultiQC are run on the raw files located in the filepath, and are

1.4.2 – (2020-03-12) Major bugfixes to the beta diversity rerun step. Beta diversity reruns now output all reruns into a new folder called beta_div_rerun, located in the outputs/truncF-truncR/ directory.

1.4.1 – (2020-03-10) Bugfixes to the manifest file creation step.

1.4.0 – (2020-03-06) Added the ability to create manifest files through the -M option. Users can now change the manifest format through the manifest_format variable in config.txt. Also includes bugfixes to the alpha diversity analysis.

1.3.1 – (2020-03-05) Minor bugfixes to the DADA2 visualization step.

1.3.0 – (2020-03-04) waterway now parses the first argument provided as the directory containing config.txt and optional_analyses.txt, allowing users to keep different config files in different folders without copy/pasting waterway.bash into each one. If no argument is provided, waterway uses the current working directory instead. This also makes it easier to use waterway as an alias in .bashrc. Beta and ANCOM analyses can now be rerun over multiple groups at once, instead of having to rerun them over and over. Bugfixes to both the beta and ANCOM analysis blocks, and updates to the optional_analyses.txt file.
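The argument-or-current-directory fallback maps onto bash's `${1:-$PWD}` default expansion. A sketch, assuming a hypothetical load_config function and that optional_analyses.txt is sourced the same way as config.txt (only the latter is confirmed by the 1.1.0 entry):

```bash
#!/usr/bin/env bash
# Sketch: use the first argument as the config directory, else $PWD.
# load_config is hypothetical; waterway's real startup code may differ.

# Usage: load_config [CONFIG_DIR]
load_config() {
  local configdir="${1:-$PWD}"
  configdir="${configdir%/}"                 # tolerate a trailing slash
  if [[ ! -f "$configdir/config.txt" ]]; then
    echo "config.txt not found in $configdir" >&2
    return 1
  fi
  source "$configdir/config.txt"
  # Assumption: optional_analyses.txt is plain bash assignments as well.
  if [[ -f "$configdir/optional_analyses.txt" ]]; then
    source "$configdir/optional_analyses.txt"
  fi
}

# Example with a throwaway config directory.
projdir="$(mktemp -d)"
echo "sampling_depth=1000" > "$projdir/config.txt"
load_config "$projdir/" && echo "sampling_depth is $sampling_depth"
```

Because the sourced assignments are not declared `local`, the config variables remain available to the rest of the script after load_config returns.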

1.2.4 – (2020-02-22) Vital bugfixes for PCoA biplot analyses, and some small echo fixes.

1.2.3 – (2020-02-22) Included both ANCOM and PCoA biplot analyses under the optional analyses section, and updated this website, which was quite out of date with all the new additions.

1.2.2 – (2020-02-21) Included the option to rerun alpha rarefaction, and included an optional gneiss gradient-correlation analysis step after the main codeblock is run. To reflect these changes and the added analyses’ non-mandatory nature, analysis_to_rerun.txt was renamed to optional_analyses.txt. Further improved the usability of the -c (classifier training) option: waterway now detects whether the qza files exist in the script directory at every step, allowing the classifier training to start midway, saving time. Many improvements in echo clarity, including improved -v verbose output. Vital bugfixes to the core-metrics-phylogenetic codeblock, so it actually works now.

1.2.1 – (2020-02-19) Various bugfixes relating to analysis_to_rerun.txt.

1.2.0 – (2020-02-19) Added beta diversity analysis to the diversity analysis block. You can now rerun only the beta diversity analysis by setting the corresponding variable to true in the newly created file analysis_to_rerun.txt. Also added the -n or --version option to print the waterway version.

1.1.1 – (2020-02-15) When the “-c” option is used with waterway, you no longer need to manually change the classifierpath variable in the config file. waterway now automatically resets the variable to point to the newly created classifier, whether it was created in the script directory or at greengenes_path.

1.1.0 – (2020-02-09) Added the “-c” option, allowing waterway to automatically train a classifier for use with sk_learn. waterway now automatically detects whether you already have the prerequisite files for training the classifier (greengenes database files), and downloads them if you don’t. The sourcefile has been renamed to config.txt for clarity. The masterfile requirement was removed, and waterway now sources the config file directly. Other assorted bugfixes, including some echo message fixes, a crash-inducing bug at the dada2 step, and many classifier-induced bugs.

1.0.1 – (2020-01-26) Moved the “alpha_depth” and “max_depth” breakpoint from before tree generation to after it, and fixed some echo messages that appeared where they weren’t supposed to.

1.0.0 – (2020-01-22) Original commit.