ADToolbox Commandline Interface
This page covers the command-line interface (CLI) of ADToolbox. First, we need to initialize the CLI:
Initialization
After installing ADToolbox, type and execute the following in your terminal to initialize the base directory for ADToolbox files:
ADToolbox
No Base Directory Found:
Where do you want to store your ADToolBox Data?:
You can list all the commands, along with a brief explanation of each, by running:
ADToolbox --help
ADToolbox Modules
This toolbox consists of the following modules:
- Configs Module
- Database Module
- Metagenomics Module
- ADM Module
- Documentations Module
- Report Module
- Utility Module
1. Configs Module
After installation, you have to download all the files that ADToolbox requires to run properly. To do this, start with the Configs module:
ADToolbox Configs --help
- set-base-dir: The first configuration command sets the base directory that ADToolbox works in. This can be an existing folder or a directory that you want to create. If the directory does not already exist, this command creates it automatically. For example, to set the base directory to an ADToolbox directory on your desktop, the command on macOS would look something like this:
ADToolbox Configs set-base-dir -d ~/Desktop/ADToolbox
Note that you must include -d after set-base-dir.
Everything you do from now on will be saved in this directory.
- build-folder-structure: Next, build the folder structure that ADToolbox expects. You do that by:
ADToolbox Configs build-folder-structure
- download-all-databases: The next step is to download all the necessary database files for ADToolbox to work properly. You will achieve this by:
ADToolbox Configs download-all-databases
- download-escher-files: If you want to use the Escher map functionality of ADToolbox, you need to download the required files by:
ADToolbox Configs download-escher-files
An overall view of the Configs module of ADToolbox can be obtained by:
ADToolbox Configs --help
────────────────────────────────── ADToolBox───────────────────────────────────
usage: ADToolBox Configs [-h]
{set-base-dir,build-folder-structure,download-all-datab
ases,download-escher-files}
...
positional arguments:
{set-base-dir,build-folder-structure,download-all-databases,download-escher-fi
les}
Available Configs Commands:
set-base-dir Determine the address of the base directory for
ADToolBox to work with
build-folder-structure
Builds the folder structure for ADToolBox to work
properly
download-all-databases
Downloads all the databases for ADToolBox to work
properly, and puts them in the right directory in
Databases
download-escher-files
Downloads all files required for running the escher
map functionality of ADToolBox
options:
-h, --help show this help message and exit
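The Configs steps above can also be scripted in order. The following is a minimal sketch, assuming the executable is on your PATH under the name `ADToolbox` as in the commands above. The helper names `configs_setup_commands` and `run_setup` and the `run` flag are illustrative and not part of ADToolbox; only the subcommands and the `-d` flag shown in this section are used.

```python
import shlex
import subprocess
from pathlib import Path


def configs_setup_commands(base_dir):
    """Return the Configs commands from this section, in the order they are run."""
    base = str(Path(base_dir).expanduser())
    return [
        ["ADToolbox", "Configs", "set-base-dir", "-d", base],
        ["ADToolbox", "Configs", "build-folder-structure"],
        ["ADToolbox", "Configs", "download-all-databases"],
        # Only needed if you want the Escher map functionality.
        ["ADToolbox", "Configs", "download-escher-files"],
    ]


def run_setup(base_dir, run=False):
    """Print each command; execute it only when run=True."""
    for cmd in configs_setup_commands(base_dir):
        print(shlex.join(cmd))
        if run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


# Dry run: prints the four commands without executing them.
run_setup("~/Desktop/ADToolbox")
```

Calling `run_setup(..., run=True)` would execute the commands; the dry run is the default so that nothing is downloaded by accident.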
2. Database Module
Any database that is used by ADToolbox can be modified from this module. Type the following in your commandline to find all of the database module's commands:
ADToolbox Database --help
──────────────────────────── ADToolBox ────────────────────────────
usage: ADToolBox Database [-h]
{initialize-feed-db,extend-feed-db,show-f
eed-db,download-reaction-db,download-seed-reaction-db,build-protein
-db,download-feed-db,download-protein-db,download-amplicon-to-genom
e-dbs}
...
positional arguments:
{initialize-feed-db,extend-feed-db,show-feed-db,download-reaction
-db,download-seed-reaction-db,build-protein-db,download-feed-db,dow
nload-protein-db,download-amplicon-to-genome-dbs}
Database Modules:
initialize-feed-db Initialize the Feed DB
extend-feed-db Extend the Feed Database using a CSV file
show-feed-db Display the Current Feed Database
download-reaction-db
Downloads the reaction database in CSV
format
download-seed-reaction-db
Downloads the seed reaction database in
JSON format
build-protein-db Generates the protein database for
ADToolbox
download-feed-db Downloads the feed database in JSON
format
download-protein-db
Downloads the protein database in fasta
format; You can alternatively build it
from reaction database.
download-amplicon-to-genome-dbs
downloads amplicon to genome databases
options:
-h, --help show this help message and exit
We now go over these commands one by one:
- initialize-feed-db: This will create an empty JSON file in the Database sub-directory in your base directory that will hold all the future feed information that you add. You can run this command by:
ADToolbox Database initialize-feed-db
- extend-feed-db: With this command you can add entries to your current feed database from a CSV file. The CSV file must follow this column configuration:
Name | TS | TSS | Lipids | Proteins | Carbohydrates | PI | SI | Notes |
---|---|---|---|---|---|---|---|---|
You can add your CSV file by:
ADToolbox Database extend-feed-db -d [PATH TO YOUR CSV FILE]
- show-feed-db: You can pretty-print your current feed table by:
ADToolbox Database show-feed-db
This command will print a pretty table of your current feed database, which is the JSON file you initialized earlier and extended with a CSV file. This is made possible because of the great Rich library.
NOTE: Skip the following download commands if you ran ADToolbox Configs download-all-databases
- download-reaction-db: As the name implies, this downloads the ADToolbox reaction database, which is required by many important modules of the toolbox:
ADToolbox Database download-reaction-db
- download-feed-db: Downloads the default feed database in JSON format:
ADToolbox Database download-feed-db
- download-protein-db: Downloads the protein database in fasta format. Alternatively, if you have downloaded the reaction database, you can build the protein database from it (see build-protein-db below).
ADToolbox Database download-protein-db
- build-protein-db: Generates the protein database for ADToolbox from the reaction database:
ADToolbox Database build-protein-db
- download-amplicon-to-genome-dbs: If you need to map 16S data to the protein database and ADM, download the required databases with this command:
ADToolbox Database download-amplicon-to-genome-dbs
- download-seed-reaction-db: This will download the SEED reaction database in JSON format.
ADToolbox Database download-seed-reaction-db
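For extend-feed-db, the CSV file can be prepared with a short script. This is a minimal sketch that writes a file with the column order listed above using only the Python standard library. The function name `write_feed_csv`, the file name, and the feed values are hypothetical placeholders; this document does not state the units or valid ranges of the columns, so check those before using real data.

```python
import csv

# Column order required by extend-feed-db, as listed in this section.
FEED_COLUMNS = ["Name", "TS", "TSS", "Lipids", "Proteins",
                "Carbohydrates", "PI", "SI", "Notes"]


def write_feed_csv(path, feeds):
    """Write feed records (dicts keyed by FEED_COLUMNS) to a CSV file."""
    # Validate everything first so a bad record never leaves a partial file.
    for feed in feeds:
        missing = [col for col in FEED_COLUMNS if col not in feed]
        if missing:
            raise ValueError(
                f"feed {feed.get('Name', '?')!r} is missing columns: {missing}"
            )
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FEED_COLUMNS)
        writer.writeheader()
        writer.writerows(feeds)


# Placeholder record for illustration only; the numbers are not real data.
example_feed = {
    "Name": "example-feed", "TS": 10, "TSS": 8, "Lipids": 20,
    "Proteins": 30, "Carbohydrates": 50, "PI": 0, "SI": 0,
    "Notes": "hypothetical entry",
}
write_feed_csv("my_feeds.csv", [example_feed])
```

The resulting file can then be passed to `ADToolbox Database extend-feed-db -d my_feeds.csv`.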
3. Metagenomics Module
The Metagenomics module of ADToolbox is designed to take metagenomics data into consideration when designing an AD process.
You can see all of its functionalities by:
ADToolbox Metagenomics --help
──────────────────────────── ADToolBox ────────────────────────────
usage: ADToolBox Metagenomics [-h]
{amplicon-to-genome,align-genomes,mak
e-json-from-genomes,map-genomes-to-adm}
...
positional arguments:
{amplicon-to-genome,align-genomes,make-json-from-genomes,map-geno
mes-to-adm}
Available Metagenomics Commands:
amplicon-to-genome Downloads the representative genome from
each amplicon
align-genomes Align Genomes to the protein database of
ADToolbox, or any other fasta with
protein sequences
make-json-from-genomes
Generates JSON file required by Align-
Genomes for custom genomes.
map-genomes-to-adm maps JSON file with genome infromation to
ADM reactions
options:
-h, --help show this help message and exit
The Metagenomics module currently has four main submodules:
- amplicon-to-genome: If you have 16S data from QIIME, this functionality attempts to find the representative genome for each amplicon in an automated way, controlled by the following parameters:
ADToolbox Metagenomics amplicon-to-genome --help
────────────────────────────────── ADToolBox ───────────────────────────────────
usage: ADToolBox Metagenomics amplicon-to-genome [-h] [-q QIIME_OUTPUTS_DIR]
[-f FEATURE_TABLE_DIR]
[-r REP_SEQ_DIR]
[-t TAXONOMY_TABLE_DIR]
[-o OUTPUT_DIR]
[-a AMPLICON_TO_GENOME_DB]
[--k K]
[--similarity SIMILARITY]
options:
-h, --help show this help message and exit
-q QIIME_OUTPUTS_DIR, --qiime-outputs-dir QIIME_OUTPUTS_DIR
Input the directory to the QIIME outputs
-f FEATURE_TABLE_DIR, --feature-table-dir FEATURE_TABLE_DIR
Input the directory to the feature table output from
QIIME output tables
-r REP_SEQ_DIR, --rep-Seq-dir REP_SEQ_DIR
Input the directory to the repseq fasta output from
QIIME output files
-t TAXONOMY_TABLE_DIR, --taxonomy-table-dir TAXONOMY_TABLE_DIR
Input the directory to the taxonomy table output from
QIIME output files
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
Output the directory to store the representative
genome files
-a AMPLICON_TO_GENOME_DB, --amplicon-to-genome-db AMPLICON_TO_GENOME_DB
The Amplicon to Genome Database to use
--k K Top k genomes to be selected
--similarity SIMILARITY
Similarity cutoff in the 16s V4 region for genome
selection
If you do not provide any arguments, the default directories and values are used. By default, ADToolbox looks in Metagenomics_Data/QIIME_Outputs inside the base directory that you set in Configs.
- To use your own inputs instead of the defaults, provide the paths to the feature table, the taxonomy table, and the rep-seq fasta file from QIIME.
- If you do not want to use the amplicon-to-genome database that you downloaded in the Configs or Database module, point to the database directory of your choice with --amplicon-to-genome-db or -a.
- From each sample, you can choose the top k most abundant taxa for genome download. For instance, --k 10 selects the top 10 taxa from each sample.
- When selecting genomes, you need a percent similarity cutoff for choosing a representative genome from GTDB. For example, --similarity 96 sets a 96% similarity cutoff.
- Finally, provide the directory where the representative genomes should be saved. A few extra files with information about the fetched genomes are also saved here:
  - GenomeAccessions.csv: provides the NCBI accession IDs of the genomes that were found.
  - SelectedFeatures.csv: provides the feature IDs (hashes) from the QIIME outputs for the fetched genomes.
  - TopKTaxa.csv: provides the taxonomy names of the amplicons for which a genome has been found.
  - Amplicon2Genome_OutInfo.json: important. This JSON file holds metadata about the genomes that were found. It is later used to align these genomes to the protein database.
A complete amplicon-to-genome command will look like:
ADToolbox Metagenomics amplicon-to-genome -f ~/Desktop/test/feature-table.tsv \
-r ~/Desktop/test/dna-sequences.fasta \
-t ~/Desktop/test/taxonomy.tsv \
-o ~/Desktop/test/ --k 10 \
-q ~/Desktop/test/ \
-a ~/Desktop/ADToolbox/Database/Amplicon2GenomeDBs/
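Because every amplicon-to-genome option can be omitted, it can be convenient to build the command from only the values you want to override. The sketch below does that with the flags shown in the help output above. The function name `amplicon_to_genome_command` and the range checks on `k` and `similarity` are assumptions made for illustration (the document only says that similarity is a percentage), not ADToolbox behavior.

```python
import shlex


def amplicon_to_genome_command(qiime_outputs_dir=None, feature_table=None,
                               rep_seq=None, taxonomy=None, output_dir=None,
                               amplicon_db=None, k=None, similarity=None):
    """Build the argument list, skipping every option left as None."""
    if k is not None and k < 1:
        raise ValueError("k must be a positive integer")
    if similarity is not None and not 0 < similarity <= 100:
        raise ValueError("similarity is a percentage in the range (0, 100]")
    options = [
        ("-q", qiime_outputs_dir),
        ("-f", feature_table),
        ("-r", rep_seq),
        ("-t", taxonomy),
        ("-o", output_dir),
        ("-a", amplicon_db),
        ("--k", k),
        ("--similarity", similarity),
    ]
    cmd = ["ADToolbox", "Metagenomics", "amplicon-to-genome"]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Override only k and the similarity cutoff; everything else uses the defaults.
print(shlex.join(amplicon_to_genome_command(k=10, similarity=96)))
```

The returned list can be passed directly to `subprocess.run` if you want to execute the command from Python.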
- align-genomes: This submodule uses MMseqs to align a list of genomes to the protein database of ADToolbox for functional analysis.
You can run this submodule with the following arguments:
ADToolbox Metagenomics align-genomes --help
────────────────────────────────── ADToolBox ───────────────────────────────────
usage: ADToolBox Metagenomics align-genomes [-h] [-i INPUT_FILE]
[-d PROTEIN_DB_DIR]
[-o OUTPUT_DIR] [-b BIT_SCORE]
[-e E_VALUE]
options:
-h, --help show this help message and exit
-i INPUT_FILE, --input-file INPUT_FILE
Input the address of the JSON file includeing
information about the genomes to be aligned
-d PROTEIN_DB_DIR, --protein-db-dir PROTEIN_DB_DIR
Directory containing the protein database to be used
for alignment
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
Output the directory to store the alignment results
-b BIT_SCORE, --bit-score BIT_SCORE
Minimum Bit Score cutoff for alignment
-e E_VALUE, --e-value E_VALUE
Minimum e-vlaue score cutoff for alignment
- input-file: Provide a JSON file like the one created by amplicon-to-genome (see the previous submodule) so that ADToolbox can find all the information it needs about the genomes.
- protein-db-dir: Provide ADToolbox's protein database fasta file, or any other protein database that you would like to align your genomes against.
- output-dir: The location where the alignment results are saved.
- bit-score: Minimum bit score used to filter the alignment results.
- e-value: E-value cutoff used to filter the alignment results.
The output of this step includes one more important file, Alignment_Info.json, which is used by map-genomes-to-adm.
A complete command for this submodule would look like:
ADToolBox Metagenomics align-genomes \
--input-file ~/Desktop/ADToolbox/Genomes/Amplicon2Genome_OutInfo.json \
--protein-db-dir ~/Desktop/ADToolbox/Database/Protein_DB.fasta \
--output-dir ~/Desktop/ADToolbox/Outputs/ \
--bit-score 40 \
--e-value 0.000001
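To make the two cutoffs concrete, here is a small sketch of how a bit-score and e-value filter is conventionally applied to alignment hits: a hit is kept when its bit score is at least the minimum and its e-value is at most the cutoff. This is an assumption about the usual convention, not a description of ADToolbox's internal code, and the `Hit` type and the example hits are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment hit (hypothetical structure for illustration)."""
    query: str
    target: str
    bit_score: float
    e_value: float


def filter_hits(hits, min_bit_score=40.0, max_e_value=1e-6):
    """Keep hits with bit_score >= min_bit_score and e_value <= max_e_value."""
    return [
        hit for hit in hits
        if hit.bit_score >= min_bit_score and hit.e_value <= max_e_value
    ]


# Made-up hits: only the first one passes both cutoffs.
example_hits = [
    Hit("genome1_gene1", "proteinA", 55.0, 1e-10),
    Hit("genome1_gene2", "proteinB", 35.0, 1e-12),  # bit score too low
    Hit("genome2_gene1", "proteinA", 80.0, 1e-3),   # e-value too high
]
kept = filter_hits(example_hits, min_bit_score=40, max_e_value=1e-6)
print([hit.query for hit in kept])
```

The default values mirror the `--bit-score 40` and `--e-value 0.000001` settings used in the example command.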
- make-json-from-genomes: Sometimes you have genomes that you assembled or downloaded manually. In this case, you can import them into the pipeline by making a JSON file similar to Amplicon2Genome_OutInfo.json. To do this, you first need to make a CSV file describing your genomes:
ADToolbox Metagenomics make-json-from-genomes --help
────────────────────────────────── ADToolBox ───────────────────────────────────
usage: ADToolBox Metagenomics make-json-from-genomes [-h] -i INPUT_FILE -o
OUTPUT_FILE
options:
-h, --help show this help message and exit
-i INPUT_FILE, --input-file INPUT_FILE
Input the address of the CSV file includeing
information about the genomes to be aligned
-o OUTPUT_FILE, --output-file OUTPUT_FILE
Output the directory to store the JSON file.
- input-file: This input file should be the address to a CSV file exactly in the following column format:
Genome_ID | NCBI_Name | Genome_Dir |
---|---|---|
Genome1 | xyz | ~/Desktop/... |
  1. Genome_ID: Identifier for the genome, preferably the NCBI ID.
  2. NCBI_Name: NCBI taxonomy name for the genome. It does not need to be in a specific format.
  3. Genome_Dir: Absolute path to the fasta file. The file must not be compressed (not .gz).
- output-file: Path where the generated JSON file is saved.
An example of this command would be:
ADToolBox Metagenomics make-json-from-genomes --input-file ~/Desktop/MyGenomes.CSV --output-file ~/Desktop/Genome_info.json
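The input CSV for make-json-from-genomes can be generated with a few lines of Python. The sketch below writes the three columns described above and enforces the two stated constraints on Genome_Dir (absolute path, not .gz). The function name `write_genome_csv`, the output file name, and the example genome entry are hypothetical.

```python
import csv
from pathlib import Path

# Column layout required by make-json-from-genomes, as described above.
GENOME_COLUMNS = ["Genome_ID", "NCBI_Name", "Genome_Dir"]


def write_genome_csv(csv_path, genomes):
    """Write (genome_id, ncbi_name, fasta_path) tuples to the required CSV."""
    rows = []
    for genome_id, ncbi_name, fasta_path in genomes:
        path = Path(fasta_path).expanduser()
        if not path.is_absolute():
            raise ValueError(f"{fasta_path!r}: Genome_Dir must be an absolute path")
        if path.suffix == ".gz":
            raise ValueError(f"{fasta_path!r}: decompress the file first, .gz is not accepted")
        rows.append([genome_id, ncbi_name, str(path)])
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(GENOME_COLUMNS)
        writer.writerows(rows)


# Placeholder entry; replace it with your own genome IDs, names, and paths.
write_genome_csv("MyGenomes.csv", [("Genome1", "xyz", "/data/genomes/genome1.fasta")])
```

The script does not check that the fasta files exist, so a typo in a path will only surface when ADToolbox reads the file.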
- map-genomes-to-adm: This command will take a JSON file that has Alignment info for all of the genomes, and will output the mapping to ADM models. This is how you run this command:
ADToolbox Metagenomics map-genomes-to-adm --help
────────────────────────────────── ADToolBox ───────────────────────────────────
usage: ADToolBox Metagenomics map-genomes-to-adm [-h] [-i INPUT_FILE]
[-m MODEL] [-o OUTPUT_DIR]
options:
-h, --help show this help message and exit
-i INPUT_FILE, --input-file INPUT_FILE
Input the address of the JSON file includeing
information about the alignment of the genomes to the
protein database
-m MODEL, --model MODEL
Model determines which mapping system you'd like to
use; Current options: 'Modified_ADM_Reactions'
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
address to store the JSON report to be loaded with a
model
An example of a complete command for this submodule is:
ADToolbox Metagenomics map-genomes-to-adm -i ~/Desktop/alignment_info.json -m Modified_ADM_Reactions -o ~/Desktop/ADM_Mapping_Report.json
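The Metagenomics submodules feed into each other: the genome JSON goes into align-genomes, Alignment_Info.json goes into map-genomes-to-adm, and the resulting report can be passed to an ADM run through `--metagenome-report`. The sketch below only assembles the three commands to show that data flow. The function name `metagenomics_pipeline` is hypothetical, and it assumes that Alignment_Info.json is written into the align-genomes output directory, which this document does not state explicitly.

```python
import shlex
from pathlib import Path


def metagenomics_pipeline(genome_info_json, protein_db, work_dir,
                          model="Modified_ADM_Reactions"):
    """Return the align -> map -> ADM commands, chained through their files."""
    work = Path(work_dir)
    # Assumption: align-genomes writes Alignment_Info.json into its output dir.
    alignment_info = work / "Alignment_Info.json"
    # Report file name taken from the example commands in this document.
    report = work / "ADM_Mapping_Report.json"
    return [
        ["ADToolbox", "Metagenomics", "align-genomes",
         "--input-file", str(genome_info_json),
         "--protein-db-dir", str(protein_db),
         "--output-dir", str(work)],
        ["ADToolbox", "Metagenomics", "map-genomes-to-adm",
         "--input-file", str(alignment_info),
         "--model", model,
         "--output-dir", str(report)],
        ["ADToolbox", "ADM", "modified-adm",
         "--metagenome-report", str(report)],
    ]


for command in metagenomics_pipeline("Amplicon2Genome_OutInfo.json",
                                     "Protein_DB.fasta", "Outputs"):
    print(shlex.join(command))
```

Nothing is executed here; the commands are printed so that you can review the paths before running them.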
4. ADM Module
The ADM module provides all the tools needed to run instances of ADM models. These include the original ADM (Batstone et al.) and the Modified-ADM suggested by the authors of ADToolbox. To see all the functionalities in this module, you can run:
ADToolbox ADM --help
────────────────────────────────── ADToolBox ───────────────────────────────────
usage: ADToolBox ADM [-h] {original-adm1,modified-adm,show-escher-map} ...
positional arguments:
{original-adm1,modified-adm,show-escher-map}
Available ADM Commands:
original-adm1 Original ADM1:
modified-adm Modified ADM:
show-escher-map makes an escher map for modified ADM
options:
-h, --help show this help message and exit
- original-adm1: If you want to run the original ADM (Batstone et al.) in your browser, you can run this command with the required parameters in JSON format:
ADToolbox ADM original-adm1 --help
────────────────────────────────── ADToolBox ───────────────────────────────────
usage: ADToolBox ADM original-adm1 [-h] [--model-parameters MODEL_PARAMETERS]
[--base-parameters BASE_PARAMETERS]
[--initial-conditions INITIAL_CONDITIONS]
[--inlet-conditions INLET_CONDITIONS]
[--reactions REACTIONS] [--species SPECIES]
[--metagenome-report METAGENOME_REPORT]
[--report REPORT]
options:
-h, --help show this help message and exit
--model-parameters MODEL_PARAMETERS
Model parameters for ADM 1
--base-parameters BASE_PARAMETERS
Provide json file with base parameters for original
ADM1
--initial-conditions INITIAL_CONDITIONS
Provide json file with initial conditions for original
ADM1
--inlet-conditions INLET_CONDITIONS
Provide json file with inlet conditions for original
ADM1
--reactions REACTIONS
Provide json file with reactions for original ADM1
--species SPECIES Provide json file with species for original ADM1
--metagenome-report METAGENOME_REPORT
Provide json file with metagenome report for original
ADM1
--report REPORT Describe how to report the results of original ADM1.
Current options are: 'dash' and 'csv'
Every argument is optional, and the role of each one is clear from the help text next to it, so we just provide a full example of this command:
ADToolbox ADM original-adm1 \
--model-parameters ~/Desktop/Model_Parameters.json \
--base-parameters ~/Desktop/Base_Parameters.json \
--initial-conditions ~/Desktop/Initial_Conditions.json \
--inlet-conditions ~/Desktop/Inlet-Conditions.json \
--reactions ~/Desktop/Reactions.json \
--species ~/Desktop/Species.json \
--metagenome-report ~/Desktop/ADM_Mapping_Report.json \
--report dash
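Since all ADM arguments are optional, a command can be assembled from just the JSON files you want to override. The following sketch builds such a command from keyword arguments, using only the option names and the two report modes listed in the help output above. The function name `adm_command` and its keyword interface are hypothetical conveniences, not part of ADToolbox.

```python
import shlex

# Options that take a JSON file, in the order shown by the help output.
ADM_JSON_OPTIONS = (
    "model_parameters", "base_parameters", "initial_conditions",
    "inlet_conditions", "reactions", "species", "metagenome_report",
)


def adm_command(variant="original-adm1", report=None, **json_files):
    """Build an ADM command that includes only the options that were given."""
    if variant not in ("original-adm1", "modified-adm"):
        raise ValueError(f"unknown ADM variant: {variant!r}")
    if report is not None and report not in ("dash", "csv"):
        raise ValueError("report must be 'dash' or 'csv'")
    unknown = set(json_files) - set(ADM_JSON_OPTIONS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    cmd = ["ADToolbox", "ADM", variant]
    for name in ADM_JSON_OPTIONS:
        if name in json_files:
            # model_parameters -> --model-parameters, and so on.
            cmd += ["--" + name.replace("_", "-"), str(json_files[name])]
    if report is not None:
        cmd += ["--report", report]
    return cmd


print(shlex.join(adm_command(report="csv", species="Species.json",
                             reactions="Reactions.json")))
```

The same helper covers modified-adm by passing `variant="modified-adm"`, since both commands accept the same set of options.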
- modified-adm: This command works the same way as the previous one, except that it requires parameters tailored for the modified ADM:
ADToolbox ADM modified-adm --help
────────────────────────────────── ADToolBox ───────────────────────────────────
usage: ADToolBox ADM modified-adm [-h] [--model-parameters MODEL_PARAMETERS]
[--base-parameters BASE_PARAMETERS]
[--initial-conditions INITIAL_CONDITIONS]
[--inlet-conditions INLET_CONDITIONS]
[--reactions REACTIONS] [--species SPECIES]
[--metagenome-report METAGENOME_REPORT]
[--report REPORT]
options:
-h, --help show this help message and exit
--model-parameters MODEL_PARAMETERS
Model parameters for Modified ADM
--base-parameters BASE_PARAMETERS
Provide json file with base parameters for modified
ADM
--initial-conditions INITIAL_CONDITIONS
Provide json file with initial conditions for modified
ADM
--inlet-conditions INLET_CONDITIONS
Provide json file with inlet conditions for modified
ADM
--reactions REACTIONS
Provide json file with reactions for modified ADM
--species SPECIES Provide json file with species for modified ADM
--metagenome-report METAGENOME_REPORT
Provide json file with metagenome report for modified
ADM
--report REPORT Describe how to report the results of modified ADM.
Current options are: 'dash' and 'csv'
The usage is exactly the same as for original-adm1.
- show-escher-map: This command prompts you to open an Escher map for the modified-adm model in your browser at the address it gives you:
ADToolbox ADM show-escher-map
5. Documentations Module
You can view the documentation in your CLI using Rich's Markdown renderer. You can do this by:
ADToolbox Documentations --show