Execution Flags
These are the flags that control EnTAP execution. They can be set via the command line (denoted CMD), the entap_run.params file (denoted R-ini), or the entap_config.ini file (denoted E-ini). Required and recommended flags are repeated in the other categories where relevant. All flags may also be passed on the command line, in addition to the recommended location.
A few data types (qualifiers) are used throughout these ini files. Anything specified as a ‘multi’ type means that the parameter may be entered multiple times. In an ini file, the values must be separated by a comma (‘,’). Example for multi-integer: “1,2,3” (entered without quotes).
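As an illustration of the ‘multi’ qualifier, the sketch below splits a comma-separated ini value into a typed list. It is not EnTAP's own parser; the function name `parse_multi` is hypothetical, and skipping empty entries is an assumption made here.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def parse_multi(value: str, cast: Callable[[str], T]) -> List[T]:
    """Split a comma-separated ini value and convert each entry with `cast`."""
    # Assumption: surrounding whitespace is ignored and empty entries are skipped,
    # so a trailing comma does not produce a blank item.
    return [cast(item.strip()) for item in value.split(",") if item.strip()]


print(parse_multi("1,2,3", int))  # prints [1, 2, 3]
```

The same helper covers multi-string values by passing `str` as the conversion function.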

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| run | Execute the main EnTAP annotation functionality including, but not limited to, Similarity Searching and Gene Family analysis. Optional Frame Selection, Expression Analysis, and HGT analysis can be added with the appropriate flags. | cmd | flag | run |
| input / i | Path to the transcriptome file (either nucleotide or protein) | R-ini | string | /path/to/input/transcriptome.fa |
| database / d | Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against | R-ini | multi-string | /path/to/diamond/database.dmnd |
| run-ini | Point to the | cmd | string | /path/to/entap_run.params |
| entap-ini | Point to the | cmd | string | /path/to/entap_config.ini |
| entap-db-bin | Path to the | E-ini | string | /path/to/entap_database.bin |
| diamond-exe | Specify the execution method for DIAMOND. This can be a path to the | E-ini | string | diamond |
| eggnog-map-data | Path to the directory containing the EggNOG SQL database | E-ini | string | /path/to/eggnog_db_directory |
| eggnog-map-dmnd | Path to the EggNOG DIAMOND configured database | E-ini | string | /databases/eggnog_proteins.dmnd |
| eggnog-map-exe | Path to the EggNOG-mapper executable, or method of execution. If installed globally, this is simply | E-ini | string | emapper.py |
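The `database` constraints above (at most 5 databases, each a DIAMOND indexed `.dmnd` file) can be checked before a run. The following is a minimal sketch under those stated constraints only; `check_databases` is a hypothetical helper, not part of EnTAP, and it does not test whether the files exist.

```python
from typing import List

MAX_DATABASES = 5  # limit stated in the `database` description


def check_databases(paths: List[str]) -> List[str]:
    """Return a list of problems found in a `database` value (empty if none)."""
    problems = []
    if not paths:
        problems.append("no database given")
    if len(paths) > MAX_DATABASES:
        problems.append(
            f"{len(paths)} databases given, at most {MAX_DATABASES} allowed"
        )
    for path in paths:
        # Only the file extension is checked here, not the file contents.
        if not path.endswith(".dmnd"):
            problems.append(f"{path} does not have the .dmnd extension")
    return problems


print(check_databases(["/path/to/diamond/database.dmnd"]))  # prints []
```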

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| run | Execute the main EnTAP annotation functionality including, but not limited to, Similarity Searching and Gene Family analysis. Optional Frame Selection, Expression Analysis, and HGT analysis can be added with the appropriate flags. | cmd | flag | run |
| input / i | Path to the transcriptome file (either nucleotide or protein) | R-ini | string | /path/to/input/transcriptome.fa |
| database / d | Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against | R-ini | multi-string | /path/to/diamond/database.dmnd |
| run-ini | Point to the | cmd | string | /path/to/entap_run.params |
| entap-ini | Point to the | cmd | string | /path/to/entap_config.ini |
| out-dir | Specify an output directory for all of the files generated by EnTAP | R-ini | string | /path/to/entap_output |
| frame_selection | Specify if you would like to perform frame selection/gene prediction on your input nucleotide transcriptome. This flag will be ignored if you input protein sequences. If this is set to false with nucleotide input, EnTAP will perform blastx functionality where appropriate. | R-ini | bool | true |
| entap-db-bin | Path to the | E-ini | string | /path/to/entap_database.bin |
| entap-graph | Path to the | E-ini | string | /path/to/entap_graphing.py |
| diamond-exe | Specify the execution method for DIAMOND. This can be a path to the | E-ini | string | diamond |
| eggnog-map-data | Path to the directory containing the EggNOG SQL database | E-ini | string | /path/to/eggnog_db_directory |
| eggnog-map-dmnd | Path to the EggNOG DIAMOND configured database | E-ini | string | /databases/eggnog_proteins.dmnd |
| eggnog-map-exe | Path to the EggNOG-mapper executable, or method of execution. If installed globally, this is simply | E-ini | string | emapper.py |

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| run | Execute the main EnTAP annotation functionality including, but not limited to, Similarity Searching and Gene Family analysis. Optional Frame Selection, Expression Analysis, and HGT analysis can be added with the appropriate flags. | cmd | flag | run |
| input / i | Path to the transcriptome file (either nucleotide or protein) | R-ini | string | /path/to/input/transcriptome. |
| database / d | Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against | R-ini | multi-string | /path/to/diamond/database.dmnd |
| run-ini | Point to the | cmd | string | /path/to/entap_run.params |
| entap-ini | Point to the | cmd | string | /path/to/entap_config.ini |
| out-dir | Specify an output directory for all of the files generated by EnTAP | R-ini | string | /path/to/entap_output |
| overwrite | All files from a previous run will be overwritten. Without this flag, EnTAP will recognize the results from the previous run and skip executing the portions that were already run | R-ini | bool | true |
| resume | Set this flag to TRUE if you would like EnTAP to continue execution if files from a previous run are found. If set to FALSE, EnTAP will stop execution after files from a previous run are detected. Note that the ‘overwrite’ flag supersedes this: for example, if ‘overwrite’ is set to TRUE and ‘resume’ is set to FALSE with files from a previous run detected, they will be deleted and execution will continue. | R-ini | bool | false |
| graph | Specifying this will check whether your system supports graphing functionality and then exit. The system will need Python installed with the Matplotlib module. | cmd | bool | true |
| threads / t | Specify the number of threads for execution | R-ini | integer | 5 |
| no-trim | By default, EnTAP will trim your sequence headers to the first space to maintain compatibility across different software. Using this flag will instead retain the information of the header by removing all spaces. | R-ini | bool | true |
| state | Precise control over execution of EnTAP. This flag allows certain parts to be run while skipping others. More information can be found in the Execution section. | R-ini | string | ‘+’ |
| version | Prints the current EnTAP version you are running | cmd | flag | version |
| no-check | EnTAP checks execution paths and inputs prior to annotating so that an incorrect input is not discovered midway through a run. Using this flag will skip the check (not advised!) | R-ini | bool | true |
| output-format | Specify multiple output file formats for each stage of the pipeline | R-ini | multi-integer | 1,3,4,7 |
| entap-db-sql | Path to the | E-ini | string | /path/to/entap_database.db |
| entap-db-bin | Path to the | E-ini | string | /path/to/entap_database.bin |
| entap-graph | Path to the | E-ini | string | /path/to/entap_graphing.py |
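The interaction between `overwrite` and `resume` described above can be summarized as a small decision function. This is only a reading of the flag descriptions, not EnTAP's actual implementation; the function name and the returned labels are hypothetical.

```python
def decide_action(previous_run_found: bool, overwrite: bool, resume: bool) -> str:
    """Return what happens at start-up, following the flag descriptions."""
    if not previous_run_found:
        return "run"        # nothing to overwrite or resume
    if overwrite:
        return "overwrite"  # previous files are deleted; supersedes `resume`
    if resume:
        return "resume"     # continue, skipping the portions already run
    return "stop"           # previous files detected and resume is FALSE


# The case called out in the `resume` description:
print(decide_action(previous_run_found=True, overwrite=True, resume=False))
# prints overwrite
```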

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| align / a | Path to the alignment file (either SAM or BAM format). Ignoring this flag will skip expression filtering. Be sure to look at the other Expression Analysis flags if using this. | R-ini | string | /path/to/alignment.bam |
| rsem-calculate-expression | Specify the path to the | E-ini | string | /libs/RSEM-1.3.3/rsem-calculate-expression |
| rsem-sam-validator | Specify the path to the | E-ini | string | /libs/RSEM-1.3.3/rsem-sam-validator |
| rsem-prepare-reference | Specify the path to the | E-ini | string | /libs/RSEM-1.3.3/rsem-prepare-reference |
| rsem-convert-sam-for-rsem | Specify the path to the | E-ini | string | /libs/RSEM-1.3.3/rsem-convert-sam-for-rsem |
| fpkm | Specify the FPKM (fragments per kilobase of exon per million mapped fragments) cutoff for Expression Filtering. All transcripts below this number will be filtered out and removed. | R-ini | float | 0.5 |
| single-end | Signify your reads are single-end for RSEM execution instead of paired-end (default) | R-ini | bool | true |
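To make the effect of the `fpkm` cutoff concrete, the sketch below removes transcripts whose FPKM falls below the cutoff. The transcript names and values are invented for illustration, and `filter_by_fpkm` is a hypothetical helper rather than EnTAP code.

```python
from typing import Dict


def filter_by_fpkm(fpkm_by_transcript: Dict[str, float], cutoff: float) -> Dict[str, float]:
    """Keep transcripts at or above the cutoff; those below it are removed."""
    return {
        name: value
        for name, value in fpkm_by_transcript.items()
        if value >= cutoff
    }


# Hypothetical FPKM values, filtered with the example cutoff of 0.5.
example = {"transcript_1": 12.3, "transcript_2": 0.2, "transcript_3": 0.5}
print(filter_by_fpkm(example, 0.5))
# prints {'transcript_1': 12.3, 'transcript_3': 0.5}
```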

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| frame_selection | Specify if you would like to perform frame selection/gene prediction on your input nucleotide transcriptome. This flag will be ignored if you input protein sequences. If this is set to false with nucleotide input, EnTAP will perform blastx functionality where appropriate. | R-ini | bool | true |

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| transdecoder-m | Specify the minimum protein length for TransDecoder | R-ini | integer | 100 |
| transdecoder-no-refine-starts | Specify this flag if you would like to pass the ‘--no_refine_starts’ option to TransDecoder when it is executed | R-ini | bool | false |
| transdecoder-long-exe | Method to execute TransDecoder.LongOrfs. This may be the path to the executable, or simply ‘TransDecoder.LongOrfs’ if installed globally | E-ini | string | TransDecoder.LongOrfs |
| transdecoder-predict-exe | Method to execute TransDecoder.Predict. This may be the path to the executable, or simply ‘TransDecoder.Predict’ if installed globally | E-ini | string | TransDecoder.Predict |

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| database / d | Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against | R-ini | multi-string | /path/to/diamond/database.dmnd |
| data-type | | R-ini | integer | 0 |
| contam / c | Specify contaminants to be used during Similarity Search best hit selection. Contaminants can be selected by species or through a specific taxon (insecta) from the NCBI Taxonomy Database. If your taxon is more than one word, replace the spaces with underscores (_). Alignments will be flagged as contaminants and will be lower scoring compared to other alignments. | R-ini | multi-string | insecta |
| taxon | This flag will allow for ‘taxonomic favoring’ of hits that are closer to your target species or lineage. Any lineage can be used as referenced by the NCBI Taxonomic database, such as genus, phylum, or species. Format must replace all spaces with underscores (‘_’) | R-ini | string | homo_sapiens |
| e | Specify E-value cutoff for Similarity Searching results (in scientific notation format). | R-ini | scientific | 10E-5 |
| tcoverage | Specify minimum target coverage for similarity searching | R-ini | float | 50 |
| qcoverage | Specify minimum query coverage for similarity searching | R-ini | float | 50 |
| uninformative | Comma-delimited list of terms you would like to be deemed “uninformative”. Any alignments during Similarity Searching tagged as uninformative will be scored lower | R-ini | string | conserved, predicted, unnamed, hypothetical, putative, unidentified, uncharacterized, unknown, uncultured, uninformative |
| diamond-exe | Specify the execution method for DIAMOND. This can be a path to the | E-ini | string | diamond |
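The three cutoffs above (`e`, `qcoverage`, `tcoverage`) can be read as a simple filter on each alignment. The sketch below assumes that coverage is expressed as a percentage, as the example value 50 suggests, and that an alignment must satisfy all three cutoffs. `passes_cutoffs` is a hypothetical helper, not EnTAP's filtering code. Note that the example E-value `10E-5` is numerically 0.0001.

```python
def passes_cutoffs(
    evalue: float,
    query_coverage: float,
    target_coverage: float,
    max_evalue: float = float("10E-5"),  # example value from the `e` row (= 0.0001)
    min_qcoverage: float = 50.0,         # example value from the `qcoverage` row
    min_tcoverage: float = 50.0,         # example value from the `tcoverage` row
) -> bool:
    """Return True if an alignment satisfies all three cutoffs."""
    return (
        evalue <= max_evalue
        and query_coverage >= min_qcoverage
        and target_coverage >= min_tcoverage
    )


print(passes_cutoffs(1e-20, 80.0, 75.0))  # prints True
print(passes_cutoffs(1e-20, 30.0, 75.0))  # prints False
```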

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| diamond-sensitivity | Specify the DIAMOND sensitivity used against input DIAMOND databases (Similarity Searching and HGT Analysis). Sensitivities are based off of DIAMOND documentation with a higher sensitivity generally taking longer but giving a higher alignment rate. Sensitivity options are fast, mid-sensitive, sensitive, more-sensitive, very-sensitive, ultra-sensitive. | R-ini | string | very-sensitive |
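Because the sensitivity is a free-form string in the ini file, a typo is easy to make. A minimal sketch of a check against the options listed above follows; `check_sensitivity` is a hypothetical helper, and requiring an exact, case-sensitive match is an assumption.

```python
# Options as listed in the `diamond-sensitivity` description.
SENSITIVITIES = (
    "fast",
    "mid-sensitive",
    "sensitive",
    "more-sensitive",
    "very-sensitive",
    "ultra-sensitive",
)


def check_sensitivity(value: str) -> str:
    """Return the value unchanged if it is a listed option, else raise ValueError."""
    if value not in SENSITIVITIES:
        raise ValueError(
            f"unknown sensitivity {value!r}; expected one of {', '.join(SENSITIVITIES)}"
        )
    return value


print(check_sensitivity("very-sensitive"))  # prints very-sensitive
```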

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| ontology_source | | R-ini | multi-integer | 0 |

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| eggnog-map-data | Path to the directory containing the EggNOG SQL database | E-ini | string | /path/to/eggnog_db_directory |
| eggnog-map-dmnd | Path to the EggNOG DIAMOND configured database | E-ini | string | /databases/eggnog_proteins.dmnd |
| eggnog-map-exe | Path to the EggNOG-mapper executable, or method of execution. If installed globally, this is simply | E-ini | string | emapper.py |
| eggnog-contaminant | Specify this to turn on/off EggNOG contaminant analysis (on by default). This leverages the taxon input from the contaminant Similarity Search command to determine if an EggNOG annotation should be flagged as a contaminant. EggNOG contaminant analysis can only be performed alongside Similarity Search contaminant analysis (not on its own) and will only be utilized if no alignments were found for a given transcript during Similarity Searching | R-ini | bool | true |
| eggnog-dbmem | Specify this to use the ‘--dbmem’ flag with EggNOG-mapper. This will load the entire eggnog.db sqlite3 database into memory, which can require up to ~44GB of memory. However, this will significantly speed up EggNOG annotations | R-ini | bool | true |
| eggnog-sensitivity | Specify the DIAMOND sensitivity used during EggNOG mapper execution against the EggNOG database. Sensitivities are based off of DIAMOND documentation with a higher sensitivity generally taking longer but giving a higher alignment rate. Sensitivity options are fast, mid-sensitive, sensitive, more-sensitive, very-sensitive, ultra-sensitive. | R-ini | string | more-sensitive |

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| interproscan-db | | R-ini | multi-string | pfam |
| interproscan-exe | Specify the execution method for InterProScan. Commonly this can be the path to the | E-ini | string | interproscan.sh |

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| hgt-donor | Specify the DIAMOND configured (.dmnd extension) donor databases for Horizontal Gene Transfer analysis. Separate databases with a comma (‘,’) | R-ini | multi-string | path/to/donor/database1.dmnd,path/to/donor/database2.dmnd |
| hgt-recipient | Specify the DIAMOND configured (.dmnd extension) recipient databases for Horizontal Gene Transfer analysis. Separate databases with a comma (‘,’) | R-ini | multi-string | path/to/recipient/database1.dmnd,path/to/recipient/database2.dmnd |
| hgt-gff | Specify path to the GFF file for HGT analysis. The input GFF must satisfy the following: | R-ini | string | path/to/gff/file.gff |
| diamond-exe | Specify the execution method for DIAMOND. This can be a path to the | E-ini | string | diamond |

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| api-taxon | Check whether a species can be found within the specified EnTAP database. Returns JSON-formatted text indicating whether the species was found. Format must replace all spaces with underscores (‘_’) as follows: “--taxon homo_sapiens” or “--taxon primates” | cmd | string | homo_sapiens |
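The underscore format required for taxon names can be produced from a plain name as sketched below. `format_taxon` is a hypothetical helper; it only replaces whitespace and leaves letter case untouched, since the description does not say how case is handled.

```python
def format_taxon(name: str) -> str:
    """Replace each run of whitespace in a taxon name with a single underscore."""
    # str.split() with no argument also drops leading and trailing whitespace.
    return "_".join(name.split())


print(format_taxon("homo sapiens"))  # prints homo_sapiens
print(format_taxon("primates"))      # prints primates
```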

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| ncbi-api-key | Enter your personal NCBI API key, if available. This can be assigned to you through your NCBI account. Although not required, enabling EnTAP to use your API key will allow for much quicker access to the NCBI database. Your API key will only be used to access the NCBI database and is only stored locally. If you are a contributor to EnTAP, DO NOT accidentally commit your API key to git! | R-ini | string | n/a |
| ncbi-api-enable | Allow EnTAP to access the NCBI database API to pull additional information for your data during Similarity Searching if searching against an NCBI database. | R-ini | bool | true |