Execution Flags
These are the flags for the execution process of EnTAP. Flags are repeated across these categories where relevant. Each flag is set either on the command line (denoted cmd) or in the EnTAP ini file (denoted ini).
param | description | location (cmd/ini) | qualifier | example |
---|---|---|---|---|
runP or runN | Specify blastp or blastx annotation. If --runP is selected with a nucleotide input, frame selection will be run and annotation stages will be executed with protein sequences (blastp). If --runP is selected with a protein input, frame selection will not be run and annotation will be executed with protein sequences (blastp). If --runN is selected with a nucleotide input, frame selection will not be run and annotation will be executed with nucleotide sequences (blastx). | cmd | flag | runP |
input / i | Path to the transcriptome file (either nucleotide or protein) | cmd | string | /path/to/input/transcriptome.fa |
database / d | Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against | cmd | multi-string | /path/to/diamond/database.dmnd |
ini | Point to the entap_config.ini to specify paths and commands needed by EnTAP | cmd | string | /path/to/entap_config.ini |
out-dir | Specify an output directory for all of the files generated by EnTAP | cmd | string | /path/to/entap_output |
overwrite | All previously generated files will be overwritten. Without this flag, EnTAP will recognize the results from the previous run and skip the portions that were already run | cmd | flag | overwrite |
graph | Checks whether your system supports graphing functionality and then exits. The system needs Python installed with the Matplotlib module. | cmd | flag | graph |
threads / t | Specify the number of threads for execution | cmd | integer | 5 |
no-trim | By default, EnTAP trims your sequence headers to the first space to maintain compatibility across different software. This flag instead retains the header information by removing all spaces. | cmd | flag | no-trim |
state | Precise control over the execution of EnTAP. This flag allows certain parts to be run while skipping others. More information can be found in the Execution section. | cmd | string | ‘+’ |
version | Prints the EnTAP version you are running | cmd | flag | version |
no-check | EnTAP checks execution paths and inputs prior to annotating so that an incorrect input is not discovered midway through a run. Using this flag skips the check (not advised!) | cmd | flag | no-check |
output-format | Specify multiple output file formats for each stage of the pipeline | ini | multi-integer | 1,3,4,7 |
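
As a rough illustration of how the command-line flags in this table fit together, the sketch below assembles an argument list for a protein (blastp) run. It is not taken from the EnTAP documentation: the executable name `EnTAP`, the helper `build_entap_command`, and the convention of repeating `--database` once per database are all assumptions made for the example; only the flag names themselves come from the table.

```python
import shlex


def build_entap_command(input_path, databases, ini_path, out_dir,
                        threads=1, protein=True):
    """Assemble an argument list for an annotation run (illustrative only)."""
    # The table states that at most 5 DIAMOND databases may be given.
    if len(databases) > 5:
        raise ValueError("at most 5 DIAMOND databases may be specified")

    # Assumption: the executable is called "EnTAP" and is on the PATH.
    args = ["EnTAP", "--runP" if protein else "--runN"]
    args += ["--input", input_path]
    # Assumption: a multi-string flag is passed by repeating it.
    for db in databases:
        args += ["--database", db]
    args += ["--ini", ini_path, "--out-dir", out_dir, "--threads", str(threads)]
    return args


cmd = build_entap_command(
    "/path/to/input/transcriptome.fa",
    ["/path/to/diamond/database.dmnd", "/path/to/diamond/second.dmnd"],
    "/path/to/entap_config.ini",
    "/path/to/entap_output",
    threads=5,
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps paths with spaces intact if the list is later handed to `subprocess.run`.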
param | description | location (cmd/ini) | qualifier | example |
---|---|---|---|---|
align / a | Path to the alignment file (either SAM or BAM format). Omitting this flag will skip expression filtering. Be sure to review the other Expression Analysis flags if you use it. | cmd | string | /path/to/alignment.bam |
rsem-calculate-expression | Specify the path to the | ini | string | /libs/RSEM-1.3.3/rsem-calculate-expression |
rsem-sam-validator | Specify the path to the | ini | string | /libs/RSEM-1.3.3/rsem-sam-validator |
rsem-prepare-reference | Specify the path to the | ini | string | /libs/RSEM-1.3.3/rsem-prepare-reference |
rsem-convert-sam-for-rsem | Specify the path to the | ini | string | /libs/RSEM-1.3.3/rsem-convert-sam-for-rsem |
fpkm | Specify the FPKM (fragments per kilobase of exon per million mapped fragments) cutoff for Expression Filtering. All transcripts below this number will be filtered out and removed. | ini | float | 0.5 |
single-end | Signify that your reads are single-end for RSEM execution instead of paired-end (default) | ini | bool | single-end |
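
The effect of the fpkm cutoff can be pictured with a small sketch. This is not EnTAP's implementation; the function `filter_by_fpkm` and the transcript identifiers are hypothetical, and the only behavior taken from the table is that transcripts below the cutoff are removed.

```python
def filter_by_fpkm(expression, cutoff=0.5):
    """Keep transcripts whose FPKM is not below the cutoff.

    `expression` maps a transcript ID to its FPKM value. Transcripts
    strictly below `cutoff` are dropped; a value equal to the cutoff is kept.
    """
    return {tid: fpkm for tid, fpkm in expression.items() if fpkm >= cutoff}


# Hypothetical FPKM values for three transcripts.
expression = {"transcript_1": 12.4, "transcript_2": 0.2, "transcript_3": 0.5}
kept = filter_by_fpkm(expression, cutoff=0.5)
print(sorted(kept))
```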
param | description | location (cmd/ini) | qualifier | example |
---|---|---|---|---|
frame-selection |  | ini | integer | 2 |
complete | Tell EnTAP to mark all of the transcripts as ‘complete’. This will only be seen in the final output and will not affect the run. | ini | bool | complete |
param | description | location (cmd/ini) | qualifier | example |
---|---|---|---|---|
transdecoder-m | Specify the minimum protein length for TransDecoder | ini | integer | 100 |
transdecoder-no-refine-starts | Specify this flag if you would like to pass the TransDecoder option ‘--no_refine_starts’ when it is executed | ini | bool | false |
transdecoder-long-exe | Method to execute TransDecoder.LongOrfs. This may be the path to the executable, or simply ‘TransDecoder.LongOrfs’ if installed globally | ini | string | TransDecoder.LongOrfs |
transdecoder-predict-exe | Method to execute TransDecoder.Predict. This may be the path to the executable, or simply ‘TransDecoder.Predict’ if installed globally | ini | string | TransDecoder.Predict |
param | description | location (cmd/ini) | qualifier | example |
---|---|---|---|---|
genemarkst-exe | Specify the path to the | ini | string | gmst.pl |
param | description | location (cmd/ini) | qualifier | example |
---|---|---|---|---|
database / d | Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against | cmd | multi-string | /path/to/diamond/database.dmnd |
data-type |  | ini | integer | 0 |
contam / c | Specify contaminants to be used during Similarity Search best hit selection. Contaminants can be selected by species or through a specific taxon (e.g. insecta) from the NCBI Taxonomy Database. If your taxon is more than one word, replace the spaces with underscores (_). Alignments flagged as contaminants will score lower than other alignments. | ini | multi-string | insecta |
taxon | This flag allows for taxonomic ‘favoring’ of hits that are closer to your target species or lineage. Any lineage referenced by the NCBI Taxonomy Database can be used, such as a genus, phylum, or species. All spaces must be replaced with underscores (‘_’) | ini | string | homo_sapiens |
e | Specify the E-value cutoff for Similarity Searching results (in scientific notation format). | ini | scientific | 10E-5 |
tcoverage | Specify minimum target coverage for similarity searching | ini | float | 50 |
qcoverage | Specify minimum query coverage for similarity searching | ini | float | 50 |
uninformative | Comma-delimited list of terms you would like to be deemed “uninformative”. Any alignments during Similarity Searching tagged as uninformative will be scored lower | ini | string | conserved, predicted, unnamed, hypothetical, putative, unidentified, uncharacterized, unknown, uncultured, uninformative |
diamond-exe | Specify the execution method for DIAMOND. This can be a path to the | ini | string | diamond |
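
The value formats used in this table can be prepared with a few lines of code. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and the way `passes_cutoffs` combines the thresholds (E-value at or below the cutoff, coverages at or above the minimums) is the conventional reading of such cutoffs rather than something the table spells out.

```python
def format_taxon(name):
    """Replace spaces with underscores, as the taxon and contam flags require."""
    return name.strip().replace(" ", "_")


def parse_uninformative(value):
    """Split a comma-delimited term list into clean, non-empty terms."""
    return [term.strip() for term in value.split(",") if term.strip()]


def passes_cutoffs(evalue, qcoverage, tcoverage,
                   max_evalue=float("10E-5"), min_qcov=50.0, min_tcov=50.0):
    """Assumed reading of the e, qcoverage and tcoverage thresholds."""
    return (evalue <= max_evalue
            and qcoverage >= min_qcov
            and tcoverage >= min_tcov)


taxon = format_taxon("homo sapiens")
terms = parse_uninformative("conserved, predicted, unnamed, hypothetical")
print(taxon, terms)
```

Note that `10E-5` in scientific notation equals 0.0001 (that is, 1e-4), which is worth keeping in mind when choosing a cutoff.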
param | description | location (cmd/ini) | qualifier | example |
---|---|---|---|---|
ontology |  | ini | multi-integer | 0 |
param | description | location (cmd/ini) | qualifier | example |
---|---|---|---|---|
eggnog-sql | Path to the EggNOG SQL database that was downloaded during the Configuration stage. | ini | string | /databases/eggnog.db |
eggnog-dmnd | Path to the EggNOG DIAMOND configured database that was generated during the Configuration stage | ini | string | /databases/eggnog_proteins.dmnd |
param | description | location (cmd/ini) | qualifier | example |
---|---|---|---|---|
protein |  | ini | multi-string | pfam |
interproscan-exe | Specify the execution method for InterProScan. Commonly this can be the path to the | ini | string | interproscan.sh |
param | description | location (cmd/ini) | qualifier | example |
---|---|---|---|---|
api-taxon | Check whether a species can be found within the specified EnTAP database. Returns JSON-formatted text indicating whether the species was found. All spaces must be replaced with underscores (‘_’), as follows: “--taxon homo_sapiens” or “--taxon primates” | cmd | string | homo_sapiens |