Execution Flags

These flags control the execution stage of EnTAP. Each flag is set in one of three places: the command line (denoted cmd), the entap_run.params file (denoted R-ini), or the entap_config.ini file (denoted E-ini). Because the required and recommended flags also belong to specific categories, they are repeated in the other category tables where relevant.

A few data types (qualifiers) are used throughout these ini files. Anything specified as a ‘multi’ type may be entered multiple times. In an ini file, the values must be separated by commas (‘,’). Example for multi-integer: “1,2,3” (entered without quotes).
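As an illustration of the ‘multi’ qualifier, the sketch below splits a comma-separated value into a typed list. The function name parse_multi is hypothetical, and stripping whitespace around each item is an assumption rather than documented EnTAP behavior.

```python
def parse_multi(raw, cast=str):
    """Split a comma-separated 'multi' value and convert each item with `cast`.

    Assumption: surrounding whitespace and empty items are ignored.
    """
    items = [part.strip() for part in raw.split(",")]
    return [cast(part) for part in items if part]


# The multi-integer example from the text, entered without quotes in the ini file.
print(parse_multi("1,2,3", int))  # [1, 2, 3]
```

The same helper covers multi-string values such as a list of .dmnd database paths by leaving cast at its default.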

Required Flags

param

description

location (cmd/R-ini,E-ini)

qualifier

example

runP or runN

Specify blastp or blastx annotation. If --runP is selected with a nucleotide input, frame selection will be run and the annotation stages will be executed with protein sequences (blastp). If --runP is selected with a protein input, frame selection will not be run and annotation will be executed with protein sequences (blastp). If --runN is selected with a nucleotide input, frame selection will not be run and annotation will be executed with nucleotide sequences (blastx).

cmd

flag

runP

input / i

Path to the transcriptome file (either nucleotide or protein)

R-ini

string

/path/to/input/transcriptome.fa

database / d

Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against

R-ini

multi-string

/path/to/diamond/database.dmnd

run-ini

Point to the entap_run.params to specify run-specific parameters and paths.

cmd

string

/path/to/entap_run.params

entap-ini

Point to the entap_config.ini to specify database paths and software execution paths.

cmd

string

/path/to/entap_config.ini

entap-db-bin

Path to the entap_database.bin database. Either this or the EnTAP SQL database must be used. The binary database is the default and is recommended

E-ini

string

/path/to/entap_database.bin

diamond-exe

Specify the execution method for DIAMOND. This can be a path to the diamond file generated during installation, or simply the command if installed globally

E-ini

string

diamond

eggnog-map-data

Path to the directory containing the EggNOG SQL database eggnog.db that was downloaded during the Configuration stage. EnTAP will check for the eggnog.db database within this specified directory

E-ini

string

/path/to/eggnog_db_directory

eggnog-map-dmnd

Path to the EggNOG DIAMOND configured database eggnog_proteins.dmnd that was generated during the Configuration stage.

E-ini

string

/databases/eggnog_proteins.dmnd

eggnog-map-exe

Path to the EggNOG-mapper executable, or method of execution. If installed globally, this is simply emapper.py

E-ini

string

emapper.py
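The three runP/runN cases described in the table above can be summarized as a small decision function. This is only a sketch of the documented behavior; the function name plan_run and its return shape are hypothetical, and the combination of runN with protein input is rejected because the table does not describe it.

```python
def plan_run(mode, input_is_protein):
    """Return (frame_selection_runs, search_type) for the documented cases."""
    if mode == "runP":
        # Nucleotide input is frame-selected first; protein input is used as-is.
        return (not input_is_protein, "blastp")
    if mode == "runN":
        if input_is_protein:
            # Not covered by the flag table, so this sketch refuses to guess.
            raise ValueError("runN with protein input is not described")
        return (False, "blastx")
    raise ValueError(f"unknown mode: {mode!r}")


print(plan_run("runP", input_is_protein=False))  # (True, 'blastp')
print(plan_run("runP", input_is_protein=True))   # (False, 'blastp')
print(plan_run("runN", input_is_protein=False))  # (False, 'blastx')
```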

Recommended Flags

param

description

location (cmd/R-ini,E-ini)

qualifier

example

runP or runN

Specify blastp or blastx annotation. If --runP is selected with a nucleotide input, frame selection will be run and the annotation stages will be executed with protein sequences (blastp). If --runP is selected with a protein input, frame selection will not be run and annotation will be executed with protein sequences (blastp). If --runN is selected with a nucleotide input, frame selection will not be run and annotation will be executed with nucleotide sequences (blastx).

cmd

flag

runP

input / i

Path to the transcriptome file (either nucleotide or protein)

R-ini

string

/path/to/input/transcriptome.fa

database / d

Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against

R-ini

multi-string

/path/to/diamond/database.dmnd

run-ini

Point to the entap_run.params to specify run-specific parameters and paths.

cmd

string

/path/to/entap_run.params

entap-ini

Point to the entap_config.ini to specify database paths and software execution paths.

cmd

string

/path/to/entap_config.ini

out-dir

Specify an output directory for all of the files generated by EnTAP

R-ini

string

/path/to/entap_output

entap-db-bin

Path to the entap_database.bin database. Either this or the EnTAP SQL database must be used. The binary database is the default and is recommended

E-ini

string

/path/to/entap_database.bin

entap-graph

Path to the entap_graphing.py EnTAP graphing file. If this is not specified, EnTAP graphics will not be generated

E-ini

string

/path/to/entap_graphing.py

diamond-exe

Specify the execution method for DIAMOND. This can be a path to the diamond file generated during installation, or simply the command if installed globally

E-ini

string

diamond

eggnog-map-data

Path to the directory containing the EggNOG SQL database eggnog.db that was downloaded during the Configuration stage. EnTAP will check for the eggnog.db database within this specified directory

E-ini

string

/path/to/eggnog_db_directory

eggnog-map-dmnd

Path to the EggNOG DIAMOND configured database eggnog_proteins.dmnd that was generated during the Configuration stage.

E-ini

string

/databases/eggnog_proteins.dmnd

eggnog-map-exe

Path to the EggNOG-mapper executable, or method of execution. If installed globally, this is simply emapper.py

E-ini

string

emapper.py
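Since runP/runN, run-ini, and entap-ini are the flags given on the command line, a run can be launched with just those three and everything else read from the two ini files. The sketch below only assembles the argument list; the executable name EnTAP, the double-dash spelling of each flag, and the helper name build_entap_command are assumptions to check against your installation.

```python
import shlex


def build_entap_command(mode, run_ini, entap_ini, executable="EnTAP"):
    """Assemble an argument list from the command-line flags in the table."""
    if mode not in ("runP", "runN"):
        raise ValueError("mode must be 'runP' or 'runN'")
    return [executable, f"--{mode}", "--run-ini", run_ini, "--entap-ini", entap_ini]


cmd = build_entap_command(
    "runP", "/path/to/entap_run.params", "/path/to/entap_config.ini"
)
print(shlex.join(cmd))
# EnTAP --runP --run-ini /path/to/entap_run.params --entap-ini /path/to/entap_config.ini
```

The list form can be handed directly to subprocess.run, which avoids shell quoting issues with unusual paths.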

General EnTAP Flags

param

description

location (cmd/R-ini,E-ini)

qualifier

example

runP or runN

Specify blastp or blastx annotation. If --runP is selected with a nucleotide input, frame selection will be run and the annotation stages will be executed with protein sequences (blastp). If --runP is selected with a protein input, frame selection will not be run and annotation will be executed with protein sequences (blastp). If --runN is selected with a nucleotide input, frame selection will not be run and annotation will be executed with nucleotide sequences (blastx).

cmd

flag

runP

input / i

Path to the transcriptome file (either nucleotide or protein)

R-ini

string

/path/to/input/transcriptome.

database / d

Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against

R-ini

multi-string

/path/to/diamond/database.dmnd

run-ini

Point to the entap_run.params to specify run-specific parameters and paths.

cmd

string

/path/to/entap_run.params

entap-ini

Point to the entap_config.ini to specify database paths and software execution paths.

cmd

string

/path/to/entap_config.ini

out-dir

Specify an output directory for all of the files generated by EnTAP

R-ini

string

/path/to/entap_output

overwrite

Overwrite all files from a previous run. Without this flag, EnTAP will recognize the results from the previous run and skip the portions that were already run.

R-ini

bool

true

graph

Check whether your system supports graphing, then exit. Graphing requires Python with the Matplotlib module installed.

cmd

bool

true

threads / t

Specify the number of threads for execution

R-ini

integer

5

no-trim

By default, EnTAP will trim your sequence headers to the first space to maintain compatibility across different software. Using this flag will instead retain the full header information by removing all spaces.

  • Example:

    • >TRINITY_231.1 protein12312_43 inform

    • >TRINITY_231.1protein12312_43inform

R-ini

bool

true

state

Precise control over the execution of EnTAP. This flag allows certain stages to be run while skipping others. More information can be found in the Execution section.

R-ini

string

‘+’

version

Prints the current EnTAP version you are running

cmd

flag

version

no-check

EnTAP checks execution paths and inputs before annotating, so that a bad input is caught up front rather than midway through a run. Using this flag skips that check (not advised!).

R-ini

bool

true

output-format

Specify multiple output file formats for each stage of the pipeline

    1. TSV File (default)

    2. CSV File

    3. FASTA Protein File (default)

    4. FASTA Nucleotide File (default)

    5. Gene Enrichment Gene ID + Effective Length

    6. Gene Enrichment Gene ID + GO Terms

    7. Gene Ontology Terms (Sequence ID, GO Term ID, GO Term, Category, and Sequence Effective Length) TSV format (default)

R-ini

multi-integer

1,3,4,7

entap-db-sql

Path to the entap_database.db database. Either this or the EnTAP binary database must be used

E-ini

string

/path/to/entap_database.db

entap-db-bin

Path to the entap_database.bin database. Either this or the EnTAP SQL database must be used. The binary database is the default and is recommended

E-ini

string

/path/to/entap_database.bin

entap-graph

Path to the entap_graphing.py EnTAP graphing file. If this is not specified, EnTAP graphics will not be generated

E-ini

string

/path/to/entap_graphing.py
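The header handling described for no-trim in the table above can be sketched as follows. The function name format_header is hypothetical, and the code mirrors only the two behaviors stated in the description: trimming at the first space by default, or removing all spaces when no-trim is set.

```python
def format_header(header, no_trim=False):
    """Trim a FASTA header at the first space, or drop all spaces if no_trim."""
    if no_trim:
        return header.replace(" ", "")
    return header.split(" ", 1)[0]


header = ">TRINITY_231.1 protein12312_43 inform"
print(format_header(header))                # >TRINITY_231.1
print(format_header(header, no_trim=True))  # >TRINITY_231.1protein12312_43inform
```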

Expression Analysis Flags

param

description

location (cmd/R-ini,E-ini)

qualifier

example

align / a

Path to the alignment file (either SAM or BAM format). Ignoring this flag will skip expression filtering. Be sure to look at the other Expression Analysis flags if using this.

R-ini

string

/path/to/alignment.bam

rsem-calculate-expression

Specify the path to the rsem-calculate-expression file generated during installation of RSEM

E-ini

string

/libs/RSEM-1.3.3/rsem-calculate-expression

rsem-sam-validator

Specify the path to the rsem-sam-validator file generated during installation of RSEM

E-ini

string

/libs/RSEM-1.3.3/rsem-sam-validator

rsem-prepare-reference

Specify the path to the rsem-prepare-reference file generated during installation of RSEM

E-ini

string

/libs/RSEM-1.3.3/rsem-prepare-reference

rsem-convert-sam-for-rsem

Specify the path to the rsem-convert-sam-for-rsem file generated during installation of RSEM

E-ini

string

/libs/RSEM-1.3.3/rsem-convert-sam-for-rsem

fpkm

Specify the FPKM (fragments per kilobase of exon per million mapped fragments) cutoff for Expression Filtering. All transcripts below this number will be filtered out and removed.

R-ini

float

0.5

single-end

Signify your reads are single-end for RSEM execution instead of paired-end (default)

R-ini

bool

true
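To make the fpkm cutoff concrete, the sketch below keeps transcripts at or above the cutoff and reports the rest as removed. The transcript IDs and FPKM values are made-up illustration data, and filter_by_fpkm is a hypothetical helper, not part of EnTAP or RSEM.

```python
def filter_by_fpkm(fpkm_by_transcript, cutoff=0.5):
    """Split transcripts into kept (FPKM >= cutoff) and removed (FPKM < cutoff)."""
    kept = {tid: v for tid, v in fpkm_by_transcript.items() if v >= cutoff}
    removed = sorted(tid for tid, v in fpkm_by_transcript.items() if v < cutoff)
    return kept, removed


# Hypothetical expression values for three transcripts.
values = {"t1": 12.4, "t2": 0.1, "t3": 0.5}
kept, removed = filter_by_fpkm(values, cutoff=0.5)
print(sorted(kept))  # ['t1', 't3']
print(removed)       # ['t2']
```

A transcript exactly at the cutoff is kept here, because the description only removes transcripts below the number.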

Frame Selection - TransDecoder Specific Flags

param

description

location (cmd/R-ini,E-ini)

qualifier

example

transdecoder-m

Specify the minimum protein length for TransDecoder

R-ini

integer

100

transdecoder-no-refine-starts

Specify this flag to pass the ‘--no_refine_starts’ option to TransDecoder when it is executed

R-ini

bool

false

transdecoder-long-exe

Method to execute TransDecoder.LongOrfs. This may be the path to the executable, or simply, ‘TransDecoder.LongOrfs’ if installed globally

E-ini

string

TransDecoder.LongOrfs

transdecoder-predict-exe

Method to execute TransDecoder.Predict. This may be the path to the executable, or simply, ‘TransDecoder.Predict’ if installed globally

E-ini

string

TransDecoder.Predict
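One way to picture how these four flags relate to TransDecoder is to map them onto its two commands. The sketch below uses TransDecoder's -t (transcripts) and -m (minimum protein length) options together with --no_refine_starts; which additional options EnTAP passes, and the helper name transdecoder_commands, are assumptions.

```python
def transdecoder_commands(
    transcripts,
    min_protein_length=100,
    no_refine_starts=False,
    long_exe="TransDecoder.LongOrfs",
    predict_exe="TransDecoder.Predict",
):
    """Build argument lists for the LongOrfs and Predict steps."""
    long_orfs = [long_exe, "-t", transcripts, "-m", str(min_protein_length)]
    predict = [predict_exe, "-t", transcripts]
    if no_refine_starts:
        # Corresponds to transdecoder-no-refine-starts set to true.
        predict.append("--no_refine_starts")
    return long_orfs, predict


long_orfs, predict = transdecoder_commands(
    "/path/to/input/transcriptome.fa", no_refine_starts=True
)
print(long_orfs)
print(predict)
```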

Similarity Search Flags

param

description

location (cmd/R-ini,E-ini)

qualifier

example

database / d

Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against

R-ini

multi-string

/path/to/diamond/database.dmnd

data-type

Specify which EnTAP database you’d like to use for execution
    0. Binary Database (default) - This will be much quicker and is recommended

    1. SQL Database - Slower, but more easily compatible with every system

R-ini

integer

0

contam / c

Specify contaminants to be used during Similarity Search best hit selection. Contaminants can be selected by species or through a specific taxon (for example, insecta) from the NCBI Taxonomy Database. If your taxon is more than one word, replace the spaces with underscores (‘_’). Alignments to contaminants will be flagged and scored lower than other alignments.

R-ini

multi-string

insecta

taxon

This flag will allow for ‘taxonomic favoring’ of hits that are closer to your target species or lineage. Any lineage can be used as referenced by the NCBI Taxonomic database, such as genus, phylum, or species. Format must replace all spaces with underscores (‘_’)

R-ini

string

homo_sapiens

e

Specify E-value cutoff for Similarity Searching results (in scientific notation format).

R-ini

scientific

10E-5

tcoverage

Specify minimum target coverage for similarity searching

R-ini

float

50

qcoverage

Specify minimum query coverage for similarity searching

R-ini

float

50

uninformative

Comma-delimited list of terms you would like to be deemed “uninformative”. Any alignments during Similarity Searching tagged as uninformative will be scored lower

R-ini

string

conserved, predicted, unnamed, hypothetical, putative, unidentified, uncharacterized, unknown, uncultured, uninformative

diamond-exe

Specify the execution method for DIAMOND. This can be a path to the diamond file generated during installation, or simply the command if installed globally

E-ini

string

diamond
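The e, qcoverage, tcoverage, and uninformative flags in the table above can be read as a simple filter over alignment hits. The sketch below is an interpretation under stated assumptions: coverage values are treated as percentages, a hit exactly at a cutoff passes, uninformative terms are matched case-insensitively as substrings, and the Hit class and both helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    description: str
    evalue: float
    query_coverage: float   # percent, assumed 0-100
    target_coverage: float  # percent, assumed 0-100


# The table's example value; note that "10E-5" parses to 0.0001.
E_CUTOFF = float("10E-5")

UNINFORMATIVE = [
    term.strip()
    for term in (
        "conserved, predicted, unnamed, hypothetical, putative, unidentified, "
        "uncharacterized, unknown, uncultured, uninformative"
    ).split(",")
]


def passes_cutoffs(hit, e_cutoff=E_CUTOFF, qcoverage=50.0, tcoverage=50.0):
    """True if the hit meets the E-value and both coverage cutoffs."""
    return (
        hit.evalue <= e_cutoff
        and hit.query_coverage >= qcoverage
        and hit.target_coverage >= tcoverage
    )


def is_uninformative(description, terms=UNINFORMATIVE):
    """True if any uninformative term appears in the description."""
    text = description.lower()
    return any(term in text for term in terms)


hit = Hit("hypothetical protein", 1e-10, 80.0, 75.0)
print(passes_cutoffs(hit), is_uninformative(hit.description))  # True True
```

In EnTAP an uninformative hit is scored lower rather than discarded, so the second check would feed a ranking step, not a filter.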

Ontology Flags

param

description

location (cmd/R-ini,E-ini)

qualifier

example

ontology_source

Specify which ontology source packages you would like to use. Multiple flags may be used to specify execution of multiple software packages.
  • 0 - EggNOG (default)

  • 1 - InterProScan

R-ini

multi-integer

0

Ontology - EggNOG Specific Flags

param

description

location (cmd/R-ini,E-ini)

qualifier

example

eggnog-map-data

Path to the directory containing the EggNOG SQL database eggnog.db that was downloaded during the Configuration stage. EnTAP will check for the eggnog.db database within this specified directory

E-ini

string

/path/to/eggnog_db_directory

eggnog-map-dmnd

Path to the EggNOG DIAMOND configured database eggnog_proteins.dmnd that was generated during the Configuration stage.

E-ini

string

/databases/eggnog_proteins.dmnd

eggnog-map-exe

Path to the EggNOG-mapper executable, or method of execution. If installed globally, this is simply emapper.py

E-ini

string

emapper.py

Ontology - InterProScan Specific Flags

param

description

location (cmd/R-ini,E-ini)

qualifier

example

interproscan-db

Use this option if you would like to run InterProScan against specific databases. Multiple databases can be selected.
  • tigrfam

  • sfld

  • prodom

  • hamap

  • pfam

  • smart

  • cdd

  • prositeprofiles

  • prositepatterns

  • superfamily

  • prints

  • panther

  • gene3d

  • pirsf

  • coils

  • mobidblite

R-ini

multi-string

pfam

interproscan-exe

Specify the execution method for InterProScan. Commonly this can be the path to the interproscan.sh file

E-ini

string

interproscan.sh
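Because interproscan-db accepts only the database names listed above, a value can be checked before a run starts. The sketch below validates a comma-separated string against that list; the helper name check_interproscan_dbs and the lower-casing of input are assumptions.

```python
# Database names copied from the interproscan-db description above.
INTERPROSCAN_DATABASES = {
    "tigrfam", "sfld", "prodom", "hamap", "pfam", "smart", "cdd",
    "prositeprofiles", "prositepatterns", "superfamily", "prints",
    "panther", "gene3d", "pirsf", "coils", "mobidblite",
}


def check_interproscan_dbs(raw):
    """Return the requested database names, or raise if any is not listed."""
    names = [name.strip().lower() for name in raw.split(",") if name.strip()]
    unknown = [name for name in names if name not in INTERPROSCAN_DATABASES]
    if unknown:
        raise ValueError(f"unrecognized InterProScan database(s): {unknown}")
    return names


print(check_interproscan_dbs("pfam,smart"))  # ['pfam', 'smart']
```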

Horizontal Gene Transfer Flags

param

description

location (cmd/R-ini,E-ini)

qualifier

example

hgt-donor

Specify the DIAMOND configured (.dmnd extension) donor databases for Horizontal Gene Transfer analysis. Separate databases with a comma (‘,’)

R-ini

multi-string

path/to/donor/database1.dmnd,path/to/donor/database2.dmnd

hgt-recipient

Specify the DIAMOND configured (.dmnd extension) recipient databases for Horizontal Gene Transfer analysis. Separate databases with a comma (‘,’)

R-ini

multi-string

path/to/recipient/database1.dmnd,path/to/recipient/database2.dmnd

hgt-gff

Specify path to the GFF file for HGT analysis. The input GFF must satisfy the following:

R-ini

string

path/to/gff/file.gff

diamond-exe

Specify the execution method for DIAMOND. This can be a path to the diamond file generated during installation, or simply the command if installed globally. DIAMOND is leveraged for HGT analysis

E-ini

string

diamond

EnTAP API Flags

param

description

location (cmd/R-ini,E-ini)

qualifier

example

api-taxon

Check whether a species can be found within the specified EnTAP database. Returns JSON-formatted text indicating whether the species was found. All spaces must be replaced with underscores (‘_’), as follows: “--taxon homo_sapiens” or “--taxon primates”

cmd

string

homo_sapiens
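The underscore format required for taxon names can be produced with a one-line helper. The name format_taxon is hypothetical, and lower-casing the result simply mirrors the examples in this section rather than a documented requirement.

```python
def format_taxon(name):
    """Replace runs of whitespace with single underscores and lower-case."""
    return "_".join(name.split()).lower()


print(format_taxon("Homo sapiens"))  # homo_sapiens
print(format_taxon("primates"))      # primates
```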