Execution Flags

These are the flags that control the execution of EnTAP. Flags are repeated across categories where relevant. Each flag is set either on the command line (denoted cmd) or in the EnTAP ini file (denoted ini).

Required Flags

param

description

location (cmd/ini)

qualifier

example

runP or runN

Specify blastp or blastx annotation. If --runP is selected with a nucleotide input, frame selection will be run and the annotation stages will be executed with protein sequences (blastp). If --runP is selected with a protein input, frame selection will not be run and annotation will be executed with protein sequences (blastp). If --runN is selected with a nucleotide input, frame selection will not be run and annotation will be executed with nucleotide sequences (blastx).

cmd

flag

runP

input / i

Path to the transcriptome file (either nucleotide or protein)

cmd

string

/path/to/input/transcriptome.fa

database / d

Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against

cmd

multi-string

/path/to/diamond/database.dmnd

ini

Point to the entap_config.ini to specify paths and commands needed by EnTAP

cmd

string

/path/to/entap_config.ini
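The --runP/--runN rules in the table above amount to a small decision table. The Python sketch below restates them so the three documented cases are explicit. `plan_annotation` and `AnnotationPlan` are hypothetical names and are not part of EnTAP. The table does not cover --runN with a protein input, so the sketch assumes that combination is rejected, because a blastx-style search takes nucleotide queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationPlan:
    """What the pipeline does for one --runP/--runN choice (hypothetical type)."""
    run_frame_selection: bool
    search_mode: str  # "blastp" (protein queries) or "blastx" (nucleotide queries)


def plan_annotation(flag: str, input_is_protein: bool) -> AnnotationPlan:
    """Restate the documented --runP/--runN rules as code."""
    if flag == "runP":
        # Nucleotide input is translated by frame selection first;
        # protein input is searched as-is. Both cases end up as blastp.
        return AnnotationPlan(run_frame_selection=not input_is_protein, search_mode="blastp")
    if flag == "runN":
        # Assumption: protein input is rejected, since blastx-style
        # searches translate nucleotide queries.
        if input_is_protein:
            raise ValueError("--runN expects a nucleotide input")
        return AnnotationPlan(run_frame_selection=False, search_mode="blastx")
    raise ValueError(f"expected 'runP' or 'runN', got {flag!r}")
```

Modelling the outcome as a frozen dataclass keeps the two independent decisions together. The first is whether to run frame selection and the second is which search mode to use.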

Recommended Flags

param

description

location (cmd/ini)

qualifier

example

runP or runN

Specify blastp or blastx annotation. If --runP is selected with a nucleotide input, frame selection will be run and the annotation stages will be executed with protein sequences (blastp). If --runP is selected with a protein input, frame selection will not be run and annotation will be executed with protein sequences (blastp). If --runN is selected with a nucleotide input, frame selection will not be run and annotation will be executed with nucleotide sequences (blastx).

cmd

flag

runP

input / i

Path to the transcriptome file (either nucleotide or protein)

cmd

string

/path/to/input/transcriptome.fa

database / d

Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against

cmd

multi-string

/path/to/diamond/database.dmnd

ini

Point to the entap_config.ini to specify paths and commands needed by EnTAP

cmd

string

/path/to/entap_config.ini

out-dir

Specify an output directory for all of the files generated by EnTAP

cmd

string

/path/to/entap_output
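The Python sketch below assembles a command line from the required and recommended flags. It rests on several assumptions:

- the executable is named `EnTAP`;
- long flags take a `--` prefix and short forms a single `-`;
- each DIAMOND database is passed with its own `-d`.

`build_entap_argv` is a hypothetical helper. Check the flag spellings against your installed EnTAP version.

```python
from __future__ import annotations

import shlex
from typing import Sequence

MAX_DATABASES = 5  # "Specify up to 5 DIAMOND indexed (.dmnd) databases"


def build_entap_argv(
    input_path: str,
    databases: Sequence[str],
    ini_path: str,
    out_dir: str,
    run_protein: bool = True,
) -> list[str]:
    """Assemble an EnTAP argument list (hypothetical helper).

    Assumes the executable is named "EnTAP" and that every database
    is passed with its own "-d" flag.
    """
    if not 1 <= len(databases) <= MAX_DATABASES:
        raise ValueError(f"expected 1 to {MAX_DATABASES} databases, got {len(databases)}")
    argv = ["EnTAP", "--runP" if run_protein else "--runN", "-i", input_path]
    for db in databases:
        argv += ["-d", db]
    argv += ["--ini", ini_path, "--out-dir", out_dir]
    return argv


argv = build_entap_argv(
    "/path/to/input/transcriptome.fa",
    ["/path/to/diamond/database.dmnd"],
    "/path/to/entap_config.ini",
    "/path/to/entap_output",
)
print(shlex.join(argv))  # shell-quoted form, handy for logging
```

You can pass the resulting list directly to `subprocess.run(argv, check=True)`. Doing so avoids shell quoting problems with paths that contain spaces.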

General EnTAP Flags

param

description

location (cmd/ini)

qualifier

example

runP or runN

Specify blastp or blastx annotation. If --runP is selected with a nucleotide input, frame selection will be run and the annotation stages will be executed with protein sequences (blastp). If --runP is selected with a protein input, frame selection will not be run and annotation will be executed with protein sequences (blastp). If --runN is selected with a nucleotide input, frame selection will not be run and annotation will be executed with nucleotide sequences (blastx).

cmd

flag

runP

input / i

Path to the transcriptome file (either nucleotide or protein)

cmd

string

/path/to/input/transcriptome.

database / d

Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against

cmd

multi-string

/path/to/diamond/database.dmnd

ini

Point to the entap_config.ini to specify paths and commands needed by EnTAP

cmd

string

/path/to/entap_config.ini

out-dir

Specify an output directory for all of the files generated by EnTAP

cmd

string

/path/to/entap_output

overwrite

All files from previous runs will be overwritten. Without this flag, EnTAP will recognize the results of the previous run and skip the stages that have already been run.

cmd

flag

overwrite

graph

Check whether your system supports graphing, then exit. Graphing requires Python with the Matplotlib module installed.

cmd

flag

graph

threads / t

Specify the number of threads for execution

cmd

integer

5

no-trim

By default, EnTAP trims your sequence headers at the first space to maintain compatibility across different software. Using this flag instead retains the information in the header by removing all spaces.

  • Example:

    • Original header: >TRINITY_231.1 protein12312_43 inform

    • Default (trimmed): >TRINITY_231.1

    • With --no-trim: >TRINITY_231.1protein12312_43inform

cmd

flag

no-trim

state

Precise control over the execution of EnTAP. This flag allows certain stages to be run while others are skipped. More information can be found in the Execution section.

cmd

string

‘+’

version

Prints the current EnTAP version you are running

cmd

flag

version

no-check

Before annotating, EnTAP checks execution paths and inputs so that errors are caught at the start rather than midway through a run. This flag disables that check (not advised!).

cmd

flag

no-check

output-format

Specify one or more output file formats for each stage of the pipeline

    1. TSV File (default)

    2. CSV File

    3. FASTA Protein File (default)

    4. FASTA Nucleotide File (default)

    5. Gene Enrichment Gene ID + Effective Length

    6. Gene Enrichment Gene ID + GO Terms

    7. Gene Ontology Terms (Sequence ID, GO Term ID, GO Term, Category, and Sequence Effective Length) TSV format

ini

multi-integer

1,3,4,7
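The two header-handling behaviours described for no-trim can be stated precisely in a few lines. The Python sketch below reproduces the documented example. `format_header` is a hypothetical name. As an assumption, only the literal space character is treated as a separator, so tabs are left alone.

```python
def format_header(header: str, no_trim: bool = False) -> str:
    """Apply the header handling described for the no-trim flag.

    Default: keep only the text before the first space.
    no_trim=True: keep the whole header but delete every space.
    Only the space character is handled (assumption; tabs are untouched).
    """
    if no_trim:
        return header.replace(" ", "")
    return header.split(" ", 1)[0]


header = ">TRINITY_231.1 protein12312_43 inform"
print(format_header(header))                # >TRINITY_231.1
print(format_header(header, no_trim=True))  # >TRINITY_231.1protein12312_43inform
```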

Expression Analysis Flags

param

description

location (cmd/ini)

qualifier

example

align / a

Path to the alignment file (either SAM or BAM format). Omitting this flag will skip expression filtering. If you use it, be sure to review the other Expression Analysis flags.

cmd

string

/path/to/alignment.bam

rsem-calculate-expression

Specify the path to the rsem-calculate-expression file generated during installation of RSEM

ini

string

/libs/RSEM-1.3.3/rsem-calculate-expression

rsem-sam-validator

Specify the path to the rsem-sam-validator file generated during installation of RSEM

ini

string

/libs/RSEM-1.3.3/rsem-sam-validator

rsem-prepare-reference

Specify the path to the rsem-prepare-reference file generated during installation of RSEM

ini

string

/libs/RSEM-1.3.3/rsem-prepare-reference

rsem-convert-sam-for-rsem

Specify the path to the rsem-convert-sam-for-rsem file generated during installation of RSEM

ini

string

/libs/RSEM-1.3.3/rsem-convert-sam-for-rsem

fpkm

Specify the FPKM (fragments per kilobase of exon per million mapped fragments) cutoff for Expression Filtering. All transcripts below this number will be filtered out and removed.

ini

float

0.5

single-end

Signify your reads are single-end for RSEM execution instead of paired-end (default)

ini

bool

single-end
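The following Python sketch illustrates the FPKM definition given above and the cutoff filter. RSEM computes FPKM itself (using effective transcript length), so the `fpkm` helper only illustrates the formula. Both `fpkm` and `filter_by_fpkm` are hypothetical names. The sketch also assumes that "below this number" means strictly less than the cutoff, so a transcript exactly at the cutoff is kept.

```python
from __future__ import annotations

from typing import Mapping


def fpkm(fragments: float, length_bp: float, total_mapped: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    # fragments / (length in kb * total mapped in millions)
    return fragments * 1e9 / (length_bp * total_mapped)


def filter_by_fpkm(fpkm_by_transcript: Mapping[str, float], cutoff: float = 0.5) -> dict[str, float]:
    """Drop transcripts whose FPKM is below the cutoff; keep the rest."""
    return {tid: value for tid, value in fpkm_by_transcript.items() if value >= cutoff}


values = {"t1": fpkm(100, 2000, 10_000_000), "t2": 0.2}  # t1 -> 5.0
print(filter_by_fpkm(values))  # {'t1': 5.0}
```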

Frame Selection Flags

param

description

location (cmd/ini)

qualifier

example

frame-selection

Specify the Frame Selection software to use:
    1. GeneMarkS-T

    2. TransDecoder (default)

ini

integer

2

complete

Tell EnTAP to mark all of the transcripts as ‘complete’. This will only be seen in the final output and will not affect the run.

ini

bool

complete

Frame Selection - TransDecoder Specific Flags

param

description

location (cmd/ini)

qualifier

example

transdecoder-m

Specify the minimum protein length for TransDecoder

ini

integer

100

transdecoder-no-refine-starts

Specify this flag to pass the ‘--no_refine_starts’ option to TransDecoder when it is executed

ini

bool

false

transdecoder-long-exe

Method to execute TransDecoder.LongOrfs. This may be the path to the executable, or simply ‘TransDecoder.LongOrfs’ if installed globally

ini

string

TransDecoder.LongOrfs

transdecoder-predict-exe

Method to execute TransDecoder.Predict. This may be the path to the executable, or simply ‘TransDecoder.Predict’ if installed globally

ini

string

TransDecoder.Predict
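To show how the TransDecoder flags above relate to TransDecoder's own options, here is a Python sketch that builds the two calls. It uses TransDecoder's `-t` (transcripts), `-m` (minimum protein length), and `--no_refine_starts` options. This is not necessarily how EnTAP invokes TransDecoder, since EnTAP may add other options such as output locations. `transdecoder_commands` is a hypothetical helper.

```python
from __future__ import annotations


def transdecoder_commands(
    transcripts: str,
    min_protein_len: int = 100,
    no_refine_starts: bool = False,
    long_exe: str = "TransDecoder.LongOrfs",
    predict_exe: str = "TransDecoder.Predict",
) -> tuple[list[str], list[str]]:
    """Build the two TransDecoder calls implied by the flags above (sketch only)."""
    # LongOrfs: -t is the transcript FASTA, -m the minimum protein length.
    long_orfs = [long_exe, "-t", transcripts, "-m", str(min_protein_len)]
    # Predict runs on the same transcripts; --no_refine_starts is optional.
    predict = [predict_exe, "-t", transcripts]
    if no_refine_starts:
        predict.append("--no_refine_starts")
    return long_orfs, predict


long_orfs, predict = transdecoder_commands("transcriptome.fa", no_refine_starts=True)
print(long_orfs)  # ['TransDecoder.LongOrfs', '-t', 'transcriptome.fa', '-m', '100']
print(predict)    # ['TransDecoder.Predict', '-t', 'transcriptome.fa', '--no_refine_starts']
```

The executable names are parameters because the transdecoder-long-exe and transdecoder-predict-exe settings may hold either a full path or a globally installed command name.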

Frame Selection - GeneMarkS-T Specific Flags

param

description

location (cmd/ini)

qualifier

example

genemarkst-exe

Specify the path to the gmst.pl file for running GeneMarkS-T

ini

string

gmst.pl

Similarity Search Flags

param

description

location (cmd/ini)

qualifier

example

database / d

Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against

cmd

multi-string

/path/to/diamond/database.dmnd

data-type

Specify which EnTAP database you’d like to use for execution:
    0. Binary Database (default) - Much quicker; recommended

    1. SQL Database - Slower, but more broadly compatible across systems

ini

integer

0

contam / c

Specify contaminants to be used during Similarity Search best hit selection. Contaminants can be selected by species or by a specific taxon (such as insecta) from the NCBI Taxonomy Database. If your taxon is more than one word, replace the spaces with underscores (_). Alignments to contaminants will be flagged and scored lower than other alignments.

ini

multi-string

insecta

taxon

This flag allows taxonomic ‘favoring’ of hits that are closer to your target species or lineage. Any lineage in the NCBI Taxonomy Database can be used, such as a genus, phylum, or species. All spaces must be replaced with underscores (‘_’).

ini

string

homo_sapiens

e

Specify E-value cutoff for Similarity Searching results (in scientific notation format).

ini

scientific

10E-5

tcoverage

Specify minimum target coverage for similarity searching

ini

float

50

qcoverage

Specify minimum query coverage for similarity searching

ini

float

50

uninformative

Comma-delimited list of terms you would like to be deemed “uninformative”. Alignments tagged as uninformative during Similarity Searching will be scored lower.

ini

string

conserved, predicted, unnamed, hypothetical, putative, unidentified, uncharacterized, unknown, uncultured, uninformative

diamond-exe

Specify the execution method for DIAMOND. This can be a path to the diamond file generated during installation, or simply the command if installed globally

ini

string

diamond
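The Python sketch below shows how the e, qcoverage, tcoverage, and uninformative settings could be applied to a list of hits. It is not EnTAP's best-hit selection. EnTAP scores uninformative and contaminant hits lower rather than discarding them, and its exact comparisons are not documented here. `Hit`, `parse_terms`, `passes_cutoffs`, and `is_uninformative` are hypothetical names. The sketch also assumes that coverages are percentages and that uninformative terms are matched as whole words. One detail to note: if parsed as ordinary scientific notation, the example E-value 10E-5 equals 1 × 10^-4, not 1 × 10^-5.

```python
from __future__ import annotations

from dataclasses import dataclass

# Default list from the "uninformative" row above.
DEFAULT_UNINFORMATIVE = (
    "conserved, predicted, unnamed, hypothetical, putative, unidentified, "
    "uncharacterized, unknown, uncultured, uninformative"
)


@dataclass(frozen=True)
class Hit:
    """One similarity search alignment (hypothetical record type)."""
    description: str
    evalue: float
    query_coverage: float   # percent (assumed)
    target_coverage: float  # percent (assumed)


def parse_terms(csv: str) -> list[str]:
    """Split a comma-delimited term list, trimming spaces and lowercasing."""
    return [term.strip().lower() for term in csv.split(",") if term.strip()]


def passes_cutoffs(hit: Hit, max_evalue: float, min_qcov: float, min_tcov: float) -> bool:
    """Keep hits at or under the E-value cutoff and at or over both coverage minimums."""
    return (
        hit.evalue <= max_evalue
        and hit.query_coverage >= min_qcov
        and hit.target_coverage >= min_tcov
    )


def is_uninformative(hit: Hit, terms: list[str]) -> bool:
    """Flag a hit whose description contains any term as a whole word."""
    words = hit.description.lower().split()
    return any(term in words for term in terms)


# "10E-5" in scientific notation is 10 x 10^-5 = 1e-4; write "1E-5" for 1e-5.
hits = [
    Hit("uncharacterized protein LOC101", 1e-30, 90.0, 80.0),
    Hit("heat shock protein 70", 1e-3, 95.0, 95.0),
    Hit("heat shock protein 90", 1e-20, 40.0, 90.0),
]
terms = parse_terms(DEFAULT_UNINFORMATIVE)
kept = [h for h in hits if passes_cutoffs(h, float("1E-5"), 50.0, 50.0)]
for h in kept:
    print(h.description, "uninformative" if is_uninformative(h, terms) else "informative")
```

Whole-word matching avoids flagging descriptions where an uninformative term appears only as part of a longer word. The trade-off is that punctuation attached to a word (for example "hypothetical,") would not match.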

Ontology Flags

param

description

location (cmd/ini)

qualifier

example

ontology

Specify which ontology packages you would like to use. Multiple flags may be used to specify execution of multiple software packages.
  • 0 - EggNOG (default)

  • 1 - InterProScan

ini

multi-integer

0

Ontology - EggNOG Specific Flags

param

description

location (cmd/ini)

qualifier

example

eggnog-sql

Path to the EggNOG SQL database that was downloaded during the Configuration stage.

ini

string

/databases/eggnog.db

eggnog-dmnd

Path to the EggNOG DIAMOND configured database that was generated during the Configuration stage

ini

string

/databases/eggnog_proteins.dmnd

Ontology - InterProScan Specific Flags

param

description

location (cmd/ini)

qualifier

example

protein

Use this option if you would like to run InterProScan against specific databases. Multiple databases can be selected.
  • tigrfam

  • sfld

  • prodom

  • hamap

  • pfam

  • smart

  • cdd

  • prositeprofiles

  • prositepatterns

  • superfamily

  • prints

  • panther

  • gene3d

  • pirsf

  • coils

  • mobidblite

ini

multi-string

pfam

interproscan-exe

Specify the execution method for InterProScan. Commonly this can be the path to the interproscan.sh file

ini

string

interproscan.sh
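The protein setting is a multi-string list, while InterProScan's own command line takes the analyses to run as one comma-separated `-appl` value and the input FASTA via `-i`. The Python sketch below shows that conversion. It is only an illustration. EnTAP likely passes further options (output format, output location), and the set of analyses accepted by `-appl` varies between InterProScan releases. `interproscan_argv` and `INTERPRO_DATABASES` are hypothetical names.

```python
from __future__ import annotations

from typing import Sequence

# Databases listed in the "protein" row above.
INTERPRO_DATABASES = frozenset({
    "tigrfam", "sfld", "prodom", "hamap", "pfam", "smart", "cdd",
    "prositeprofiles", "prositepatterns", "superfamily", "prints",
    "panther", "gene3d", "pirsf", "coils", "mobidblite",
})


def interproscan_argv(exe: str, protein_fasta: str, databases: Sequence[str]) -> list[str]:
    """Sketch of an InterProScan call restricted to the selected databases."""
    unknown = sorted({db.lower() for db in databases} - INTERPRO_DATABASES)
    if unknown:
        raise ValueError(f"unsupported InterProScan database(s): {', '.join(unknown)}")
    # -i: input protein FASTA; -appl: comma-separated list of analyses to run.
    return [exe, "-i", protein_fasta, "-appl", ",".join(databases)]


print(interproscan_argv("interproscan.sh", "proteins.faa", ["pfam", "smart"]))
# ['interproscan.sh', '-i', 'proteins.faa', '-appl', 'pfam,smart']
```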

EnTAP API Flags

param

description

location (cmd/ini)

qualifier

example

api-taxon

Check whether a species can be found within the specified EnTAP database. Returns JSON-formatted text indicating whether the species was found. Spaces must be replaced with underscores (‘_’), for example: “--api-taxon homo_sapiens” or “--api-taxon primates”

cmd

string

homo_sapiens
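The contam, taxon, and api-taxon flags all require spaces in a taxon name to be replaced with underscores. The Python sketch below applies that rule, collapsing any run of whitespace into a single underscore. `format_taxon` is a hypothetical name. The `EnTAP` executable name and the `--api-taxon` spelling are assumed from the parameter name above. The JSON reply format is not described here, so the sketch does not parse it.

```python
import re


def format_taxon(name: str) -> str:
    """Replace each run of whitespace with a single underscore."""
    return re.sub(r"\s+", "_", name.strip())


print(format_taxon("homo sapiens"))  # homo_sapiens
print(["EnTAP", "--api-taxon", format_taxon("homo sapiens")])
# ['EnTAP', '--api-taxon', 'homo_sapiens']
```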