Execution Flags
============================

These are the flags for the execution process of EnTAP. They can be supplied via the command line (denoted CMD), the |run_ini_file_format| file (denoted R-ini), or the |config_file_format| file (denoted E-ini). Since there are required and recommended flags, these will be repeated throughout the other categories where relevant. All commands may be used through the command line in addition to the recommended usage.

There are a few data types (qualifiers) used throughout these ini files to keep in mind. Anything that is specified as a 'multi' type means that the parameter may be entered multiple times. If it is in an ini file, each value must be separated by a comma (',').

Example for multi-integer: "1,2,3" (entered without quotes)

.. list-table:: **Required Flags**
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - run
     - Execute the main EnTAP annotation functionality including, but not limited to, Similarity Searching and Gene Family analysis. Optional Frame Selection, Expression Analysis, and HGT analysis can be added with the appropriate flags.
     - cmd
     - flag
     - run
   * - input / i
     - Path to the transcriptome file (either nucleotide or protein)
     - R-ini
     - string
     - /path/to/input/transcriptome.fa
   * - database / d
     - Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against
     - R-ini
     - multi-string
     - /path/to/diamond/database.dmnd
   * - run-ini
     - Point to the |run_ini_file_format| to specify run-specific parameters and paths.
     - cmd
     - string
     - /path/to/|run_ini_file|
   * - entap-ini
     - Point to the |config_file_format| to specify database paths and software execution paths.
     - cmd
     - string
     - /path/to/|config_file|
   * - entap-db-bin
     - Path to the |entap_db_bin_file_format| database. Either this or the EnTAP SQL database must be used. The binary database is the default and is recommended
     - E-ini
     - string
     - /path/to/|entap_db_bin_file|
   * - diamond-exe
     - Specify the execution method for DIAMOND. This can be a path to the :file:`diamond` file generated during installation, or simply the command if installed globally
     - E-ini
     - string
     - diamond
   * - eggnog-map-data
     - Path to the directory containing the EggNOG SQL database |eggnog_map_sql_db_file_format| that was downloaded during the Configuration stage. EnTAP will check for the eggnog.db database within this specified directory
     - E-ini
     - string
     - /path/to/eggnog_db_directory
   * - eggnog-map-dmnd
     - Path to the EggNOG DIAMOND configured database |eggnog_map_dmnd_db_file_format| that was generated during the Configuration stage.
     - E-ini
     - string
     - /databases/eggnog_proteins.dmnd
   * - eggnog-map-exe
     - Path to the EggNOG-mapper executable, or method of execution. If installed globally, this is simply |emapper_exe_format|
     - E-ini
     - string
     - emapper.py

.. list-table:: **Recommended Flags**
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - run
     - Execute the main EnTAP annotation functionality including, but not limited to, Similarity Searching and Gene Family analysis. Optional Frame Selection, Expression Analysis, and HGT analysis can be added with the appropriate flags.
     - cmd
     - flag
     - run
   * - input / i
     - Path to the transcriptome file (either nucleotide or protein)
     - R-ini
     - string
     - /path/to/input/transcriptome.fa
   * - database / d
     - Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against
     - R-ini
     - multi-string
     - /path/to/diamond/database.dmnd
   * - run-ini
     - Point to the |run_ini_file_format| to specify run-specific parameters and paths.
     - cmd
     - string
     - /path/to/|run_ini_file|
   * - entap-ini
     - Point to the |config_file_format| to specify database paths and software execution paths.
     - cmd
     - string
     - /path/to/|config_file|
   * - out-dir
     - Specify an output directory for all of the files generated by EnTAP
     - R-ini
     - string
     - /path/to/entap_output
   * - frame_selection
     - Specify if you would like to perform frame selection/gene prediction on your input nucleotide transcriptome. This flag will be ignored if you input protein sequences. If this is set to false with nucleotide input, EnTAP will perform blastx functionality where appropriate.
     - R-ini
     - bool
     - true
   * - entap-db-bin
     - Path to the |entap_db_bin_file_format| database. Either this or the EnTAP SQL database must be used. The binary database is the default and is recommended
     - E-ini
     - string
     - /path/to/|entap_db_bin_file|
   * - entap-graph
     - Path to the |graph_file_format| EnTAP graphing file. If this is not specified, EnTAP graphics will not be generated
     - E-ini
     - string
     - /path/to/|graph_file|
   * - diamond-exe
     - Specify the execution method for DIAMOND. This can be a path to the :file:`diamond` file generated during installation, or simply the command if installed globally
     - E-ini
     - string
     - diamond
   * - eggnog-map-data
     - Path to the directory containing the EggNOG SQL database |eggnog_map_sql_db_file_format| that was downloaded during the Configuration stage. EnTAP will check for the eggnog.db database within this specified directory
     - E-ini
     - string
     - /path/to/eggnog_db_directory
   * - eggnog-map-dmnd
     - Path to the EggNOG DIAMOND configured database |eggnog_map_dmnd_db_file_format| that was generated during the Configuration stage.
     - E-ini
     - string
     - /databases/eggnog_proteins.dmnd
   * - eggnog-map-exe
     - Path to the EggNOG-mapper executable, or method of execution. If installed globally, this is simply |emapper_exe_format|
     - E-ini
     - string
     - emapper.py

.. list-table:: **General EnTAP Flags**
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - run
     - Execute the main EnTAP annotation functionality including, but not limited to, Similarity Searching and Gene Family analysis. Optional Frame Selection, Expression Analysis, and HGT analysis can be added with the appropriate flags.
     - cmd
     - flag
     - run
   * - input / i
     - Path to the transcriptome file (either nucleotide or protein)
     - R-ini
     - string
     - /path/to/input/transcriptome.
   * - database / d
     - Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against
     - R-ini
     - multi-string
     - /path/to/diamond/database.dmnd
   * - run-ini
     - Point to the |run_ini_file_format| to specify run-specific parameters and paths.
     - cmd
     - string
     - /path/to/|run_ini_file|
   * - entap-ini
     - Point to the |config_file_format| to specify database paths and software execution paths.
     - cmd
     - string
     - /path/to/|config_file|
   * - out-dir
     - Specify an output directory for all of the files generated by EnTAP
     - R-ini
     - string
     - /path/to/entap_output
   * - overwrite
     - All files from a previous run will be overwritten. Without this flag, EnTAP will recognize the results from the previous run and skip executing the portions that were already run
     - R-ini
     - bool
     - true
   * - resume
     - Set this flag to TRUE if you would like EnTAP to continue execution if files from a previous run are found. If set to FALSE, EnTAP will stop execution after files from a previous run are detected. Note that the 'overwrite' flag supersedes this: for example, if 'overwrite' is set to TRUE and 'resume' is set to FALSE with files from a previous run detected, they will be deleted and execution will continue.
     - R-ini
     - bool
     - false
   * - graph
     - Specifying this will check whether or not your system has graphing functionality supported and then exit. The system will need Python installed with the Matplotlib module.
     - cmd
     - bool
     - true
   * - threads / t
     - Specify the number of threads for execution
     - R-ini
     - integer
     - 5
   * - no-trim
     - By default, EnTAP will trim your sequence headers to the first space to maintain compatibility across different software. Using this flag will instead retain the information of the header by removing all spaces.

       * Example:
       * >TRINITY_231.1 protein12312_43 inform
       * >TRINITY_231.1protein12312_43inform

     - R-ini
     - bool
     - true
   * - state
     - Precise control over Execution of EnTAP. This flag allows for certain parts to be run while skipping others. More information can be seen in the Execution section.
     - R-ini
     - string
     - '+'
   * - version
     - Prints the current EnTAP version you are running
     - cmd
     - flag
     - version
   * - no-check
     - EnTAP checks execution paths and inputs prior to annotating so that a bad input is caught up front rather than midway through a run. Using this flag will skip the check (not advised!)
     - R-ini
     - bool
     - true
   * - output-format
     - Specify multiple output file formats for each stage of the pipeline

       * 1. TSV File (default)
       * 2. CSV File
       * 3. FASTA Protein File (default)
       * 4. FASTA Nucleotide File (default)
       * 5. Gene Enrichment Gene ID + Effective Length
       * 6. Gene Enrichment Gene ID + GO Terms
       * 7. Gene Ontology Terms (Sequence ID, GO Term ID, GO Term, Category, and Sequence Effective Length) TSV format (default)

     - R-ini
     - multi-integer
     - 1,3,4,7
   * - entap-db-sql
     - Path to the |entap_db_sql_file_format| database. Either this or the EnTAP binary database must be used
     - E-ini
     - string
     - /path/to/|entap_db_sql_file|
   * - entap-db-bin
     - Path to the |entap_db_bin_file_format| database. Either this or the EnTAP SQL database must be used. The binary database is the default and is recommended
     - E-ini
     - string
     - /path/to/|entap_db_bin_file|
   * - entap-graph
     - Path to the |graph_file_format| EnTAP graphing file. If this is not specified, EnTAP graphics will not be generated
     - E-ini
     - string
     - /path/to/|graph_file|

.. list-table:: **Expression Analysis Flags**
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - align / a
     - Path to the alignment file (either SAM or BAM format). Ignoring this flag will skip expression filtering. Be sure to look at the other Expression Analysis flags if using this.
     - R-ini
     - string
     - /path/to/alignment.bam
   * - rsem-calculate-expression
     - Specify the path to the :file:`rsem-calculate-expression` file generated during installation of RSEM
     - E-ini
     - string
     - /libs/RSEM-1.3.3/rsem-calculate-expression
   * - rsem-sam-validator
     - Specify the path to the :file:`rsem-sam-validator` file generated during installation of RSEM
     - E-ini
     - string
     - /libs/RSEM-1.3.3/rsem-sam-validator
   * - rsem-prepare-reference
     - Specify the path to the :file:`rsem-prepare-reference` file generated during installation of RSEM
     - E-ini
     - string
     - /libs/RSEM-1.3.3/rsem-prepare-reference
   * - rsem-convert-sam-for-rsem
     - Specify the path to the :file:`rsem-convert-sam-for-rsem` file generated during installation of RSEM
     - E-ini
     - string
     - /libs/RSEM-1.3.3/rsem-convert-sam-for-rsem
   * - fpkm
     - Specify the FPKM (fragments per kilobase of exon per million mapped fragments) cutoff for Expression Filtering. All transcripts below this number will be filtered out and removed.
     - R-ini
     - float
     - 0.5
   * - single-end
     - Signify your reads are single-end for RSEM execution instead of paired-end (default)
     - R-ini
     - bool
     - true

.. list-table:: Frame Selection Flags
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - frame_selection
     - Specify if you would like to perform frame selection/gene prediction on your input nucleotide transcriptome. This flag will be ignored if you input protein sequences. If this is set to false with nucleotide input, EnTAP will perform blastx functionality where appropriate.
     - R-ini
     - bool
     - true

.. list-table:: Frame Selection - TransDecoder Specific Flags
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - transdecoder-m
     - Specify the minimum protein length for TransDecoder
     - R-ini
     - integer
     - 100
   * - transdecoder-no-refine-starts
     - Specify this flag if you would like to pass the '--no_refine_starts' option to TransDecoder when it is executed
     - R-ini
     - bool
     - false
   * - transdecoder-long-exe
     - Method to execute TransDecoder.LongOrfs. This may be the path to the executable, or simply 'TransDecoder.LongOrfs' if installed globally
     - E-ini
     - string
     - TransDecoder.LongOrfs
   * - transdecoder-predict-exe
     - Method to execute TransDecoder.Predict. This may be the path to the executable, or simply 'TransDecoder.Predict' if installed globally
     - E-ini
     - string
     - TransDecoder.Predict

.. list-table:: **Similarity Search Flags**
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - database / d
     - Specify up to 5 DIAMOND indexed (.dmnd) databases to run similarity search against
     - R-ini
     - multi-string
     - /path/to/diamond/database.dmnd
   * - data-type
     - Specify which EnTAP database you'd like to use for execution

       * 0. Binary Database (default) - This will be much quicker and is recommended
       * 1. SQL Database - Slower although will be more easily compatible with every system

     - R-ini
     - integer
     - 0
   * - contam / c
     - Specify contaminants to be used during Similarity Search best hit selection. Contaminants can be selected by species or through a specific taxon (e.g. insecta) from the NCBI Taxonomy Database. If your taxon is more than one word, replace the spaces with underscores (_). Matching alignments will be flagged as contaminants and scored lower than other alignments.
     - R-ini
     - multi-string
     - insecta
   * - taxon
     - This flag will allow for 'taxonomic favoring' of hits that are closer to your target species or lineage. Any lineage can be used as referenced by the NCBI Taxonomic database, such as genus, phylum, or species. Format **must** replace all spaces with underscores ('_')
     - R-ini
     - string
     - homo_sapiens
   * - e
     - Specify E-value cutoff for Similarity Searching results (in scientific notation format).
     - R-ini
     - scientific
     - 10E-5
   * - tcoverage
     - Specify minimum target coverage for similarity searching
     - R-ini
     - float
     - 50
   * - qcoverage
     - Specify minimum query coverage for similarity searching
     - R-ini
     - float
     - 50
   * - uninformative
     - Comma-delimited list of terms you would like to be deemed "uninformative". Any alignments during Similarity Searching tagged as uninformative will be scored lower
     - R-ini
     - string
     - conserved, predicted, unnamed, hypothetical, putative, unidentified, uncharacterized, unknown, uncultured, uninformative
   * - diamond-exe
     - Specify the execution method for DIAMOND. This can be a path to the :file:`diamond` file generated during installation, or simply the command if installed globally
     - E-ini
     - string
     - diamond

.. list-table:: Similarity Search - DIAMOND Specific Flags
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - diamond-sensitivity
     - Specify the DIAMOND sensitivity used against input DIAMOND databases (Similarity Searching and HGT Analysis). Sensitivities are based on the DIAMOND documentation, with a higher sensitivity generally taking longer but giving a higher alignment rate. Sensitivity options are fast, mid-sensitive, sensitive, more-sensitive, very-sensitive, ultra-sensitive.
     - R-ini
     - string
     - very-sensitive

.. list-table:: **Ontology Flags**
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - ontology_source
     - Specify which ontology source packages you would like to use. Multiple flags may be used to specify execution of multiple software packages.

       * 0 - EggNOG (default)
       * 1 - InterProScan

     - R-ini
     - multi-integer
     - 0

.. list-table:: Ontology - EggNOG Specific Flags
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - eggnog-map-data
     - Path to the directory containing the EggNOG SQL database |eggnog_map_sql_db_file_format| that was downloaded during the Configuration stage. EnTAP will check for the eggnog.db database within this specified directory
     - E-ini
     - string
     - /path/to/eggnog_db_directory
   * - eggnog-map-dmnd
     - Path to the EggNOG DIAMOND configured database |eggnog_map_dmnd_db_file_format| that was generated during the Configuration stage.
     - E-ini
     - string
     - /databases/eggnog_proteins.dmnd
   * - eggnog-map-exe
     - Path to the EggNOG-mapper executable, or method of execution. If installed globally, this is simply |emapper_exe_format|
     - E-ini
     - string
     - emapper.py
   * - eggnog-contaminant
     - Specify this to turn on/off EggNOG contaminant analysis (on by default). This leverages the taxon input from the contaminant Similarity Search command to determine if an EggNOG annotation should be flagged as a contaminant. EggNOG contaminant analysis can only be performed alongside Similarity Search contaminant analysis (not on its own) and will only be utilized if no alignments were found for a given transcript during Similarity Searching
     - R-ini
     - bool
     - true
   * - eggnog-dbmem
     - Specify this to use the '--dbmem' flag with EggNOG-mapper. This will load the entire eggnog.db sqlite3 database into memory, which can require up to ~44GB of memory. However, this will significantly speed up EggNOG annotations
     - R-ini
     - bool
     - true
   * - eggnog-sensitivity
     - Specify the DIAMOND sensitivity used during EggNOG-mapper execution against the EggNOG database. Sensitivities are based on the DIAMOND documentation, with a higher sensitivity generally taking longer but giving a higher alignment rate. Sensitivity options are fast, mid-sensitive, sensitive, more-sensitive, very-sensitive, ultra-sensitive.
     - R-ini
     - string
     - more-sensitive

.. list-table:: Ontology - InterProScan Specific Flags
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - interproscan-db
     - Use this option if you would like to run InterProScan against specific databases. Multiple databases can be selected.

       * tigrfam
       * sfld
       * prodom
       * hamap
       * pfam
       * smart
       * cdd
       * prositeprofiles
       * prositepatterns
       * superfamily
       * prints
       * panther
       * gene3d
       * pirsf
       * coils
       * mobidblite

     - R-ini
     - multi-string
     - pfam
   * - interproscan-exe
     - Specify the execution method for InterProScan. Commonly this can be the path to the :file:`interproscan.sh` file
     - E-ini
     - string
     - interproscan.sh

.. list-table:: **Horizontal Gene Transfer Flags**
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - hgt-donor
     - Specify the DIAMOND configured (.dmnd extension) donor databases for Horizontal Gene Transfer analysis. Separate databases with a comma (',')
     - R-ini
     - multi-string
     - path/to/donor/database1.dmnd,path/to/donor/database2.dmnd
   * - hgt-recipient
     - Specify the DIAMOND configured (.dmnd extension) recipient databases for Horizontal Gene Transfer analysis. Separate databases with a comma (',')
     - R-ini
     - multi-string
     - path/to/recipient/database1.dmnd,path/to/recipient/database2.dmnd
   * - hgt-gff
     - Specify path to the GFF file for HGT analysis. The input GFF must satisfy the following:

       * Protein identifiers must match between FASTA and GFF attribute fields
       * Primary transcripts only (longest isoform for each gene)
       * Feature type = 'transcript' or 'mRNA'
       * Must be in relative order. This can be accomplished by running the file through software such as agat_sp_keep_longest_isoform (https://agat.readthedocs.io/en/latest/tools/agat_sp_keep_longest_isoform.html)

     - R-ini
     - string
     - path/to/gff/file.gff
   * - diamond-exe
     - Specify the execution method for DIAMOND. This can be a path to the :file:`diamond` file generated during installation, or simply the command if installed globally. DIAMOND is leveraged for HGT analysis
     - E-ini
     - string
     - diamond

.. list-table:: **EnTAP API Flags**
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - api-taxon
     - Check whether a species can be found within the specified EnTAP database. Returns JSON-formatted text indicating whether the species was found. Format **must** replace all spaces with underscores ('_') as follows: "- -taxon homo_sapiens" or "- -taxon primates"
     - cmd
     - string
     - homo_sapiens

.. list-table:: **NCBI API Flags**
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - ncbi-api-key
     - Enter your personal NCBI API key, if available. This can be assigned to you through your NCBI account. Although not required, enabling EnTAP to use your API key will allow for much quicker access to the NCBI database. Your API key will only be used to access the NCBI database and is only stored locally. If you are a contributor to EnTAP, DO NOT accidentally commit your API key to git!
     - R-ini
     - string
     - n/a
   * - ncbi-api-enable
     - Allow EnTAP to access the NCBI database API to pull additional information for your data during Similarity Searching if searching against an NCBI database.
     - R-ini
     - bool
     - true
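
As an illustration of the qualifiers described at the top of this section, the fragment below sketches how a handful of R-ini parameters from the tables above might look in the |run_ini_file_format| file. It is a minimal sketch rather than a reference: it assumes the file holds one ``key=value`` pair per line and that lines beginning with ``#`` are treated as comments, and the two database paths are placeholders. Check the ini file generated by your EnTAP version for the exact syntax.

.. code-block:: ini

   # Assumed syntax: one key=value pair per line, '#' starts a comment.
   # string qualifier
   input=/path/to/input/transcriptome.fa
   out-dir=/path/to/entap_output
   # multi-string qualifier: values separated by commas (placeholder paths)
   database=/path/to/diamond/database1.dmnd,/path/to/diamond/database2.dmnd
   # integer qualifier
   threads=5
   # bool qualifier
   frame_selection=true
   # multi-integer qualifier, entered without quotes
   output-format=1,3,4,7

Parameters marked E-ini in the tables (for example ``diamond-exe``) would go in the |config_file_format| file instead, while cmd-only flags such as ``run`` are given on the command line.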