Changelog
==================

This page contains (mostly) all of the changes that were made between each version of EnTAP.

EnTAP v2.1.0 (January 11, 2025)
-----------------------------------------

* Replaced EnTAP final results TSV columns "Database_UniProt_Database_Cross_Reference" and "Database_UniProt_Additional_Information" with "Database_UniProt_OrthoDB", "Database_UniProt_InterProScan", and "Database_UniProt_Protein_Domains". External parsers may need to be updated after this change
* Added column "SeqSearch_NCBI_GeneID" that will appear when similarity searching against an NCBI database. Gene ID information will be pulled and added to this column for relevant alignments
* Added additional options for the --database flag. Users can now specify common databases such as 'refseq_plant' or 'uniprot_sprot' and EnTAP will download and configure the database automatically without the user having to do the work
* Updated EnTAP database to include PFAM mappings of terms -> PFAM ID. EnTAP will now pull the PFAM ID for PFAM terms pulled from the EggNOG database. WARNING, your EnTAP database will need to be updated due to this change
* Removed 'runP' and 'runN' flags, replacing them with 'run' and 'frame_selection' for clarity. 'run' will always be used to execute the main EnTAP annotation pipeline (as opposed to 'config'), while 'frame_selection' can be set to true/false if you would like to perform frame selection on nucleotide input sequences
* Changed functionality so EnTAP will not exit if EggNOG contaminant analysis is TRUE with no contaminants input. EnTAP will just skip EggNOG contaminant analysis

EnTAP v2.0.0 (October 24, 2024)
-----------------------------------------

* Added 'resume' flag. When this flag is used (set to TRUE), EnTAP will continue execution after files from a previous run are identified. If set to FALSE, EnTAP will stop execution and warn the user if previous files are found.
  Note: TRUE was the default behavior previously, so settings will need to be updated now to get the same functionality!
* Updated most of the headers used in the summary EnTAP TSV output file. External parsers may need to be updated after this change
* Added additional contaminant information to the log file. Contaminants are now broken down by the user input
* Fixed an issue with Horizontal Gene Transfer analysis not executing properly
* EnTAP has moved to GitLab! All future releases will be at: https://gitlab.com/PlantGenomicsLab/EnTAP
* Dockerfile updated to point to GitLab repo

EnTAP v1.4.0 (September 11, 2024)
-----------------------------------------

* Added user control of DIAMOND sensitivity during HGT Analysis, Similarity Searching, and EggNOG analysis. Higher sensitivity will generally take longer, but result in higher annotation rates
* Fixed an issue where InterProScan databases may not be recognized in the user input
* Fixed an issue where EggNOG contaminant analysis may still be on even if 'FALSE' was selected in the ini file

EnTAP v1.3.0 (July 30, 2024)
-----------------------------------------

* Added EggNOG contaminant analysis. This is turned on/off through the 'eggnog-contaminant' flag and only utilized alongside Similarity Search contaminant analysis when an alignment from Similarity Search is not found. More information can be found in the docs
* Added 'eggnog-dbmem' flag which will utilize the 'dbmem' flag with EggNOG-mapper. This flag will speed up EggNOG annotations significantly, but will take upwards of 44 GB of usable memory. This is on by default; if you are experiencing memory issues, it can be turned off
* Changed 'NA' in empty cells of the final output to 'NaN'
* Fixed an issue with the EggNOG Seed E-Value not printing correctly in scientific notation in the final 'entap_results' file (was printing as 0).
  The correct value can be seen in the EggNOG-specific output in the 'gene_family/EggNOG' directory
* Fixed an issue with the RSEM TPM and Effective Length not printing correctly in the final 'entap_results' file (was printing as 0). The correct value can be seen in the RSEM-specific output in the 'expression_analysis/RSEM' directory

EnTAP v1.2.1 (June 20, 2024)
-----------------------------------------

* Fixed path to TransDecoder executable in entap_config.ini file for Docker

EnTAP v1.2.0 (June 19, 2024)
------------------------------------------

* Removed --complete flag from Frame Selection, not needed with current implementation. May come back later. Due to this and other changes, please update your ini files; the latest are in the repository
* Changed 'ontology' flag to 'ontology_source' to improve clarity when selecting ontology sources (EggNOG or InterProScan)
* Changed 'protein' flag to 'interproscan-db' to improve clarity when specifying InterProScan databases
* Added 'EggNOG COG Abbreviation' (from EggNOG-mapper output) and 'EggNOG COG Description' (mapping of abbreviation to descriptions) columns to 'entap_results.tsv' file
* Added support for all EnTAP parameters through the command line as well as ini. When a parameter is given on the command line, it takes precedence over the ini files
* Fixed a compatibility issue that was causing certain DIAMOND runs to fail with an error message about being unable to access a 'temporary directory'
* Fixed an issue parsing certain GFF formats, which would present itself as an error in parsing the file

EnTAP v1.1.1 (May 21, 2024)
------------------------------------------

* Updated Dockerfile to include TransDecoder dependencies (DB_File and URI::Escape)

EnTAP v1.1.0 (May 5, 2024)
------------------------------------------

* Removed GeneMarkS-T from supported Frame Selection software
* Removed all figures from EnTAP. This will be replaced in the next version with updated figures/graphs.
* Split entap_config.ini file into two separate files, entap_config.ini and entap_run.params. The intention is to make it clearer that entap_run.params is used for specific runs while the entap_config.ini file is set up once and not changed often.
* Added support for Horizontal Gene Transfer analysis. New commands and EnTAP output added to allow for analysis. GFF and donor/recipient databases required for analysis.
* Added support for utilizing eggnog-mapper (https://github.com/eggnogdb/eggnog-mapper) to access EggNOG databases. Added new output and commands for support. Added support for downloading necessary EggNOG databases during EnTAP configuration. Due to this change, run Configuration again if using this release!
* Added warning to log file if taxonomic information could not be leveraged for a particular database during Similarity Searching
* Changed 'ontology' output directory to 'gene_family'
* Changed several statistics/percentages from Similarity Search and EggNOG to be based on total retained sequences rather than total contaminants
* Suppressed warnings from a library used by EnTAP during compilation (may not work on all compilers)

EnTAP v1.0.1 (November 13, 2023)
------------------------------------------

* Fixed an issue with the formatting of the '...gene_ontology_terms.tsv' file output from EnTAP

EnTAP v1.0.0 (September 26, 2023)
------------------------------------------

* Updated RSEM (v1.3.3), TransDecoder (v5.7.1), and DIAMOND (v2.1.8) libraries in EnTAP repository. EnTAP is now compatible with these versions
* Added new test data (under the 'test_data' directory) in the EnTAP repository. This should work with the latest versions of software being used by EnTAP. See docs for how to run.
* Added additional statistics/percentages at the end of the Log File
* Added support for Tidyverse format (TSVs will now print 'NA' for empty data)
* Added Dockerfile to EnTAP repository
* Added support for a new Gene Ontology term TSV output format.
  Similar to other formats, but combined into one file. More info can be seen with the '--output-format' flag
* Changed DIAMOND runs (during Ontology and during Similarity Searching) to use 'very-sensitive' instead of 'more-sensitive'. This should give more alignments, but may take longer to execute now
* Removed the '--level' flag for Gene Ontology levels. This was not useful to users and caused confusion. Instead, all GO Terms will be printed by default and gene ontology levels are removed from output
* Renamed and restructured many of the outputs in the 'final_results' directory for better clarity
* Fixed an issue when trying to run EnTAP configuration locally to rebuild the EnTAP database. Users may have seen it fail during the Gene Ontology stage due to a change in formatting of the Gene Ontology database

EnTAP Beta v0.10.9-beta (June 29, 2023)
------------------------------------------

* Optimizations made to similarity searching using the EnTAP SQL database. Should improve speed
* Added 'api-taxon' command that will verify whether an input taxon can be found in the taxonomy database. It will return JSON-formatted text
* Added additional messaging throughout EnTAP execution to stdout
* Removed Test Data from repository, as it is no longer compatible with the latest version of software within EnTAP. Will add back an updated dataset next version
* Changed DIAMOND command to use '--max-target-seqs' instead of '--top' command
* Fixed an issue where duplicate sequences were printed to the final_annotations files
* Fixed an issue where the taxonomic species may not have been found when searching against the SQL EnTAP database

EnTAP Beta v0.10.8-beta (March 21, 2021)
------------------------------------------

* This version requires a new version of the EnTAP database to be downloaded
* Added Gene Enrichment files as an output option (gene ID + effective length and gene ID + GO term). These can be seen with the output-type flag in the ini file
* Changed Gene Ontology level printing.
  0 will continue to print every term. Other levels will now print that level AND higher. So a level of 1 will print 1, 2, 3, etc. Previously, a level of 1 would only print GO Terms with a level of 1
* Changed 'uninformative' input from a file to a list of terms in the ini file. Much more straightforward this way
* If no alignments are found against a database during DIAMOND, the pipeline will no longer exit; it will continue to the next database. If no alignments are found against any databases, it will stop at that point
* Fixed a bug where TransDecoder output may not have been parsed correctly for some users. This presented itself as a parsing error and halted EnTAP at that stage of the pipeline
* Fixed a bug where the InterProScan MobiDBLite database was giving an error for some users (and halting execution)

EnTAP Beta v0.10.7-beta (October 6, 2020)
------------------------------------------

* Fixed an issue where certain sequence headers may not have been parsed properly, resulting in unrecognized sequence errors during Similarity Searching

EnTAP Beta v0.10.6-beta (August 26, 2020)
------------------------------------------

* Added support to pipe the TransDecoder flag '--no_refine_starts' during Execution
* Fixed an issue where error messages during EggNOG searching would not get printed (seg fault)
* Contaminant information will not be printed to the log if there are none

EnTAP Beta v0.10.5-beta (August 12, 2020)
------------------------------------------

* Added a step to remove the stop codon ('*') sometimes printed at the end of the TransDecoder FASTA output.
  This may have caused an issue when running TransDecoder and InterProScan together

EnTAP Beta v0.10.4-beta (July 29, 2020)
------------------------------------------

* Fixed an issue where expression analysis transcriptome generation would fail (error message presented to user as 'frame selection')

EnTAP Beta v0.10.3-beta (July 28, 2020)
------------------------------------------

* Fixed a parsing issue of user inputs for contaminants and taxon

EnTAP Beta v0.10.2-beta (July 26, 2020)
------------------------------------------

* Fixed a pathing issue when EnTAP generated frame selected transcriptomes

EnTAP Beta v0.10.1-beta (July 19, 2020)
------------------------------------------

Note: Please use v0.10.2-beta or later instead of this version

* Added support for TransDecoder for Frame Selection
* Added TPM as an additional output from Expression Filtering
* Added an .ini file and moved many commands/paths from the command line to this
* Standardized/finalized output header namings for gFACs support
* Changed the default Frame Selection software to TransDecoder. GeneMarkS-T can still be selected through the .ini file
* Changed the default Gene Ontology level to 1. This can be easily changed through the ini file
* Fixed an issue where some EggNOG descriptions were not printed to the final output
* Fixed a few issues with older GCC versions
* Fixed an issue where GeneMarkS-T would write to the working directory

EnTAP Beta v0.9.2-beta (June 4, 2020)
------------------------------------------

* Updated EggNOG Database links

EnTAP Beta v0.9.1-beta (January 12, 2020)
-------------------------------------------

* Changed --trim flag to --no-trim. Trimming sequence headers to the first space is the default now.
  If you have executions from previous versions, you may need to use the --no-trim flag for backwards compatibility (picking up where you left off)
* Fixed a bug where the --single-end command was not properly recognized

EnTAP Beta v0.9.0-beta (May 12, 2019)
-------------------------------------------

* This release focused on reducing installation complexity and removing dependencies
* Overhauled the configuration/execution process by removing EggNOG-mapper and replacing it with an internal EnTAP method. This will make installation and both stages much clearer for the user
* Removed Boost Libraries from dependencies, further reducing installation complexity
* Added printing of error messages to the standard log from any software being used by EnTAP. This will make debugging much easier
* Added UniProt mapping to the EnTAP database. This will pull any additional mapping information from UniProt Swiss-Prot alignments
* Updated supported DIAMOND version to 0.9.9
* The EnTAP database MUST be re-configured for this release
* Resolved any incompatibility with DIAMOND and EggNOG databases as well as versioning problems
* Standardized EnTAP log entries and added additional statistics
* --ontology flag will now use EnTAP's method of EggNOG accession (0) or InterProScan (1)
* Bug fixes

EnTAP Beta v0.8.4-beta (August 2, 2018)
------------------------------------------------

* Fixed an issue when inputting already translated sequences

EnTAP Beta v0.8.3-beta (May 23, 2018)
------------------------------------------

* Minor bug fixes
* Changes to CMake to hopefully resolve issues a couple of users had with linking to Boost Libraries

EnTAP Beta v0.8.2-beta (April 29, 2018)
-------------------------------------------

* Revamped configuration stage of EnTAP (reduced time and hopefully made things clearer/more compatible across systems)
* Removed --database-out flag (seemed a bit redundant to me).
  The --outfiles flag will be the default when indexing databases
* Added --data-generate flag. This can be specified in the EnTAP config stage (no effect during execution) to generate the EnTAP databases rather than downloading them from the FTP address
* Added --data-type flag. This can be used in either configuration or execution. Specifies which database you'd like to download/generate or use during execution: Binary (0, default) or SQL (1). Binary is faster with more memory usage; SQL is slower but more broadly compatible.
* Combined EnTAP databases into one (entap_database.sql/entap_database.bin). WARNING: Re-download or configuration of databases is REQUIRED with this newer version.
* Removed download_tax.py script (no longer necessary)

EnTAP Beta v0.8.1-beta (April 14, 2018)
------------------------------------------

* Added additional error logging to provide more information when something goes wrong
* Configuration file is now mandatory (default place to look is current working directory)
* Changed tax database paths in config file to avoid confusion (separate text and bin). Config file must be re-downloaded/generated!
* Defaults/output during configuration now come from the config file, falling back to the database-out flag if not found
* Added deletion of empty files if a certain stage failed (preventing re-reading an empty file)
* Added errors/warnings for no alignments/hits in each stage
* entap_out directory changed to transcriptomes to be more clear (holds only transcriptomic data)
* Final EnTAP output files moved from the root outfiles directory to final_results directory
* Several filename changes to add consistency in new transcriptomes directory (final transcriptome is now _final.fasta)
* Several title changes to the log file to mitigate confusion
* EggNOG no longer broken down into separate files - those that hit and those that did not hit a database.
  Now the entire transcriptome is pushed with one output file
* The number of species/contaminants/other reported in similarity searching statistics has been increased from 10 to 20 to provide more information to the user
* Best hit selection stage combined with similarity search
* Added 'N' as an accepted nucleotide
* Several behind the scenes changes
* Fixed CMake global installation issue
* Fixed incorrect error codes
* Fixed InterPro printing bug to no hits/hits files
* Fixed Frame Selection not printing new lines for certain files

EnTAP Beta v0.8.0-beta (December 16, 2017)
--------------------------------------------------

* Overhaul of the taxonomic/gene ontology databases
* Faster accession/indexing
* MUST be re-downloaded and re-indexed (or use the updated versions that come with the EnTAP distribution)
* Taxonomic database includes thousands more entries with synonyms of many species
* Perl is no longer a dependency, with Python being used to download the database
* Added blastx support
* Blastx now allowed for ALL stages of annotation (similarity search + ontology)
* --runN flag now specifies blastx (frame selection will not be run)
* --runP flag now specifies blastp (frame selection will be performed if nucleotide sequences are input)
* Added InterProScan support
* Now possible to run EggNOG and/or InterProScan (with either blastx or blastp)
* EggNOG and/or InterProScan specified with --ontology flag (0 and/or 1)
* Full output of both will be provided in the final annotations file
* Added additional statistics to the log file for EggNOG and Expression Analysis
* Added numerous file/path/software checks to the start of an EnTAP run
* Test runs/path checks are performed on all software that will be run
* Additional checks to specific flags
* These checks can be turned off for an EnTAP run with --no-check flag (not advised!)
* --tag flag changed to --out-dir to specify output directory (not just what you'd like it named as)
* Defaults to current directory with /outfiles folder
* --paired-end flag for Expression Filtering changed to --single-end (with paired-end being the default)
* Added contaminant and informative yes/no columns in final annotations file (among other headers)
* Added ability to input your own list of informative/uninformative terms for EnTAP to flag
* Added contaminant and non-contaminant final annotation files
* Fixed a sequence ID issue in Expression Filtering not mapping to BAM/SAM file
* Fixed a bug in --trim flag for sequence headers
* Fixed a bug where some systems had issues with graphing
* Debug and log files are now time stamped and not overwritten
* Fixed pathing for EnTAP configuration and made it more streamlined
* Fixed several instances of older compilers complaining
* Added a lot of error messaging to help diagnose any issues easily
* Changed similarity search to have full database name, not path
* Fixed a bug in parsing input FASTA file (added corrupt file checks)

EnTAP Beta v0.7.4.1-beta (September 5, 2017)
--------------------------------------------------

* Minor changes to taxonomic database download and indexing

EnTAP Beta v0.7.4-beta (August 26, 2017)
----------------------------------------------

* Initial beta release!