Installation

EnTAP is packaged with all of the software necessary to fully annotate a set of transcripts. It is optimized to allow single-command execution of all steps in the pipeline, including parameterization by the user. EnTAP does not have a graphical user interface, but it generates visual summaries at each stage as well as detailed summary files and logs. EnTAP must be installed and configured before you can begin annotating. A test dataset comes with EnTAP so you can confirm it has been configured properly. Before the full EnTAP installation, check that the dependencies are present on your system (many are by default) and install the accompanying pipeline software (unless it is already present on the system).

  1. System Requirements
  2. Dependency Check
  3. Pipeline Software
  4. EnTAP Installation

After installation is complete, EnTAP must be configured in order to start using it. Configuration will simply download the necessary databases that are used by EnTAP.

System Requirements

  • Operating System

    • UNIX-based systems
    • Tested on 64-bit systems: Ubuntu 16.04, Rocks 6.1, CentOS 6.3
  • Storage Minimum

    • EnTAP Database (Gene Ontology References + UniProt Mapping + NCBI Taxonomy): 1.5 GB
    • EggNOG Databases: 24 GB
    • DIAMOND Databases: ~13 GB (with RefSeq Complete Protein + UniProt Swiss-Prot)
    • Additional storage for files generated depending on transcriptome size: upwards of 15 GB
  • Memory

    • At least 16 GB of RAM (will vary depending on DIAMOND database sizes). More memory is highly recommended to reduce execution times.
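
The storage and memory minimums above can be checked before downloading anything. The following is a minimal sketch, not part of EnTAP: it assumes the four storage figures can simply be summed into one threshold, that memory can be read from /proc/meminfo (Linux only), and it uses a hypothetical ENTAP_DATA_DIR variable to name the directory where the databases will live.

```shell
#!/bin/sh
# Thresholds in GB, taken from the minimums listed above.
# 1.5 (EnTAP) + 24 (EggNOG) + 13 (DIAMOND) + 15 (generated files) = 53.5, rounded up.
REQUIRED_DISK_GB=54
REQUIRED_RAM_GB=16

# meets_minimum HAVE NEED: succeeds when HAVE >= NEED (whole numbers).
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# free_disk_gb DIR: free space, in whole GB, on the file system holding DIR.
free_disk_gb() {
    df -P -k "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# total_ram_gb: installed RAM rounded to the nearest GB (Linux /proc/meminfo only).
total_ram_gb() {
    if [ -r /proc/meminfo ]; then
        awk '/^MemTotal:/ { print int($2 / 1024 / 1024 + 0.5) }' /proc/meminfo
    else
        echo 0
    fi
}

# report LABEL HAVE NEED: print one status line.
report() {
    if meets_minimum "$2" "$3"; then
        echo "$1: OK ($2 GB, minimum $3 GB)"
    else
        echo "$1: LOW ($2 GB, minimum $3 GB)"
    fi
}

# ENTAP_DATA_DIR is a hypothetical variable; the current directory is used if unset.
report "disk" "$(free_disk_gb "${ENTAP_DATA_DIR:-.}")" "$REQUIRED_DISK_GB"
report "memory" "$(total_ram_gb)" "$REQUIRED_RAM_GB"
```

The script only reports; it never stops the installation, since the real storage needs depend on which databases you choose and on transcriptome size.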

Dependency Check

Before continuing with the installation, ensure that the following dependencies are installed on your system:

  • C++11 compiler (GCC 4.8.1 or later)
  • CMake (3.00 or later)
  • Python (2.7.12 or later) with support for the following modules:
    • Matplotlib (figures generated by EnTAP)
  • Unix wget (generally included in most distros)
  • Unix gzip/tar (generally included in most distros)
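
The dependency list above can be checked with a short script. This is a sketch under stated assumptions rather than an official EnTAP tool: it assumes the compiler is invoked as g++ and the interpreter as python, and the helper names (version_ge, check_tool, check_version) are made up for illustration. The GCC version is not compared automatically because the format of its version output varies between releases; check it by hand with g++ --version.

```shell
#!/bin/sh
# version_ge HAVE NEED: succeeds when dotted version HAVE is at least NEED.
# Fields are compared numerically, so 2.7.12 counts as newer than 2.7.9.
version_ge() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        n = split(have, h, "."); m = split(need, r, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            if (h[i] + 0 > r[i] + 0) exit 0
            if (h[i] + 0 < r[i] + 0) exit 1
        }
        exit 0
    }'
}

# check_tool NAME: report whether NAME is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
    fi
}

# check_version NAME HAVE NEED: report whether version HAVE satisfies NEED.
check_version() {
    if version_ge "$2" "$3"; then
        echo "ok:      $1 $2 (need $3 or later)"
    else
        echo "too old: $1 $2 (need $3 or later)"
    fi
}

for tool in g++ cmake python wget gzip tar; do
    check_tool "$tool"
done

# "cmake --version" prints "cmake version X.Y.Z" on its first line.
if command -v cmake >/dev/null 2>&1; then
    check_version cmake "$(cmake --version | awk 'NR == 1 { print $3 }')" 3.00
fi

# Assumes the interpreter is invoked as "python"; adjust if yours is named differently.
if command -v python >/dev/null 2>&1; then
    check_version python "$(python -c 'import platform; print(platform.python_version())')" 2.7.12
    if python -c 'import matplotlib' >/dev/null 2>&1; then
        echo "found:   matplotlib"
    else
        echo "missing: matplotlib"
    fi
fi
```

Comparing version fields numerically matters here: a plain string comparison would rank 2.7.9 above 2.7.12.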

Pipeline Software

EnTAP leverages several software distributions within the pipeline to provide the best quality annotations. The packages used (and their current/tested versions) are listed below. Newer versions may well be compatible, but they have not yet been tested with EnTAP. By default, EnTAP uses TransDecoder for frame selection; both TransDecoder and GeneMarkS-T are supported, and you may install either.

Note

If the software is already installed on your system, this stage can be skipped.

Software:
  • RSEM (Expression Filtering with alignment file): version 1.3.0 packaged with EnTAP

  • TransDecoder (Frame Selection): version 5.3.0 packaged with EnTAP

  • GeneMarkS-T (Frame Selection): version 5.1 must be installed separately (if not using TransDecoder)

  • DIAMOND (Similarity Search): version 0.9.9 packaged with EnTAP

    • Version 0.8.31
    • Version 0.9.19
    • Version 0.9.9
  • InterProScan (Protein Databases): version 5.19 must be installed separately

If you have downloaded the full repository from the GitLab page, each of these (with the exception of GeneMarkS-T and InterProScan) is contained within the /libs directory. GeneMarkS-T must be acquired from the website linked previously due to licensing (free for academic use).

RSEM and DIAMOND both require compilation from source code, while GeneMarkS-T does not. To compile them, follow the directions below. These directions are also found on the respective GitHub pages and are subject to change depending on the version.

DIAMOND Installation

From root EnTAP directory…

cd libs/diamond-0.9.9
mkdir bin
cd bin
cmake ..

Run the following command to compile:

make

Or run the following command to compile and install globally:

make install

All set! Ensure that DIAMOND has been set up properly and add the correct path to the entap_config.txt file. If installed globally, add ‘diamond’ (without quotes) to the file. If installed locally, add ‘path/to/EnTAP/libs/diamond-0.9.9/bin/diamond’.
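
Before editing entap_config.txt, it can help to confirm that the value you plan to enter actually resolves to an executable. The sketch below is illustrative only: DIAMOND_ENTRY is a hypothetical variable standing in for that value, is_runnable is a made-up helper, and the final step assumes your DIAMOND build accepts the version command.

```shell
#!/bin/sh
# is_runnable ENTRY: succeeds when ENTRY refers to something executable.
# An ENTRY containing a slash is treated as a file path; a bare name is
# looked up on PATH, which is what a global install relies on.
is_runnable() {
    case $1 in
        */*) [ -f "$1" ] && [ -x "$1" ] ;;
        *)   command -v "$1" >/dev/null 2>&1 ;;
    esac
}

# DIAMOND_ENTRY is a hypothetical variable holding the value intended for
# entap_config.txt: "diamond" for a global install, or the local path.
entry=${DIAMOND_ENTRY:-diamond}

if is_runnable "$entry"; then
    echo "usable: $entry"
    # Assumption: this DIAMOND build supports the "version" command.
    "$entry" version || echo "warning: could not query the version of $entry"
else
    echo "not found or not executable: $entry"
fi
```

For a local build, set DIAMOND_ENTRY to the same path you intend to write into the configuration file, so that a typo in the path shows up here rather than during an EnTAP run.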

RSEM Installation

From root EnTAP directory…

cd libs/RSEM-1.3.0
make
make ebseq

Run the following command to install globally:

make install

All set! Ensure that RSEM has been set up properly and add the correct path to the entap_config.txt file. If installed globally, leave the entry blank. If installed locally, add ‘path/to/EnTAP/libs/RSEM-1.3.0/’.
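
The RSEM entry differs from the DIAMOND one in that it names a directory (or is blank) rather than a single executable. The sketch below mirrors that convention to check that two standard RSEM programs can be found. It does not show how EnTAP itself interprets the entry: RSEM_ENTRY is a hypothetical variable, rsem_program is a made-up helper, and which RSEM programs EnTAP calls is an assumption.

```shell
#!/bin/sh
# rsem_program DIR NAME: print the command for RSEM program NAME.
# A blank DIR means a global install, so NAME is left for PATH lookup;
# otherwise NAME is joined to DIR (one trailing slash on DIR is dropped).
rsem_program() {
    if [ -z "$1" ]; then
        printf '%s\n' "$2"
    else
        printf '%s/%s\n' "${1%/}" "$2"
    fi
}

# RSEM_ENTRY is a hypothetical variable holding the value intended for
# entap_config.txt: blank for a global install, or the local RSEM directory.
for name in rsem-prepare-reference rsem-calculate-expression; do
    program=$(rsem_program "${RSEM_ENTRY:-}" "$name")
    if command -v "$program" >/dev/null 2>&1; then
        echo "found:   $program"
    else
        echo "missing: $program"
    fi
done
```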

EnTAP Installation

Once the dependencies and pipeline software have been installed, you can continue on to install EnTAP!

First, download and extract the latest release (tagged) version from GitLab: https://gitlab.com/EnTAP/EnTAP/tags

Within the main directory, execute the following command:

cmake CMakeLists.txt

This will generate a Makefile. Then execute:

make

Or to install to a destination directory:

cmake CMakeLists.txt -DCMAKE_INSTALL_PREFIX=/destination/dir
make install

If you receive no errors, move on to the last stage of installation: configuration.