1. Expression Analysis

The goal of expression filtering, or transcript quantification, is to estimate the relative abundance of each transcript from the sequenced reads and how they map back to the assembled transcriptome. EnTAP uses this information to filter out suspect expression profiles that may have originated from poor or incomplete assemblies. Filtering relies on RSEM to determine the FPKM (fragments per kilobase of exon per million mapped fragments) value of each transcript. The FPKM, a numeric measure of expression, is then used as a threshold by EnTAP to filter the transcriptome: any transcript with an FPKM below that threshold is removed.
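For reference, the conventional definition of FPKM can be written as below. The symbols are introduced here only for illustration and do not come from EnTAP or RSEM: c_i is the number of fragments assigned to transcript i, \ell_i is its length in bases, and N is the total number of mapped fragments. RSEM derives its value from model-based expected counts and an effective transcript length, so the numbers it reports may differ from a raw-count calculation.

```latex
\mathrm{FPKM}_i = \frac{10^{9} \cdot c_i}{\ell_i \cdot N}
% Worked example with made-up numbers:
% c_i = 200 fragments, \ell_i = 2000 bases, N = 10^{7} mapped fragments
\mathrm{FPKM}_i = \frac{10^{9} \cdot 200}{2000 \cdot 10^{7}} = 10
```

With a threshold of 0.5, a transcript like the one in this example would be retained.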

1.1. Running Expression Analysis

As mentioned above, RSEM (https://github.com/deweylab/RSEM) is used to determine the FPKM for each transcript. To run Expression Analysis, an alignment (BAM/SAM) file must be passed to EnTAP through the align flag in the entap_run.params file. A BAM file is preferred, but if a SAM file is provided, it will be converted for use with RSEM through the convert-sam-for-rsem program included in the RSEM package. Be sure to review the following commands as well to tailor the FPKM threshold to your needs.

1.1.1. Expression Analysis Commands

Expression Analysis Flags

| param | description | location (cmd / R-ini / E-ini) | qualifier | example |
|---|---|---|---|---|
| align / a | Path to the alignment file (either SAM or BAM format). Ignoring this flag will skip expression filtering. Be sure to look at the other Expression Analysis flags if using this. | R-ini | string | /path/to/alignment.bam |
| rsem-calculate-expression | Specify the path to the rsem-calculate-expression file generated during installation of RSEM | E-ini | string | /libs/RSEM-1.3.3/rsem-calculate-expression |
| rsem-sam-validator | Specify the path to the rsem-sam-validator file generated during installation of RSEM | E-ini | string | /libs/RSEM-1.3.3/rsem-sam-validator |
| rsem-prepare-reference | Specify the path to the rsem-prepare-reference file generated during installation of RSEM | E-ini | string | /libs/RSEM-1.3.3/rsem-prepare-reference |
| rsem-convert-sam-for-rsem | Specify the path to the rsem-convert-sam-for-rsem file generated during installation of RSEM | E-ini | string | /libs/RSEM-1.3.3/rsem-convert-sam-for-rsem |
| fpkm | Specify the FPKM (fragments per kilobase of exon per million mapped fragments) cutoff for Expression Filtering. All transcripts below this number will be filtered out and removed. | R-ini | float | 0.5 |
| single-end | Signify your reads are single-end for RSEM execution instead of paired-end (default) | R-ini | bool | true |
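As a rough illustration, the run-time (R-ini) flags from the table might appear in entap_run.params as shown below. This sketch assumes the file uses plain key=value lines with # comments. The path is a placeholder and the values are simply the examples from the table, not recommendations.

```ini
# Assumed key=value syntax; values are the examples from the table above.
align=/path/to/alignment.bam
fpkm=0.5
single-end=true
```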

1.2. Interpreting the Results

The /expression/RSEM folder will contain all of the relevant information for this stage of the pipeline. This includes many files generated from RSEM as well as files generated from EnTAP. Files generated from EnTAP are contained within the /expression/RSEM/processed directory.

RSEM generates many files, but the genes.results file is what we are particularly interested in from the RSEM output. This contains the relevant FPKM values used for thresholding. The following files can be found within the /expression/RSEM directory using an example input transcriptome titled “Species.fasta”:

Expression Analysis Results

| filename | description | directory |
|---|---|---|
| Species.fasta.genes.results | Generated from RSEM. Contains important information for each transcript, such as FPKM and TPM | /expression/RSEM |
| Species_removed.fasta | Generated from EnTAP. Contains all of the transcripts that have been filtered out due to having an FPKM value below the threshold | /expression/RSEM/processed |
| Species_kept.fasta | Generated from EnTAP. Contains all of the transcripts that have been retained due to having an FPKM value that is not below the user-specified threshold | /expression/RSEM/processed |
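The thresholding step can be sketched in a few lines of Python. This is not EnTAP's implementation, only a minimal illustration that assumes the tab-separated column layout RSEM writes to a genes.results file (gene_id, transcript_id(s), length, effective_length, expected_count, TPM, FPKM). The function name split_by_fpkm and the sample rows are hypothetical.

```python
import csv
import io


def split_by_fpkm(results_file, threshold):
    """Split IDs from an RSEM genes.results table into (kept, removed).

    Hypothetical helper for illustration; not part of EnTAP or RSEM.
    Rows with FPKM below the threshold are removed; all others are kept.
    """
    kept, removed = [], []
    reader = csv.DictReader(results_file, delimiter="\t")
    for row in reader:
        target = removed if float(row["FPKM"]) < threshold else kept
        target.append(row["gene_id"])
    return kept, removed


# Made-up rows that follow the assumed genes.results column layout.
EXAMPLE = (
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    "t1\tt1\t1500\t1350.00\t120.00\t35.10\t12.40\n"
    "t2\tt2\t900\t750.00\t1.00\t0.57\t0.20\n"
    "t3\tt3\t2100\t1950.00\t6.00\t1.41\t0.50\n"
)

kept, removed = split_by_fpkm(io.StringIO(EXAMPLE), threshold=0.5)
print("kept:", kept)
print("removed:", removed)
```

To try this on real output, pass an open file handle for Species.fasta.genes.results in place of the in-memory example.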

1.2.1. Expression Analysis Headers

TSV files generated from EnTAP will have the following headers from Expression Analysis.

  • FPKM

  • TPM

  • Expression Effective Length