1. Expression Analysis
The goal of expression filtering, or transcript quantification, is to estimate the relative abundance of each transcript from how the sequenced reads map back to the assembled transcriptome, and to use those estimates to filter out transcripts with suspect expression profiles that may originate from poor or incomplete assemblies. Filtering relies on RSEM to compute an FPKM (fragments per kilobase of exon per million mapped fragments) value for each transcript. EnTAP then uses the FPKM, a measure of expression level, as a threshold to filter the transcriptome: any transcript with a value below the threshold is removed.
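To make the thresholding idea concrete, here is a minimal sketch of the textbook FPKM definition and a threshold check. It is an illustration only: RSEM derives its FPKM values from its own statistical model, so its numbers may differ from this direct calculation. The function names are hypothetical, and the 0.5 default is simply the example value from the fpkm parameter, not a confirmed EnTAP default.

```python
def fpkm(fragment_count: float, length_bp: float, total_mapped_fragments: float) -> float:
    """Textbook FPKM: fragments per kilobase per million mapped fragments.

    Illustrative only; RSEM estimates FPKM from its own model.
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        return 0.0
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (length_bp * total_mapped_fragments)


def is_retained(fpkm_value: float, threshold: float = 0.5) -> bool:
    """Keep a transcript unless its FPKM is below the threshold (assumed rule)."""
    return fpkm_value >= threshold


# 50 fragments on a 2,000 bp transcript, 10 million mapped fragments in total
example = fpkm(50, 2000, 10_000_000)
print(example, is_retained(example))
```

A transcript with very few fragments relative to its length falls below the threshold and would be filtered out under this rule.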
1.1. Running Expression Analysis
As mentioned above, RSEM (https://github.com/deweylab/RSEM) is used to determine the FPKM for each transcript. To run Expression Analysis, an alignment file (BAM or SAM) must be provided to EnTAP through the align flag in the entap_run.params file. A BAM file is preferred; if a SAM file is provided, EnTAP converts it with RSEM's convert-sam-for-rsem utility before use. Be sure to review the following relevant commands as well to tailor the FPKM threshold to your needs.
1.1.1. Expression Analysis Commands
| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| align / a | Path to the alignment file (either SAM or BAM format). Ignoring this flag will skip expression filtering. Be sure to look at the other Expression Analysis flags if using this. | R-ini | string | /path/to/alignment.bam |
| rsem-calculate-expression | Specify the path to the | E-ini | string | /libs/RSEM-1.3.3/rsem-calculate-expression |
| rsem-sam-validator | Specify the path to the | E-ini | string | /libs/RSEM-1.3.3/rsem-sam-validator |
| rsem-prepare-reference | Specify the path to the | E-ini | string | /libs/RSEM-1.3.3/rsem-prepare-reference |
| rsem-convert-sam-for-rsem | Specify the path to the | E-ini | string | /libs/RSEM-1.3.3/rsem-convert-sam-for-rsem |
| fpkm | Specify the FPKM (fragments per kilobase of exon per million mapped fragments) cutoff for Expression Filtering. All transcripts below this number will be filtered out and removed. | R-ini | float | 0.5 |
| single-end | Signify your reads are single-end for RSEM execution instead of paired-end (default) | R-ini | bool | true |
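The way these parameters could fit together is sketched below as a list of RSEM command lines. This is an assumption about a plausible order of operations, not a description of what EnTAP actually executes internally. The function name, the `_ref` and `_converted` suffixes, and the sample name are hypothetical. The executable paths are the example values from the table, and the sketch only builds the commands without running them.

```python
from pathlib import Path
from typing import Dict, List

# Example executable paths copied from the table above
EXE: Dict[str, str] = {
    "rsem-calculate-expression": "/libs/RSEM-1.3.3/rsem-calculate-expression",
    "rsem-sam-validator": "/libs/RSEM-1.3.3/rsem-sam-validator",
    "rsem-prepare-reference": "/libs/RSEM-1.3.3/rsem-prepare-reference",
    "rsem-convert-sam-for-rsem": "/libs/RSEM-1.3.3/rsem-convert-sam-for-rsem",
}


def build_rsem_commands(exe: Dict[str, str], alignment: str, transcriptome: str,
                        out_prefix: str, single_end: bool = False) -> List[List[str]]:
    """Build (but do not run) a hypothetical sequence of RSEM invocations."""
    commands: List[List[str]] = []
    bam = alignment
    if Path(alignment).suffix.lower() == ".sam":
        # SAM input: convert first; the converter writes <name>.bam
        converted = f"{out_prefix}_converted"
        commands.append([exe["rsem-convert-sam-for-rsem"], alignment, converted])
        bam = f"{converted}.bam"
    reference = f"{out_prefix}_ref"  # hypothetical reference name
    commands.append([exe["rsem-prepare-reference"], transcriptome, reference])
    quantify = [exe["rsem-calculate-expression"], "--alignments"]
    if not single_end:
        quantify.append("--paired-end")  # paired-end is the documented default
    quantify += [bam, reference, out_prefix]
    commands.append(quantify)
    return commands


for cmd in build_rsem_commands(EXE, "/path/to/alignment.bam", "Species.fasta", "Species"):
    print(" ".join(cmd))
```

Setting `single_end=True` mirrors the single-end parameter by dropping `--paired-end` from the quantification command.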
1.2. Interpreting the Results
The /expression/RSEM folder will contain all of the relevant information for this stage of the pipeline. This includes many files generated from RSEM as well as files generated from EnTAP. Files generated from EnTAP are contained within the /expression/RSEM/processed directory.
RSEM generates many files, but the genes.results file is the one of particular interest, because it contains the FPKM values used for thresholding. The following files can be found within the /expression/RSEM directory, using an example input transcriptome titled “Species.fasta”:
| filename | description | directory |
|---|---|---|
| | Generated from RSEM. Contains important information for each transcript, such as FPKM and TPM | |
| | Generated from EnTAP. Contains all of the transcripts that have been filtered out due to having an FPKM value below the threshold | |
| | Generated from EnTAP. Contains all of the transcripts that have been retained due to having an FPKM value above the user-specified threshold | |
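For readers who want to inspect the RSEM output themselves, the following sketch splits the entries of a genes.results table into retained and removed sets by FPKM. It assumes the standard tab-delimited RSEM layout with `gene_id` and `FPKM` columns, and it uses a small made-up sample in place of a real file. The function name and the sample values are hypothetical, and the split only approximates the filtering EnTAP performs.

```python
import csv
import io
from typing import List, Tuple

# Made-up sample in the RSEM genes.results column layout (tab-delimited)
HEADER = ["gene_id", "transcript_id(s)", "length", "effective_length",
          "expected_count", "TPM", "FPKM"]
ROWS = [
    ["t1", "t1", "2000", "1850.00", "50.00", "3.10", "2.50"],
    ["t2", "t2", "900", "750.00", "1.00", "0.12", "0.10"],
    ["t3", "t3", "1200", "1050.00", "6.00", "0.62", "0.50"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"


def split_by_fpkm(results_text: str, threshold: float) -> Tuple[List[str], List[str]]:
    """Return (kept, removed) IDs; entries below the threshold are removed."""
    kept: List[str] = []
    removed: List[str] = []
    reader = csv.DictReader(io.StringIO(results_text), delimiter="\t")
    for row in reader:
        target = kept if float(row["FPKM"]) >= threshold else removed
        target.append(row["gene_id"])
    return kept, removed


kept, removed = split_by_fpkm(SAMPLE, threshold=0.5)
print("kept:", kept)
print("removed:", removed)
```

To apply this to a real run, read the contents of the genes.results file into a string and pass it in place of `SAMPLE`.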
1.2.1. Expression Analysis Headers
TSV files generated from EnTAP will have the following headers from Expression Analysis.
FPKM
TPM
Expression Effective Length