2. Frame Selection
Frame selection is the process of determining the coding region of a transcript. A coding region may not be found for a transcript, often because of assembly errors or other factors; in that case EnTAP removes the sequence. When a coding region is found, EnTAP retains the sequence for further annotation.
2.1. Running Frame Selection
EnTAP uses TransDecoder (https://github.com/TransDecoder/TransDecoder/wiki) to determine the coding regions of the user-input transcriptome. To run Frame Selection, the user must input nucleotide sequences and set the frame_selection command to true. This tells EnTAP to translate the nucleotide sequences into proteins and to use those proteins for the rest of the pipeline. The following commands and software paths should also be reviewed.
2.1.1. Frame Selection Commands
| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| frame_selection | Specify if you would like to perform frame selection/gene prediction on your input nucleotide transcriptome. This flag will be ignored if you input protein sequences. If this is set to false with nucleotide input, EnTAP will perform blastx functionality where appropriate. | R-ini | bool | true |

| param | description | location (cmd/R-ini,E-ini) | qualifier | example |
|---|---|---|---|---|
| transdecoder-m | Specify the minimum protein length for TransDecoder | R-ini | integer | 100 |
| transdecoder-no-refine-starts | Specify this flag if you would like to pipe the TransDecoder command '--no_refine_starts' when it is executed | R-ini | bool | false |
| transdecoder-long-exe | Method to execute TransDecoder.LongOrfs. This may be the path to the executable, or simply 'TransDecoder.LongOrfs' if installed globally | E-ini | string | TransDecoder.LongOrfs |
| transdecoder-predict-exe | Method to execute TransDecoder.Predict. This may be the path to the executable, or simply 'TransDecoder.Predict' if installed globally | E-ini | string | TransDecoder.Predict |
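
To make the role of these parameters more concrete, the following Python sketch shows how they could map onto the two TransDecoder command lines. It is an illustration only, not EnTAP's actual implementation: the function `build_transdecoder_commands` and its argument names are hypothetical, and it assumes the standard TransDecoder options `-t` (input transcripts), `-m` (minimum protein length) and `--no_refine_starts`.

```python
import shlex


def build_transdecoder_commands(
    transcripts,
    min_protein_len=100,                  # corresponds to transdecoder-m
    no_refine_starts=False,               # corresponds to transdecoder-no-refine-starts
    long_exe="TransDecoder.LongOrfs",     # corresponds to transdecoder-long-exe
    predict_exe="TransDecoder.Predict",   # corresponds to transdecoder-predict-exe
):
    """Return the two TransDecoder commands as argument lists.

    Hypothetical helper for illustration; EnTAP may pass additional
    options (for example output directories) that are not shown here.
    """
    long_orfs = [long_exe, "-t", transcripts, "-m", str(min_protein_len)]
    predict = [predict_exe, "-t", transcripts]
    if no_refine_starts:
        predict.append("--no_refine_starts")
    return long_orfs, predict


long_cmd, predict_cmd = build_transdecoder_commands("transcripts.fasta")
print(shlex.join(long_cmd))
print(shlex.join(predict_cmd))
```

The commands are built as lists rather than a single string so that they could be handed to `subprocess.run` without shell quoting problems.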
2.2. Interpreting the Results
The /frame_selection/TransDecoder folder contains all of the relevant information for the frame selection stage of the pipeline when TransDecoder is used. It holds the results from TransDecoder as well as the files generated by EnTAP (in the /frame_selection/TransDecoder/processed directory). The following table lists all of the files from this stage of EnTAP, using an example transcriptome of "transcripts.fasta".
| filename | description | directory |
|---|---|---|
| | Generated from TransDecoder. Peptide sequences for the final candidate ORFs; all shorter candidates within longer ORFs were removed. | |
| | Generated from TransDecoder. Contains positions within the target transcripts of the final selected ORFs (open reading frames). | |
| | Generated from TransDecoder. Contains nucleotide sequences for coding regions of the final candidate ORFs. | |
| | Generated from TransDecoder. Contains standard output/error information from the TransDecoder run. | |
| | Generated from EnTAP. Contains amino acid sequences of complete genes from transcriptome. | |
| | Generated from EnTAP. Contains amino acid sequences of partial (5' and 3') sequences. | |
| | Generated from EnTAP. Contains amino acid sequences of internal sequences. | |
| | Generated from EnTAP. Contains nucleotide sequences in which a frame was not found. These will not continue to the next stages of the pipeline. | |
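
The split into complete, partial and internal sequences can be illustrated with a short Python sketch. It assumes that each header in the TransDecoder peptide FASTA carries a `type:` field with one of the values `complete`, `5prime_partial`, `3prime_partial` or `internal`, which is how TransDecoder typically labels its ORFs. The function `count_orf_types` and the example headers are hypothetical, and this is not necessarily how EnTAP separates the sequences internally.

```python
import re
from collections import Counter

# Assumed ORF type labels found in TransDecoder peptide FASTA headers.
TYPE_PATTERN = re.compile(r"type:(complete|5prime_partial|3prime_partial|internal)")


def count_orf_types(fasta_text):
    """Count sequences per ORF type based on the FASTA header lines."""
    counts = Counter()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = TYPE_PATTERN.search(line)
            # Headers without a recognised type are tallied as "unknown".
            counts[match.group(1) if match else "unknown"] += 1
    return counts


# Made-up headers in the style of TransDecoder output, for illustration only.
example = """>seq1.p1 GENE.seq1~~seq1.p1  ORF type:complete len:120 (+),score=45.1 seq1:10-369(+)
MKT
>seq2.p1 GENE.seq2~~seq2.p1  ORF type:5prime_partial len:101 (+),score=30.2 seq2:1-303(+)
AAA
>seq3.p1 GENE.seq3~~seq3.p1  ORF type:internal len:150 (+),score=12.0 seq3:2-451(+)
GGG
"""

for orf_type, n in sorted(count_orf_types(example).items()):
    print(orf_type, n)
```

Running a tally like this on the TransDecoder peptide file gives a quick check against the number of sequences in each of the EnTAP-generated files.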
2.2.1. Frame Selection Headers
TSV files generated by EnTAP will include the following headers from Frame Selection.
Frame