.. |frame_dir| replace:: :file:`/frame_selection/TransDecoder`
.. |frame_proc_dir| replace:: :file:`/frame_selection/TransDecoder/processed`
.. |transdecoder_git| replace:: https://github.com/TransDecoder/TransDecoder/wiki

Frame Selection
=============================

Frame selection is the process of determining the coding region of a transcript. Due to assembly errors or other factors, a coding region often cannot be found for a transcript, in which case EnTAP removes the sequence. When a coding region is found, EnTAP keeps the sequence for further annotation.

Running Frame Selection
------------------------------

TransDecoder (|transdecoder_git|) is used to determine the coding regions of the user-input transcriptome. To run Frame Selection, the user must input nucleotide sequences and set the :file:`frame_selection` command to true. This tells EnTAP to translate the nucleotide sequences into proteins and to use those proteins for the rest of the pipeline. The following commands and software paths should also be reviewed.

Frame Selection Commands
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table:: Frame Selection Flags
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - frame_selection
     - Specify if you would like to perform frame selection/gene prediction on your input nucleotide transcriptome. This flag will be ignored if you input protein sequences. If this is set to false with nucleotide input, EnTAP will perform blastx functionality where appropriate.
     - R-ini
     - bool
     - true

.. list-table:: Frame Selection - TransDecoder Specific Flags
   :align: left
   :widths: 10 50 10 10 10
   :header-rows: 1

   * - param
     - description
     - location (cmd/R-ini,E-ini)
     - qualifier
     - example
   * - transdecoder-m
     - Specify the minimum protein length for TransDecoder
     - R-ini
     - integer
     - 100
   * - transdecoder-no-refine-starts
     - Specify this flag if you would like to pass the TransDecoder option '--no_refine_starts' when it is executed
     - R-ini
     - bool
     - false
   * - transdecoder-long-exe
     - Method to execute TransDecoder.LongOrfs. This may be the path to the executable, or simply 'TransDecoder.LongOrfs' if installed globally
     - E-ini
     - string
     - TransDecoder.LongOrfs
   * - transdecoder-predict-exe
     - Method to execute TransDecoder.Predict. This may be the path to the executable, or simply 'TransDecoder.Predict' if installed globally
     - E-ini
     - string
     - TransDecoder.Predict

Interpreting the Results
-------------------------------

The |frame_dir| folder contains all of the relevant information for the frame selection stage of the pipeline when TransDecoder is used. It holds the results from TransDecoder as well as the files generated by EnTAP (in the |frame_proc_dir| directory). The following table lists all of the files from this stage of EnTAP, using an example transcriptome named "transcripts.fasta".

.. list-table:: **Frame Selection Results**
   :align: left
   :widths: 10 50 10
   :header-rows: 1

   * - filename
     - description
     - directory
   * - :file:`transcripts.fasta.transdecoder.pep`
     - Generated from TransDecoder. Peptide sequences for the final candidate ORFs; all shorter candidates within longer ORFs were removed.
     - |frame_dir|
   * - :file:`transcripts.fasta.transdecoder.gff3`
     - Generated from TransDecoder. Contains positions within the target transcripts of the final selected ORFs (open reading frames).
     - |frame_dir|
   * - :file:`transcripts.fasta.transdecoder.cds`
     - Generated from TransDecoder. Contains nucleotide sequences for coding regions of the final candidate ORFs.
     - |frame_dir|
   * - :file:`prediction_std.out/err`
     - Generated from TransDecoder. Contains standard output/error information from the TransDecoder run.
     - |frame_dir|
   * - :file:`transdecoder_complete_genes.fasta`
     - Generated from EnTAP. Contains amino acid sequences of complete genes from the transcriptome.
     - |frame_proc_dir|
   * - :file:`transdecoder_partial_genes.fasta`
     - Generated from EnTAP. Contains amino acid sequences of partial (5' and 3') sequences.
     - |frame_proc_dir|
   * - :file:`transdecoder_internal_genes.fasta`
     - Generated from EnTAP. Contains amino acid sequences of internal sequences.
     - |frame_proc_dir|
   * - :file:`transdecoder_sequences_lost.fasta`
     - Generated from EnTAP. Contains nucleotide sequences in which a frame was not found. These will not continue to the next stages of the pipeline.
     - |frame_proc_dir|

Frame Selection Headers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TSV files generated from EnTAP will have the following headers from Frame Selection.

* Frame
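
As a quick sanity check on the complete, partial, and internal categories described in the results table, the following is a minimal sketch that tallies ORF types from the header lines of :file:`transcripts.fasta.transdecoder.pep`. It assumes that each TransDecoder header carries a ``type:<category>`` token (for example ``type:complete``); the function name ``count_orf_types`` and the example headers are hypothetical, and this is not EnTAP's own implementation.

```python
import re
from collections import Counter

# Assumption: TransDecoder writes a "type:<category>" token in each FASTA header.
TYPE_PATTERN = re.compile(r"\btype:(\S+)")


def count_orf_types(fasta_lines):
    """Count ORF categories found in the header lines of a FASTA file.

    `fasta_lines` is any iterable of strings (an open file handle works).
    Headers without a recognizable type token are counted as "unknown".
    """
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        match = TYPE_PATTERN.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Hypothetical headers, shortened for illustration only.
example_lines = [
    ">seq1.p1 ORF type:complete len:120 (+)",
    "MKTAYIAKQR",
    ">seq2.p1 ORF type:5prime_partial len:98 (+)",
    "AYIAKQRQIS",
    ">seq3.p1 ORF type:internal len:77 (-)",
    "KQRQISFVKS",
    ">seq4.p1 no type token here",
    "QISFVKSHFS",
]

orf_counts = count_orf_types(example_lines)
```

To run the same tally on real output, pass an open file handle instead of the example list, e.g. ``with open(path) as handle: count_orf_types(handle)``, where ``path`` points to the ``.pep`` file in the |frame_dir| folder.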