Authors
F Díaz-Viraqué1; C Robello2
1 Institut Pasteur Montevideo, Uruguay; 2 Institut Pasteur Montevideo; Facultad de Medicina, Universidad de la República, Uruguay
Discussion
Trypanosomatids are unicellular eukaryotes that differ from other eukaryotes in several aspects of RNA metabolism, gene expression regulation and gene organization. With the recent advent of long-read (third-generation) sequencing technologies, namely PacBio and Oxford Nanopore Technologies, several trypanosomatid genome assemblies with improved contiguity have been published. Although considerable effort has gone into correctly annotating coding genes and repetitive sequences, non-coding RNA annotation has not been thoroughly assessed. Non-coding RNA genes encode functional RNA products and are often neglected in large-scale genome analyses, probably because their sequence and structural diversity requires more dedicated annotation. Because these genes lack the features that define coding genes (e.g., long open reading frames) and show limited sequence conservation, classical gene annotation strategies cannot be applied to them. The peculiar mechanisms of RNA metabolism in trypanosomatids, together with the improved genome assemblies, prompted us to study how non-coding RNA genes are organized in these genomes. We re-annotated them using several algorithms, each optimized for a particular class of RNA, and obtained a complete annotation that includes previously undescribed non-coding RNAs and corrects genes that had been incorrectly assigned. In sum, this work reports a highly curated genome annotation and unveils the organization of non-coding RNAs in trypanosomatid genome assemblies.