Abstract
Cells express thousands of proteins whose abundances span a wide range. In nucleated cells, messenger RNA untranslated regions (UTRs) can contribute to post-transcriptional expression control, but relatively few UTRs have been shown to exert such control, and cis-regulatory sequences remain largely uncharacterised. In the trypanosomatids, genome-wide polycistronic transcription places particular emphasis on post-transcriptional controls. We used a massively parallel reporter assay coupled with UTR-seq profiling in the African trypanosome, revealing post-transcriptional reprogramming by thousands of 3’-UTRs. Genome-scale UTR-seq screening identified an abundance of regulatory fragments that either increased or reduced reporter expression. Analysis of regulatory fragments and native UTRs revealed a correlation between gene expression and 3’-UTR sequence composition. A machine learning approach guided by these findings effectively predicted observed measures of translation efficiency at a transcriptomic scale (R² = 0.69). This approach also provided quantitative measures of the relative contribution of sequences within native 3’-UTRs and revealed similarly predictive sequences within 5’-UTRs. Thus, UTR-seq reveals the cis-regulatory UTR sequences that control gene expression in the context of a genome that lacks promoter-based transcription control. We conclude that gene expression profiles in trypanosomes are post-transcriptionally reprogrammed by thousands of regulatory UTRs.
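To make the two quantities mentioned above concrete, the following Python sketch shows one simple way to summarise UTR sequence composition and to compute the coefficient of determination (R²) between observed and predicted values. It is an illustration only, not the model used in this study: the abstract does not specify the features or the learning algorithm, so the single-nucleotide fractions and the function names `nucleotide_composition` and `r_squared` are assumptions made here for clarity.

```python
from collections import Counter


def nucleotide_composition(utr):
    """Return the fraction of A, C, G and U in a UTR sequence.

    Assumption: composition is summarised as single-nucleotide fractions;
    the study itself may use richer sequence features.
    """
    seq = utr.upper().replace("T", "U")  # accept DNA or RNA spelling
    counts = Counter(seq)
    total = sum(counts[base] for base in "ACGU")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or U/T")
    return {base: counts[base] / total for base in "ACGU"}


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot
```

Under this definition, a value of 0.69 means that the predictions account for 69% of the variance in the observed measurements, whereas a model that always predicts the mean would score 0.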