New genome sequences for parasites in the genus Cryptosporidium are emerging regularly. However, because of insufficient starting material, non-clonal infections, the use of short-read sequencing technologies and the poor availability of experimental resources needed for validation, fundamental gaps exist in our characterization of these genome sequences. We do not have complete chromosomal assemblies with confirmed genome structure for each species, we lack experimentally confirmed gene content and annotation and, most importantly, we lack information on gene function. Our aim has been to generate the best possible structural and functional genome annotation for three closely related species of Cryptosporidium: C. parvum strain IOWA; a patient-isolated strain, C. hominis 30976; and the new Cryptosporidium mouse model species, C. tyzzeri, using all available public data. Using ESTs, cDNA, RNA-Seq, mass spectrometry proteomics data, synteny and gene orthology information, we trained three different gene prediction tools, and all data were added as evidence tracks in WebApollo2 for manual curation of all three genome sequences. Relative to the previous genome annotations available for C. parvum IOWA and C. hominis TU502, more than 1,500 changes to the structural annotation have been made. These changes include altered gene boundaries (adding UTRs, altering the start codon, and updating or adding intron features) as well as the addition of more than 50 new RNA-Seq-supported genes. More than 800 evidence-supported introns have been added to each genome sequence annotation, and alternative splicing is detected. Many previously annotated single-copy genes are shown to be multi-copy, and copy number variation is detected between the species. The functional annotation was also greatly improved: domains have been identified in many uncharacterized proteins, and 98 additional transporters have been identified. Although the number of characterized proteins has increased substantially, approximately 35% of the annotated genes in each species are still characterized as hypothetical. Additional experimental data are essential for improving our understanding of Cryptosporidium. The new annotations have been submitted prepublication to CryptoDB.org and GenBank to facilitate immediate access by the research community.