Authors
H Imamura (1); M Domagalska (1); F Dumetz (1); M Vanaerschot (1); J Cotton (3); M Berriman (3); J Vermeesch (2); J C Dujardin (1)
(1) Institute of Tropical Medicine, Antwerp, Belgium; (2) KU-Leuven, Belgium; (3) Wellcome Trust Sanger Institute

Discussion
In 2011, the first L. donovani reference genome, based on a clinical isolate from the Indian Subcontinent (BPK282/0cl4), was assembled using a combination of 454 and Illumina sequencing. We have re-sequenced that genome on the PacBio RS II sequencer with P5-C3 chemistry, yielding around 616,900 post-filtered reads with an average length of 8.4 kb (131x coverage). The reads were assembled with the SMRT Analysis tools, final base and indel correction was carried out with iCORN using Illumina reads, and annotations were added with Companion. The PacBio assembly comprises 36 chromosomes and, for the first time, a complete maxicircle. Assembly quality also improved, with the number of gaps falling from 2,142 in the previous reference to 20. Complex regions were refined, including the intrachromosomal amplicons on chromosomes 23 (H-locus) and 36 (MPK1) and other repetitive regions such as the HSP70 and miniexon tandem arrays. Telomeric regions were also improved. In addition, inversions and misassemblies in the old reference sequence were corrected, removing a substantial number of false-positive genetic variants. These improvements led to better gene annotation and provided new insights into previously poorly characterized regions. Such features are required for a thorough investigation of the stability and plasticity of the Leishmania genome at the population and single-cell levels.
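Gap counts like the 2,142 versus 20 comparison are a common assembly-quality metric. The abstract does not say how gaps were counted, so the following is only an illustrative sketch, not the authors' method. It assumes a gap is any maximal run of `N`/`n` characters (at least `min_len` long) in a FASTA sequence. The helper name `count_gaps` and the `min_len` parameter are hypothetical.

```python
import re
from typing import Dict, Iterable, List


def count_gaps(fasta_lines: Iterable[str], min_len: int = 1) -> Dict[str, int]:
    """Hypothetical helper: count maximal runs of >= min_len 'N'/'n' bases per sequence.

    Assumption: assembly gaps are encoded as runs of N, as is usual in FASTA scaffolds.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    seqs: Dict[str, List[str]] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the sequence ID.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    # Join wrapped lines first so a gap spanning a line break is counted once.
    return {n: len(pattern.findall("".join(parts))) for n, parts in seqs.items()}


# Toy input: in chr_a the run of N continues across a line break.
example = """>chr_a
ACGTNNNN
NNACGTNACG
>chr_b
ACGTACGT
""".splitlines()

print(count_gaps(example))             # {'chr_a': 2, 'chr_b': 0}
print(count_gaps(example, min_len=2))  # {'chr_a': 1, 'chr_b': 0}
```

The sketch counts runs rather than individual N bases because one gap of unknown length is usually written as a single block of Ns. The `min_len` threshold lets isolated ambiguous bases be ignored. Other tools may use different conventions, so totals from this sketch need not match published figures.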