TERMINUS (Telomeric End Read Mining IN Unassembled Sequences)


Motivation

Telomeres are highly under-represented in genome databases; however, they are often abundant among raw, unassembled sequence traces. To make use of the rich store of telomere information in raw sequences, we developed TERMINUS, which consists of three Perl scripts that extract, assemble and categorize telomeric end reads and, finally, link the telomeres to contigs in the genome assembly.

Results

TERMINUS was tested using data obtained from a project to sequence the genome of the fungus Aspergillus nidulans. TERMINUS identified over 700 reads containing the telomere repeat sequence (TTAGGG)n among approximately 623 000 sequence traces. Approximately 650 candidate telomere reads were piped into StackPACK, which assembled them into 11 telomeric contigs ("TelContigs"). Paired reads corresponding to each telomeric sequence were also retrieved and assembled into 97 subtelomeric contigs ("SubTelContigs") and 34 subtelomeric singletons ("SubTelSingletons"). TERMINUS then used BLAST, with the consensus sequences as queries, to search for matches in the assembled genome sequence. Finally, it performed positional consistency checks on each BLAST match to ensure that the match's location and orientation were consistent with the predicted position of the telomere. In total, TERMINUS identified matches to 16 supercontigs, one for each end of the eight A. nidulans chromosomes. In addition, it provided evidence of a mis-assembly in the genome sequence.
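
The read-mining step can be illustrated with a short Python sketch. This is not the TERMINUS code (the scripts themselves are written in Perl); it only shows one plausible way to flag reads that contain the (TTAGGG)n repeat on either strand. The function name, the input format (a mapping of read names to sequences) and the MIN_REPEATS threshold are assumptions made for the example.

```python
import re

# Hypothetical threshold: minimum number of tandem repeat copies required.
# TERMINUS may use a different criterion.
MIN_REPEATS = 3

# Match the telomere repeat on either strand: TTAGGG or its
# reverse complement CCCTAA, repeated at least MIN_REPEATS times.
TELOMERE_RE = re.compile(
    r"(?:TTAGGG){%d,}|(?:CCCTAA){%d,}" % (MIN_REPEATS, MIN_REPEATS)
)


def find_telomere_reads(reads):
    """Return the names of reads that contain a telomere repeat tract.

    reads: dict mapping read name -> nucleotide sequence (any case).
    """
    return [
        name
        for name, seq in reads.items()
        if TELOMERE_RE.search(seq.upper())
    ]
```

A call such as find_telomere_reads({"r1": "ACGT" + "TTAGGG" * 4}) would report r1 as a candidate. A real pipeline would also need to handle vector sequence, low-quality bases and imperfect repeats, which this sketch ignores.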

Results download

TERMINUS has been used to map the telomeres of several fungal genomes. The results are available for download:

Aspergillus fumigatus: [Aspergillus_fumigatus.tar.gz]
Aspergillus nidulans: [Aspergillus_nidulans.tar.gz]
Coccidioides immitis: [Coccidioides_immitis.tar.gz]
Coprinus cinereus: [Coprinus_cinereus.tar.gz]
Cryptococcus neoformans: [Cryptococcus_neoformans_serotype_a.tar.gz]
Fusarium graminearum: [Fusarium_graminearum.tar.gz]
Magnaporthe grisea: [Magnaporthe_grisea.tar.gz]
Neurospora crassa: [Neurospora_crassa.tar.gz]
Phanerochaete chrysosporium: [Phanerochaete_chrysosporium.tar.gz]
Phytophthora ramorum: [Phytophthora_ramorum.tar.gz]
Phytophthora sojae: [Phytophthora_sojae.tar.gz]
Ustilago maydis: [Ustilago_maydis.tar.gz]

Publication

TERMINUS - telomeric end-read mining IN unassembled sequences.
Bioinformatics. 2004 Dec 7; [Epub ahead of print] [Link to the paper]
Weixi Li (1), Cathryn J. Rehmeyer (2), Chuck Staben (1) and Mark L. Farman (2)
(1) Department of Biological Sciences and (2) Department of Plant Pathology, University of Kentucky.

How does TERMINUS work?

TERMINUS flowchart
Schematic showing how the TERMINUS scripts identify telomeric sequences and link them to the assembled genome. (A high-resolution version of the scheme is available here.)
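
The final linking step, the positional consistency check on BLAST matches mentioned under Results, might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the logic of the TERMINUS scripts: it assumes each query is oriented so that the telomere lies beyond its 3' end, that match coordinates are 1-based with start <= end, and that a match must fall within a hypothetical MAX_END_DISTANCE of the appropriate supercontig end. The Hit class and all names are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical cutoff: how close to a supercontig end a match must lie.
MAX_END_DISTANCE = 20000


@dataclass
class Hit:
    contig_length: int  # length of the matched supercontig
    start: int          # 1-based match start on the supercontig
    end: int            # 1-based match end on the supercontig (start <= end)
    strand: str         # "+" or "-": query orientation relative to the supercontig


def is_consistent(hit, max_dist=MAX_END_DISTANCE):
    """Check that a match sits near the supercontig end the telomere points to.

    Assumption: the query runs toward the telomere, so a "+" match
    predicts a telomere to the right and a "-" match one to the left.
    """
    if hit.strand == "+":
        # Telomere expected past the right end of the supercontig.
        return hit.contig_length - hit.end <= max_dist
    if hit.strand == "-":
        # Telomere expected before the left end of the supercontig.
        return hit.start - 1 <= max_dist
    return False
```

Under these assumptions, a "+" match that lies near the left end of a supercontig would be rejected, because the telomere it predicts would point into the assembled sequence rather than away from it. A match of that kind is one way an inconsistency with the assembly could surface.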

Download TERMINUS

[TERMINUS.tar.gz] (85,698 bytes)