STROLL Home Page
STROLL -- A Fragment Assembly Program
for Large-Scale Genome Sequencing
Ting Chen,
Department of Genetics, Harvard Medical School
Steven Skiena,
Department of Computer Science, SUNY Stony Brook
STROLL is a fragment assembly program designed for
large-scale (megabase-level) genome sequencing.
Version 1.2 of STROLL is available by ftp:
the source code, STROLL1.2.tar.gz,
the executable file for SUN Solaris 2.X, stroll.gz,
and the documentation file, README.
Borrelia Genome Sequencing Project at Brookhaven
STROLL was developed in the Borrelia Genome Sequencing Project
at Brookhaven National Laboratory.
Biologists at Brookhaven National Laboratory have implemented
a reliable technique for sequencing DNA by primer walking.
To sequence the one-megabase genome of
Borrelia burgdorferi, the bacterium that causes Lyme disease,
they proposed a two-phase strategy:
a thin-coverage shotgun sequencing phase,
followed by a primer walking phase that closes the remaining gaps.
A complete Borrelia genome has been published.
STROLL played a central role in this sequencing project.
Algorithms and Data Structures in STROLL
- Use of a space-efficient data structure, the suffix array, to quickly reject
non-overlapping fragment pairs, reducing the number of
calls to the pairwise comparison.
- Use of a fast banded pairwise comparison algorithm, with affine gap
penalties and base qualities, to search for
local similarities between two fragments.
- Use of an overlap recovery strategy (transitive relations) to recover
most of the undetected overlaps.
- Use of an incremental multiple alignment algorithm to add fragments to contigs
one by one, in order of pairwise alignment quality.
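
The first item above, rejecting non-overlapping pairs before any alignment is attempted, can be illustrated with a minimal sketch. It is not STROLL's code: the function names, the separator character "#", the exact k-mer criterion, and the default k=12 are all assumptions made for illustration, and reverse-complement overlaps and sequencing errors are ignored. The idea is that suffixes sharing the same first k characters sit next to each other in a suffix array, so one pass over the array finds every pair of fragments with a common k-mer; all other pairs can be skipped.

```python
from itertools import combinations


def build_suffix_array(text):
    """Return the suffix array of text.

    Naive construction by sorting suffixes; adequate for a sketch, whereas a
    production assembler would use a linear or near-linear construction.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def _emit(run, pairs):
    """Record every pair of distinct fragments seen in one run of equal k-mers."""
    for a, b in combinations(sorted(set(run)), 2):
        pairs.add((a, b))


def candidate_pairs(fragments, k=12):
    """Return pairs (i, j), i < j, of fragments sharing at least one exact k-mer.

    Pairs that are NOT returned share no k-mer and can be rejected without
    calling the (far more expensive) pairwise comparison.
    """
    # Concatenate the fragments with a separator that never occurs in DNA and
    # remember which fragment owns each position of the concatenated text.
    parts, owner = [], []
    for idx, frag in enumerate(fragments):
        parts.append(frag + "#")
        owner.extend([idx] * len(frag) + [-1])  # -1 marks the separator
    text = "".join(parts)

    pairs = set()
    run, prev_kmer = [], None
    for pos in build_suffix_array(text):
        kmer = text[pos:pos + k]
        # A k-mer is usable only if it is full length and stays in one fragment.
        valid = len(kmer) == k and "#" not in kmer
        if valid and kmer == prev_kmer:
            run.append(owner[pos])
        else:
            _emit(run, pairs)
            run = [owner[pos]] if valid else []
            prev_kmer = kmer if valid else None
    _emit(run, pairs)
    return pairs


if __name__ == "__main__":
    reads = ["ACGTACGTTGCA", "CGTTGCATTT", "GGGGGGGGGG"]
    print(candidate_pairs(reads, k=6))
```

In this toy input the first two reads share the 6-mer CGTTGC and form the only candidate pair; the third read shares no 6-mer with either and would never reach the alignment step. Highly repeated k-mers make the pair enumeration in `_emit` quadratic in the run length, so a practical filter would cap or mask such runs.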
STROLL Performance
These algorithms were chosen for speed and low memory use.
In a megabase sequencing project,
memory use can easily reach several hundred megabytes and
running time can reach tens of hours. To keep pace with the increasing scale
of sequencing efforts, all our algorithms run in
close to linear time and use linear space.
STROLL has been tested on a data set of more than 8,000
shotgun sequencing fragments.
The assembly took only 81 minutes on a SUN Sparc10 with 128 MB of RAM.
For more information about STROLL's performance, see the following
papers.
Publications
- STROLL: A new fragment assembly program. (Ting Chen and Steven Skiena).
- Trie-based data structures for fragment assembly. (Ting Chen and Steven Skiena).
The Eighth Symposium on Combinatorial Pattern Matching, Aarhus, Denmark, June 30 - July 2, 1997.
Contact
Email your questions to
tchen@salt2.med.harvard.edu
or skiena@cs.sunysb.edu.