MathBio Seminar

Tuesday, March 13, 2012 - 10:30am

Tandy Warnow

The University of Texas at Austin, Department of Computer Science


University of Pennsylvania

DRL 4N49

Phylogenetic placement arises in the analysis of metagenomic data, where the objective is to insert short molecular sequences (called "query sequences") into an existing phylogenetic tree and alignment on full-length sequences for the same gene. We present SEPP, a general "boosting" technique to improve the accuracy and/or speed of phylogenetic placement techniques. The key algorithmic aspect of SEPP is a dataset decomposition technique used in SATé (Liu et al., Science 2009 and Systematic Biology 2012), a method that utilizes an iterative divide-and-conquer technique to co-estimate alignments and trees on large molecular sequence datasets. We show that SEPP improves current phylogenetic placement methods: it places metagenomic sequences more accurately when the set of input sequences has a large evolutionary diameter, and it produces placements of comparable accuracy in a fraction of the time for easier cases. Finally, we present TIPP, an extension of SEPP that enables taxon identification for short reads and produces dramatically improved accuracy over current taxon identification methods.