Penn Arts & Sciences Logo

MathBio Seminar

Monday, March 12, 2012 - 12:15pm

Tandy Warnow

University of Texas


University of Pennsylvania

Levine 512

Estimating the Tree of Life will likely involve a two-step procedure, where in the first step trees are estimated on many genes, and then the gene trees are combined into a tree on all the taxa. The computational problems involved here are substantial and interesting -- even if the gene trees are correctly estimated, because the individual gene trees may not include all the species, finding the true tree that contains all the species is NP-hard. Furthermore, estimated gene trees may not be correct, making the estimation problem additionally challenging. Finally, the true gene trees may not agree with each other or with the species tree, due to biological processes such as deep coalescence, gene duplication and loss, and horizontal gene transfer. In this talk, I will present two recent algorithms that make advances on these problems. The first algorithm is SuperFine (Swenson et al., 2011, Systematic Biology), which produces more accurate supertrees than any other supertree method, and does so extremely quickly, under the assumption that the true gene trees are subtrees of the true species tree. The second algorithm is by Yu et al. (RECOMB 2011 and J Computational Biology), and estimates species trees from estimated gene trees, while allowing gene trees to differ from the true species tree due to deep coalescence.