Penn Arts & Sciences Logo

MathBio Seminar

Monday, May 9, 2016 - 4:00pm

Matthias Steinrücken

University of Massachusetts Amherst


University of Pennsylvania

318 Carolyn Lynch Lab

The inference of historical population sizes from contemporary genomic sequence data has gained a lot of attention in recent years. A particular focus has been on recent exponential growth in humans. This recent growth has had a strong impact on the distribution of rare genetic variants, which are of particular importance when studying disease related genetic variation. The popular PSMC method (Li and Durbin, 2011) can be used to infer population sizes from a sample of two chromosomes. However, the small sample size severely limits the power of this method in the recent past. To improve inference in the recent past, we extend the Coalescent Hidden Markov model approach of PSMC to larger sample sizes. To this end, we augment the hidden states of the model to the total length of the coalescent trees relating the sampled chromosomes at each locus. A key ingredient in this extension is the joint distribution of the total tree length at two linked loci in populations of variable size. This joint distribution has received little attention in the literature, especially under variable population size. In this talk, we show that it can be obtained as the solution of a system of semi-linear hyperbolic partial differential equations. Such systems can cause instabilities in standard numerical solution approaches. We thus implement a solution scheme based on the method of characteristics that allows us to compute the solutions efficiently. We further compare these solutions to explicit coalescent simulations to demonstrate their accuracy. We will discuss potential extension of our approach to infer divergence times and migration rates in structured populations.