The sample frequency spectrum (SFS) describes the distribution of allele counts at segregating sites, and is a useful statistic both for summarizing genetic data and for inferring biological parameters. SFS-based inference proceeds by comparing the observed SFS with its expected value under a model, but computing these expectations is challenging when multiple populations are related by a complex demographic history.
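To make the observed-versus-expected comparison concrete, the following minimal sketch tabulates a single-population SFS from a small 0/1 haplotype matrix and scores it against the classical constant-size neutral expectation, in which the expected number of sites with derived count k is proportional to 1/k. The function names, the toy data, and the multinomial likelihood are illustrative assumptions; this is not momi's interface, and momi itself handles the multipopulation case.

```python
import math

def sample_frequency_spectrum(haplotypes):
    """Tabulate derived-allele counts over sites.

    `haplotypes` is a list of n equal-length lists of 0/1
    (0 = ancestral, 1 = derived). Returns a list of length n + 1 whose
    entry k is the number of segregating sites with derived count k.
    Entries 0 and n (non-segregating sites) are left at zero.
    """
    n = len(haplotypes)
    sfs = [0] * (n + 1)
    for site in zip(*haplotypes):  # iterate over columns (sites)
        k = sum(site)
        if 0 < k < n:
            sfs[k] += 1
    return sfs

def neutral_expected_sfs(n):
    """Expected SFS proportions for one constant-size population (prop. to 1/k)."""
    weights = [1.0 / k for k in range(1, n)]
    total = sum(weights)
    return [0.0] + [w / total for w in weights] + [0.0]

def multinomial_log_likelihood(observed, expected_probs):
    """Log-likelihood of the observed counts, up to an additive constant."""
    return sum(c * math.log(p)
               for c, p in zip(observed, expected_probs) if c > 0)

# Hypothetical toy data: 4 sampled haplotypes, 5 sites.
haplotypes = [
    [1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
observed = sample_frequency_spectrum(haplotypes)
expected = neutral_expected_sfs(len(haplotypes))
loglik = multinomial_log_likelihood(observed, expected)
```

In an inference setting, the expected proportions would depend on demographic parameters, and the log-likelihood would be maximized over those parameters.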
We are developing a new software package, momi (MOran Models for Inference), that computes the multipopulation SFS under population size changes (including exponential growth), population mergers and splits, and pulse admixture events. Underlying momi is a multipopulation Moran model, which is equivalent to the coalescent and the Wright-Fisher diffusion, but has computational advantages in both speed and numerical stability. Techniques from graphical models are used to integrate out historical allele frequencies. Automatic differentiation provides the gradient and Hessian, which are useful for searching through parameter space and for computing asymptotic confidence intervals.
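As a rough illustration of the kind of process involved, the sketch below builds the rate matrix of a single-population Moran model that tracks the number of derived alleles among n lineages. The scaling (each ordered pair of lineages has one copy onto the other at rate 1/2) and the function name are assumptions made for this example, and the code is not taken from momi.

```python
def moran_rate_matrix(n):
    """Rate matrix Q for the derived-allele count among n lineages.

    Assumed scaling: for each ordered pair (i, j), lineage i copies onto
    lineage j at rate 1/2. From state k the count therefore moves to
    k + 1 or k - 1 at rate k * (n - k) / 2 each. States 0 and n are
    absorbing because the sketch includes no mutation.
    """
    Q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(1, n):
        rate = k * (n - k) / 2.0
        Q[k][k + 1] = rate
        Q[k][k - 1] = rate
        Q[k][k] = -2.0 * rate
    return Q

Q = moran_rate_matrix(4)

# Each row of a rate matrix sums to zero.
row_sums = [sum(row) for row in Q]

# The expected change in allele count is zero in every state,
# reflecting the absence of selection in this model.
drift = [sum(q * j for j, q in enumerate(row)) for row in Q]
```

Because the state space is finite (n + 1 states), transition probabilities over a time interval follow from a matrix exponential of Q, which is one reason a Moran formulation can be convenient numerically.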
Using momi, we can compute the exact SFS for more complex demographies than was previously possible. In addition, the expectations of a wide range of statistics, such as the time to most recent common ancestor (TMRCA) and the total branch length, can be computed efficiently. The scaling properties of momi depend heavily on the pattern of migration events, but for certain demographic histories, momi can scale to tens or even hundreds of populations. We demonstrate the accuracy of momi by applying it to simulated data, and are in the process of applying it to real data to infer a model of human history involving archaic hominins (Neanderthal and Denisovan) and modern humans in Africa, Europe, East Asia, and Melanesia.
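For orientation, the simplest case of such expectations has a closed form. The sketch below assumes a single constant-size population under the standard coalescent, with time measured so that each pair of lineages coalesces at rate 1; the function names are hypothetical and the code does not use momi, which targets far more general demographies.

```python
def expected_tmrca(n):
    """E[TMRCA] for n samples.

    While k lineages remain, the waiting time T_k has mean
    1 / C(k, 2) = 2 / (k * (k - 1)); the TMRCA is the sum over k = 2..n.
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))

def expected_total_branch_length(n):
    """E[total branch length] for n samples.

    Each interval with k lineages contributes k * T_k, giving
    2 * (1 + 1/2 + ... + 1/(n - 1)).
    """
    return sum(k * 2.0 / (k * (k - 1)) for k in range(2, n + 1))

tmrca_10 = expected_tmrca(10)                 # equals 2 * (1 - 1/10)
length_10 = expected_total_branch_length(10)  # equals 2 * H_9
```

The expected TMRCA approaches 2 as n grows, while the expected total branch length grows only logarithmically in n.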