MathBio Seminar

Monday, September 19, 2016 - 4:00pm

Barbara Engelhardt

Princeton University

Location

University of Pennsylvania

318 Carolyn Lynch Lab

Latent factor models have recently received much attention in "big data" applications because they let the user quickly explore the underlying data in a controlled and interpretable way. In genomics, latent factor models are commonly used to identify population substructure, identify gene clusters, and control noise in large data sets. In this talk, I will present a general framework for Bayesian latent factor models. I will illustrate the power of these models for a broad class of structured problems in genomics via application to the Genotype-Tissue Expression (GTEx) data set. In particular, the latent structure estimated by a Bayesian biclustering version of this model may be used to identify gene co-expression networks that co-vary uniquely in one tissue type (or under other conditions). We validate network edges using tissue-specific expression quantitative trait loci.