Research

Multi-omic data integration

Multi-omic studies, which simultaneously profile RNA expression, DNA methylation, chromatin accessibility, and more, are becoming commonplace. However, integrating these data remains challenging. We are interested in developing methods that extract meaningful shared signal from these data, and in applying them to the biology of human complex traits and disease. Canonical Correlation Analysis (CCA) is a natural framework for thinking about these problems in genomics. As a graduate student, BCB helped develop Principal Component Correlation Analysis, which we used to find underappreciated population structure in the GEUVADIS study. More recently, we developed a multi-modal extension of CCA that uses a probabilistic graphical model to simultaneously estimate shared and private features in multi-omic data.
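
As a point of reference, classical two-view CCA can be sketched in a few lines of NumPy. This is the textbook method, not Principal Component Correlation Analysis or the probabilistic multi-modal extension described above. The function name `cca`, the small ridge term `reg`, and the simulated data are illustrative assumptions.

```python
import numpy as np

def cca(X, Y, reg=1e-8):
    """Textbook CCA via whitening and SVD (illustrative sketch).

    X: (n, p) and Y: (n, q) matrices measured on the same n samples.
    Returns canonical correlations and weight matrices A (p, k), B (q, k).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariances; `reg` is a small ridge term assumed here for stability
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    # Singular values of M are the canonical correlations
    U, corrs, Vt = np.linalg.svd(M, full_matrices=False)
    # Map singular vectors back to weights on the original features
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return corrs, A, B

# Simulated example: two "omic" views driven by one shared latent factor
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = z @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(500, 4))
Y = z @ rng.normal(size=(1, 3)) + 0.1 * rng.normal(size=(500, 3))
corrs, A, B = cca(X, Y)
print("canonical correlations:", np.round(corrs, 3))
```

In this simulation only the first canonical correlation should be large, reflecting the single shared factor; the remaining components capture noise.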

Causal network inference and Mendelian randomization

Recently, the role of network structure in complex trait genetics has received renewed attention. Network structure estimation (also called causal discovery) is a challenging problem; however, progress can be made by leveraging perturbation-response data, where one or more nodes in the network are perturbed and the response of the remaining variables is observed. We consider two sources of large-scale perturbation-response data: CRISPR-based inhibition data and genetic association data. Genetic variants provide a natural source of perturbations, which can be used to estimate putatively causal effects using a technique called Mendelian randomization. We have developed Welch-weighted Egger regression (WWER), a technique for fast estimation of pairwise bi-directed causal effects in biobank-scale data, as well as inverse sparse regression (inspre), a strategy for turning a dense network of pairwise causal effects into a sparse directed graph. We have applied these techniques to the genome-wide Perturb-seq and UK Biobank datasets.
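
To make the Mendelian randomization idea concrete, the sketch below implements standard MR-Egger regression on per-variant summary statistics. It is the generic textbook estimator, not WWER, and it does not use Welch weighting. The function name `mr_egger` and the toy effect sizes are hypothetical and chosen only for illustration.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Standard MR-Egger (illustrative sketch, not WWER).

    Inverse-variance weighted regression, with an intercept, of
    variant-outcome effects on variant-exposure effects.
    Returns (slope, intercept): the slope is the causal effect estimate,
    the intercept captures average directional pleiotropy.
    """
    # Orient every variant so that its exposure effect is positive
    bx, by = [], []
    for x, y in zip(beta_exposure, beta_outcome):
        sign = 1.0 if x >= 0 else -1.0
        bx.append(sign * x)
        by.append(sign * y)
    # Inverse-variance weights from the outcome standard errors
    w = [1.0 / s ** 2 for s in se_outcome]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, bx)) / sw
    my = sum(wi * yi for wi, yi in zip(w, by)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, bx))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, bx, by))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Toy summary statistics (made up): true causal effect 0.5, pleiotropy 0.02
beta_x = [0.10, 0.20, -0.15, 0.30, 0.25]
beta_y = [0.07, 0.12, -0.095, 0.17, 0.145]
se_y = [0.01, 0.02, 0.015, 0.01, 0.02]
slope, intercept = mr_egger(beta_x, beta_y, se_y)
print(slope, intercept)
```

Running the regression in both directions for a pair of traits (swapping exposure and outcome) gives a pair of directed effect estimates, which is the kind of pairwise input a network-level method would then sparsify.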

Cross-population complex trait analysis

Human phenotypes vary in their global distributions due to a combination of genetic and environmental factors; however, the vast majority of genetic studies of disease focus on individuals of European ancestry. We asked a simple question: does the same genetic variant have the same phenotypic effect in two ancestral populations? This led to the development of the cross-population genetic correlation, and to one of the first studies to reveal practical concerns with applying European-derived genetic information to other populations in the context of complex traits. The resulting tool, popcorn, is now widely used.