Review Article: The Hybrid Mouse Diversity Panel

This year, we published a review of studies on the Hybrid Mouse Diversity Panel (HMDP) dataset, a project led by Aldons J. Lusis (David Geffen School of Medicine at UCLA). Our paper in the Journal of Lipid Research describes the dataset, summarizes the discoveries it has facilitated so far, and explains how researchers can use correlation, genetic mapping, and statistical modeling methods with HMDP data to address cardiometabolic questions.

The Hybrid Mouse Diversity Panel (HMDP) is a collection of approximately 100 well-characterized inbred strains of mice that can be used to analyze the genetic and environmental factors underlying complex traits. While not nearly as powerful for mapping genetic loci contributing to the traits as human genome-wide association studies, it has some important advantages. First, environmental factors can be controlled. Second, relevant tissues are accessible for global molecular phenotyping. Finally, because inbred strains are renewable, results from separate studies can be integrated.

Since its development in 2010, studies using the HMDP have validated over a dozen novel genes underlying complex traits. High-throughput technologies have been used to examine the genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes of mice subjected to various environmental conditions. These analyses have identified many novel genes and significant loci associated with disease risk relevant to obesity, diabetes, atherosclerosis, osteoporosis, heart failure, immune regulation, and fatty liver disease.

The HMDP has substantial potential to advance interdisciplinary research on genetics and computational biology. To make the HMDP and its associated methods accessible to cardiometabolic researchers, our paper includes a glossary of genetics terms and an outline of how the database can be interrogated to address certain questions using correlation, genetic mapping, and statistical modeling.

All of the published data are available and can be readily used to formulate hypotheses about genes, pathways, and interactions. For more information about the HMDP, read our article.

The full citation to our paper is:

Lusis, Aldons J; Seldin, Marcus; Allayee, Hooman; Bennett, Brian J; Civelek, Mete; Davis, Richard C; Eskin, Eleazar; Farber, Charles; Hui, Simon T; Mehrabian, Margarete; Norheim, Frode; Pan, Calvin; Parks, Brian; Rau, Christoph; Smith, Desmond J; Vallim, Thomas; Wang, Yibin; Wang, Jessica

The Hybrid Mouse Diversity Panel: A Resource for Systems Genetics Analyses of Metabolic and Cardiovascular Traits.

In: J Lipid Res, 2016, ISSN: 1539-7262.


Hypothetical examples of how information from the HMDP can be utilized to explore relationships between genes (A) and traits (B) of interest. Read our paper for more information on methods for exploring their relationships with multiple layers of information.

Genes, Environments and Meta-Analysis

Figure 1. Application of Meta-GxE to the Apoa2 locus. The forest plot (A) shows heterogeneity in the effect sizes across different studies. The PM-plot (B) predicts that 7 studies have an effect at this locus, even though only 1 study (HMDP-chow(M)) is genome-wide significant by P-value. doi:10.1371/journal.pgen.1004022.g001

It is well known that both genetic and environmental factors contribute to traits, and specifically to disease risk. An area of great interest in the research community is the interaction between genetic and environmental factors and its contribution to disease risk and other traits. Genetic variants that are involved in gene-by-environment interactions (denoted GxE) have a different effect on the trait depending on the environment. For example, some variants can have an effect on cholesterol levels only in the presence of a high-fat diet. Discovering variants involved in GxE has been tremendously difficult: even though thousands of variants have been implicated in disease-related traits using genome-wide association studies, very few variants have been implicated in GxEs. Part of the difficulty in detecting GxEs is that the traditional approach requires analyzing studies that contain individuals from multiple environments.

We have recently published a paper with the A. Jake Lusis group in PLoS Genetics that presents a novel approach to discovering GxEs. In our approach, many different studies, each of which was performed in a different environment, are combined to identify GxEs. The key idea is that if a variant has a different genetic effect in different environments, then it is a candidate for being involved in a GxE. Combining studies is a statistical technique called meta-analysis, which has been a major focus of our lab over the past few years. We show in the paper that, mathematically, searching for GxEs using the traditional approach is very closely related to a type of meta-analysis framework called the random effects model(21565292).
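The link between differing effect sizes across studies and random effects meta-analysis can be illustrated with a small sketch. This is not the Meta-GxE implementation: it uses the classical DerSimonian-Laird estimator and Cochran's Q statistic as stand-ins, which may differ from the random effects model cited above, and the effect sizes and standard errors are made-up numbers.

```python
import math

def cochran_q(betas, ses):
    """Return the fixed-effects estimate and Cochran's Q heterogeneity statistic."""
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    fixed = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, betas))
    return fixed, q

def dersimonian_laird(betas, ses):
    """Random-effects estimate using the DerSimonian-Laird between-study variance."""
    k = len(betas)
    if k < 2:
        raise ValueError("need at least two studies")
    weights = [1.0 / se ** 2 for se in ses]
    _, q = cochran_q(betas, ses)
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at zero
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    beta_re = sum(w * b for w, b in zip(re_weights, betas)) / sum(re_weights)
    se_re = math.sqrt(1.0 / sum(re_weights))
    return beta_re, se_re, tau2

# Hypothetical per-study effect sizes: two environments with no effect, two with one.
betas = [0.0, 0.0, 1.0, 1.0]
ses = [0.5, 0.5, 0.5, 0.5]

fixed, q = cochran_q(betas, ses)
beta_re, se_re, tau2 = dersimonian_laird(betas, ses)
print(f"fixed={fixed:.3f}  Q={q:.3f}  tau2={tau2:.4f}  RE={beta_re:.3f} +/- {se_re:.3f}")
```

A Q value well above its degrees of freedom (the number of studies minus one) suggests that the effect differs between studies, which in this setting is what flags a locus as a GxE candidate.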

We applied our approach to identify GxEs affecting mouse HDL cholesterol by combining 17 mouse studies collected by A. Jake Lusis’ group, containing almost 5,000 animals. Our approach discovered 26 loci involved in HDL, many of which appear to be involved in GxE. Virtually none of these loci had previously been discovered in any of the individual studies, but many of them map to genes known to affect HDL. Our approach also includes a visualization framework called a PM-plot, which helps interpret the associated loci and identify GxEs(22396665).

From the paper:

Discovering environmentally-specific loci using meta-analysis
The Meta-GxE strategy uses a meta-analytic approach to identify gene-by-environment interactions by combining studies that collect the same phenotype under different conditions. Our method consists of four steps. First, we apply a random effects model meta-analysis (RE) to identify loci associated with a trait considering all of the studies together. The RE method explicitly models the fact that loci may have different effects in different studies due to gene-by-environment interactions. Second, we apply a heterogeneity test to identify loci with significant gene-by-environment interactions. Third, we compute the m-value of each study to identify in which studies a given variant has an effect and in which it does not. Fourth, we visualize the result through a forest plot and PM-plot to understand the underlying nature of gene-by-environment interactions.
We illustrate our methodology by examining a well-known region on mouse chromosome 1 harboring the Apoa2 gene, which is known to be strongly associated with HDL cholesterol (8332912). Figure 1 shows the results of applying our method to this locus. We first compute the effect size and its standard deviation for each of the 17 studies. These results are shown as a forest plot in Figure 1 (a). Second, we compute the P-value for each individual study, also shown in Figure 1 (a). If we were to follow traditional methodology and evaluate each study separately, we would declare an effect present in a study if the P-value falls below a predefined genome-wide significance threshold (P < 1.0×10^-6). In this case, we would only identify the locus as associated in a single study, HMDP-chow(M) (P = 6.84×10^-9). On the other hand, in our approach, we combine all studies to compute a single P-value for each locus, taking into account heterogeneity between studies. This approach leads to increased power over the simple approach of considering each study separately. The combined meta P-value for the Apoa2 locus is very significant (4.41×10^-22), which is consistent with the fact that the largest individual study only has 749 animals compared to 4,965 in our combined study.
We visualize the results through a PM-plot, in which P-values are visualized together with the m-values at each tested locus. An m-value estimates the posterior probability of an effect being present in a study given the observations from all other studies. These plots allow us to identify in which studies a given variant has an effect and in which it does not. M-values for a given variant have the following interpretation: a study with a small m-value (≤ 0.1) is predicted not to be affected by the variant, while a study with a large m-value (≥ 0.9) is predicted to be affected by the variant.
The PM-plot for the Apoa2 locus is shown in Figure 1 (b). If we only look at the separate study P-values (y-axis), we can conclude that this locus only has an effect in HMDP-chow(M). However, if we look at the m-values (x-axis), then we find 8 studies (HMDPxB-ath(M), HMDPxB-ath(F), HMDP-chow(M), HMDP-fat(M), HMDP-fat(F), BxD-db-5(M), BxH-apoe(M), BxH-apoe(F)) where we predict that the variation has an effect, while in 3 studies (BxD-db-12(F), BxD-db-5(F), BxH-wt(M)) we predict there is no effect. The predictions for the remaining 6 studies are ambiguous.
From Figure 1, we observe that differences in effect sizes among the studies are remarkably consistent when considering the environmental factors of each study as described in Table 1. For example, when comparing studies 1 to 4, the effect size of the locus decreases in both the male and female HMDPxB studies on the chow diet (chow study) relative to the fat diet (ath study). Thus we can see that when the mice carry the Leiden/CETP transgene, which causes high total cholesterol and high LDL cholesterol levels, the effect size of this locus on the HDL cholesterol level in blood is affected by the fat level of the diet. Similarly, when comparing studies 12 to 15, the knockout of the Apoe gene affects the effect sizes for both male and female BxH crosses. However, in the BxD cross (studies 8 to 11), where each animal is homozygous for a mutation causing a deficiency of the leptin receptor, the effect of the locus is very strong in the young male animals, while as animals get older and become fatter, the effect becomes weaker. In the case of female mice, however, the effect of the locus is nearly absent at both 5 and 12 weeks of age. Thus we can see that sex plays an important role in affecting HDL when leptin receptor activity is deficient.
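The way a PM-plot is read in the excerpt above can be sketched in a few lines. The thresholds (P < 1.0×10^-6, m-value ≤ 0.1 and ≥ 0.9) and the HMDP-chow(M) P-value come from the excerpt; every m-value and every other P-value below is a hypothetical placeholder, and the function name is ours. Computing real m-values requires the full method described in the paper.

```python
GENOME_WIDE_P = 1.0e-6     # per-study significance threshold quoted in the excerpt
M_LOW, M_HIGH = 0.1, 0.9   # m-value cut-offs quoted in the excerpt

def call_study(p_value, m_value):
    """Return (significant on its own?, m-value based prediction) for one study."""
    significant_alone = p_value < GENOME_WIDE_P
    if m_value >= M_HIGH:
        prediction = "effect"
    elif m_value <= M_LOW:
        prediction = "no effect"
    else:
        prediction = "ambiguous"
    return significant_alone, prediction

# (P-value, m-value) per study; all m-values are hypothetical placeholders.
studies = {
    "HMDP-chow(M)": (6.84e-9, 0.99),       # P-value from the excerpt
    "HMDP-fat(F)": (2.0e-4, 0.95),         # hypothetical numbers
    "BxH-wt(M)": (0.6, 0.05),              # hypothetical numbers
    "hypothetical-study": (0.03, 0.5),     # hypothetical study and numbers
}

calls = {name: call_study(p, m) for name, (p, m) in studies.items()}
for name, (alone, prediction) in calls.items():
    print(f"{name:20s} significant alone: {alone!s:5s} prediction: {prediction}")
```

The second study in this toy table shows the situation the excerpt describes: it would be missed by a per-study P-value threshold, yet its m-value predicts an effect.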

The full citation of our paper is:

Kang, Eun Yong; Han, Buhm; Furlotte, Nicholas; Joo, Jong Wha J; Shih, Diana; Davis, Richard C; Lusis, Aldons J; Eskin, Eleazar

Meta-Analysis Identifies Gene-by-Environment Interactions as Demonstrated in a Study of 4,965 Mice.

In: PLoS Genet, 10 (1), pp. e1004022, 2014, ISSN: 1553-7404.


Review Article: Mixed Models and Population Structure

Mixed models are now widely used in association studies to correct for population structure. A simple, intuitive description of how and why they work is provided in our Mouse GWAS review(10.1038/nrg3335), published in Nature Reviews Genetics, as Box 1 on page 812:

A challenge in mouse genome-wide association studies (GWASs) is the complex genetic relationships between strains included in the study. Some of these differences stem from the distinct ancestral origins of the mice, such as the differences between wild-derived strains and classical inbred strains, which are primarily descended from domesticated mice(10.1038/nature06067), (10.1038/ng2087), (10.1038/ng.847). Additionally, among strains, there is variability in the degree to which particular genomic regions are shared owing to the complex breeding history. Traditional association statistical tests make the assumption that the phenotypes of individuals in an association study are independent. However, owing to the complex genetic relationships, this assumption is violated for mouse GWASs. Closely related strains will have more similar phenotype values than more distant strains. This phenomenon, which is termed population structure, causes spurious associations in GWASs. Recently, statistical methods have been developed to address this problem, including efficient mixed-model association (EMMA)(18385116) and resample model averaging (RMA)(10.1534/genetics.109.100727), which are widely used in mouse GWASs, and EIGENSTRAT(10.1038/ng1847) and EMMAX(20208533), which are widely used in human studies.
The figure demonstrates this problem for mouse GWASs. Panel a shows body-weight data for 38 inbred strains from the Mouse Phenome Database as analysed in Kang et al. (2008) (18385116). A phylogeny of the strains is shown, demonstrating a clear genetic distinction between the wild-derived strains and the classical inbred strains. Note that all wild-derived strains have a lower body weight than classical inbred strains. Panel b shows a Manhattan plot with the association results for 140,000 SNPs(20439770) and body weight. Almost every locus appears to be associated with body weight, as each of the many SNPs that differentiate the wild-derived and classical inbred strains appears to be associated with body weight. A visualization of the cause of the spurious associations is shown in panel c. Many SNPs and the phenotype are both correlated with the genetic relatedness, or population structure, among the strains. Statistical techniques can take into account the genetic relationships between the strains to correct for population structure, thus minimizing spurious associations. In this example, EMMA was applied to the data (panel d). The highest peak, although not genome-wide significant, occurs on chromosome 8 and is near the logarithm of the odds (LOD) peak of a previously known body weight quantitative trait locus, Bwq3(11515095). Panels b and d are reproduced, with permission, from Kang et al. (2008) (18385116) © (2008) Genetics Society of America.
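The effect described in the box can be reproduced on simulated data. The sketch below is a toy illustration and not EMMA itself: the two-clade kinship matrix, the simulated phenotype, and the variance components are all assumptions chosen by hand, whereas methods such as EMMA estimate the variance components from the data. It compares a test that treats strains as independent with a generalized least squares test that models relatedness, for a SNP that has no true effect but tracks the clade split.

```python
import numpy as np

def snp_z_independent(y, g):
    """SNP test statistic from ordinary least squares (strains assumed independent)."""
    X = np.column_stack([np.ones_like(g), g])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def snp_z_mixed(y, g, V):
    """SNP test statistic from generalized least squares with phenotype covariance V."""
    X = np.column_stack([np.ones_like(g), g])
    V_inv = np.linalg.inv(V)
    cov = np.linalg.inv(X.T @ V_inv @ X)
    beta = cov @ X.T @ V_inv @ y
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
n_per_clade = 20
clade = np.repeat([0.0, 1.0], n_per_clade)   # e.g. wild-derived vs classical strains

# Phenotype differs between clades for reasons unrelated to the tested SNP.
y = 4.0 * clade + rng.normal(0.0, 1.0, size=clade.size)

# The SNP has no causal effect but mostly tracks the clade split.
g = clade.copy()
g[[0, 1]] = 1.0      # two clade-0 strains carry the clade-1 allele
g[[20, 21]] = 0.0    # two clade-1 strains carry the clade-0 allele

# Hypothetical kinship: strains within a clade are closely related, across clades unrelated.
same_clade = (clade[:, None] == clade[None, :]).astype(float)
K = 0.9 * same_clade + 0.1 * np.eye(clade.size)
V = 4.0 * K + 1.0 * np.eye(clade.size)       # assumed genetic and residual variances

z_ols = snp_z_independent(y, g)
z_gls = snp_z_mixed(y, g, V)
print(f"independent-strain statistic: {z_ols:.2f}")
print(f"relatedness-aware statistic:  {z_gls:.2f}")
```

Because the relatedness-aware test attributes the clade-level difference to shared ancestry, its statistic for this null SNP is much smaller in magnitude than the one from the independent-strain test.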