Review Article: Population Structure in Genetic Studies: Confounding Factors and Mixed Models

Bioinformatics is a rapidly growing field comprising multiple academic disciplines. The work of quantitative geneticists is often not well understood by scholars conducting other types of research in Genetics. In response to this information gap, we are launching a series of reviews that aim to make common problems in computational biology research accessible to anyone in Genetics. We hope these reviews help researchers in Genetics better understand the scope and applicability of each other’s work, and serve as study guides for students taking college courses on the subject matter.

Today we made available on bioRxiv the first paper in this series, our review of population structure and relatedness in association studies. A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques that effectively test for association while correcting for population structure is a computational and statistical challenge. Our review uses laboratory mouse strains to motivate the problem of population structure in association studies and to show how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.
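The way population structure produces false positives can be illustrated with a small simulation. The sketch below is not taken from the review: it assumes two clades of inbred animals whose mean phenotype differs, gives every SNP clade-specific allele frequencies, and makes no SNP causal. The adjustment it applies is a deliberately simplified stand-in (removing each clade's mean, which is equivalent to a fixed clade covariate), not a mixed model, and all function names and parameter values are hypothetical.

```python
import math
import random


def t_statistic(x, y):
    """t statistic for the Pearson correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no association signal
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / max(1e-12, 1 - r * r))


def center_within(values, groups):
    """Subtract each group's mean, which removes the group effect."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n_per_clade=100, n_snps=500, clade_shift=2.0, seed=0):
    """Return (naive, adjusted) fractions of null SNPs called significant."""
    rng = random.Random(seed)
    clade = [0] * n_per_clade + [1] * n_per_clade
    # Assumption: the phenotype depends on clade only; no SNP is causal.
    pheno = [rng.gauss(clade_shift * c, 1.0) for c in clade]
    pheno_adj = center_within(pheno, clade)
    naive_hits = adjusted_hits = 0
    for _ in range(n_snps):
        # Each clade gets its own allele frequency for this SNP.
        freq = [rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)]
        geno = [1.0 if rng.random() < freq[c] else 0.0 for c in clade]
        # Naive test: ignore clade membership entirely.
        if abs(t_statistic(geno, pheno)) > 1.96:
            naive_hits += 1
        # Adjusted test: compare genotype and phenotype within clades only.
        if abs(t_statistic(center_within(geno, clade), pheno_adj)) > 1.96:
            adjusted_hits += 1
    return naive_hits / n_snps, adjusted_hits / n_snps


naive_rate, adjusted_rate = simulate()
print(f"naive: {naive_rate:.2f}, adjusted: {adjusted_rate:.2f}")
```

Because every SNP here is null, a well-calibrated test should flag roughly 5% of them at the 1.96 threshold. The naive test flags far more, since any SNP whose frequency differs between clades also tracks the clade difference in phenotype. A known two-group label is the easiest case; mixed models address the more realistic setting in which relatedness is continuous and not captured by a single covariate.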

To read the full review, download our paper:

This review was written by Lana Martin and Eleazar Eskin. We welcome feedback; please e-mail Lana if you have comments or questions: lana [dot] martin [at] ucla [dot] edu.

Body weight phenotypes of 38 inbred mouse strains from the Mouse Phenome Database generated by The Jackson Laboratory. The distribution of body weights shows that two clades of mice differ markedly in body weight.

Review Article: The Hybrid Mouse Diversity Panel

This year, we published a review of studies on the Hybrid Mouse Diversity Panel (HMDP) dataset, a project led by Aldons J. Lusis (David Geffen School of Medicine at UCLA). Our paper in the Journal of Lipid Research describes the dataset, summarizes current discoveries facilitated by the dataset, and explains how researchers can use correlation, genetic mapping, and statistical modeling methods with HMDP data to address cardiometabolic questions.

The Hybrid Mouse Diversity Panel (HMDP) is a collection of approximately 100 well-characterized inbred strains of mice that can be used to analyze the genetic and environmental factors underlying complex traits. While not nearly as powerful for mapping genetic loci contributing to the traits as human genome-wide association studies, it has some important advantages. First, environmental factors can be controlled. Second, relevant tissues are accessible for global molecular phenotyping. Finally, because inbred strains are renewable, results from separate studies can be integrated.

Since its development in 2010, studies using the HMDP have validated over a dozen novel genes underlying complex traits. High-throughput technologies have been used to examine the genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes of mice subjected to various environmental conditions. These analyses have identified many novel genes and significant loci associated with disease risk relevant to obesity, diabetes, atherosclerosis, osteoporosis, heart failure, immune regulation, and fatty liver disease.

The HMDP has substantial potential to advance interdisciplinary research on genetics and computational biology. In order to make HMDP and associated methods accessible to cardiometabolic researchers, our paper includes a glossary of genetics terms and an outline of how the database can be interrogated to address certain questions using correlation, genetic mapping, and statistical modeling.

All of the published data are available and can be readily used to formulate hypotheses about genes, pathways, and interactions. For more information about HMDP, read our article:


Hypothetical examples of how information from the HMDP can be utilized to explore relationships between genes (A) and traits (B) of interest. Read our paper for more information on methods for exploring their relationships with multiple layers of information.

Review Article: GWAS and Missing Heritability

A couple of years ago I was asked to write a review article on the progress of my field (computational genetics) targeted toward computer scientists. My article “Discovering Genes Involved in Disease and the Mystery of Missing Heritability” was just published on the cover of the Communications of the ACM. The article is an introduction to the field and describes the rapid progress over the past decade in discovering a large number of variants involved in common human diseases. It assumes no background in biology and is designed to be accessible to researchers and students outside the field. I hope that it will encourage other computational researchers to get involved in genetics. The journal also made a video highlighting this article, which is available here:

Discovering Genes Involved in Disease and the Mystery of Missing Heritability from CACM on Vimeo.
