Addressing the Digital Divide in Contemporary Biology: Lessons from Teaching UNIX

Serghei Mangul and Lana Martin, together with Alexander Hoffmann, Matteo Pellegrini, and Eleazar Eskin, recently published a paper describing a workshop model for training scientists who have no computer science background to use UNIX. Our paper is available online as a preprint and will appear in an upcoming “Scientific Life” section of Trends in Biotechnology.

Scientists who are not trained in computer science face an enormous challenge analyzing high-throughput data. Serghei developed a series of workshops in response to growing demand for life and medical science researchers to analyze their own data using the command line.

Administered by UCLA’s Institute for Quantitative and Computational Biosciences (QCBio), these workshops are designed to help life and medical science researchers use applications that lack a graphical interface. Our paper presents a training model for these workshops—a flexible approach that can be implemented at any institution to teach use of command-line tools when the learner has little to no prior knowledge of UNIX.

QCBio currently offers similar workshops to the UCLA community. In tandem with this publication, we created an online catalogue of resources and papers intended to give first-time learners basic knowledge of the command line: https://smangul1.github.io/command-line-teaching/.

We encourage fellow instructors of Bioinformatics, as well as scientists who are new learners of the command line, to read our paper and share their thoughts! Email us at: lana [dot] martin [at] ucla [dot] edu.

 

The full citation of our paper:
Mangul, Serghei, Martin, Lana S., Hoffmann, Alexander, Pellegrini, Matteo, and Eskin, Eleazar. Addressing the Digital Divide in Contemporary Biology: Lessons from Teaching UNIX. Trends in Biotechnology; doi: 10.1016/j.tibtech.2017.06.007.

Advance preprint copies of our paper may be downloaded here: http://www.cell.com/trends/biotechnology/fulltext/S0167-7799(17)30156-7

UCLA Bioinformatics: The Philosophy of the Training Environment and Programs

(This post is jointly authored with Alexander Hoffmann, Hilary Coller, Matteo Pellegrini, and Nelson Freimer.)

UCLA has a rich training environment for Bioinformatics that extends beyond the core academic programs.  For structured academic learning, UCLA offers an Undergraduate Bioinformatics Minor and a Bioinformatics Ph.D. Program.  In addition, UCLA coordinates multiple training programs, several of which are open to researchers from other institutions who are at all stages of their careers.  Many of these programs are either hosted or jointly sponsored by the Institute for Quantitative and Computational Biosciences (QCB) at UCLA, which is directed by Alexander Hoffmann (UCLA).

Over the past 10 years, driven by the ubiquity of genomics throughout the field, biology has become a data science. Every biomedical research institution has been challenged with supporting the analysis of genomic data generated by groups who traditionally have not cultivated substantial computational expertise. Many of our peer institutions delegate genomic data analyses to a specific Bioinformatics core group that operates on a “fee-for-service” model.

The Bioinformatics core “fee-for-service” model poses many problems.  First, complex issues that arise during analysis of genomic data are difficult to predict in advance.  Projects often require much more effort than anticipated by research groups, leading core groups to struggle with insufficient funds to cover the actual time spent on analysis.  Second, research groups utilizing the core often want to move the project in different directions than what was originally proposed.  In the long term, exploring additional aspects of data can be inefficient when data analysis is delegated to a core group on an as-needed basis.

At UCLA we follow a different approach.  We believe that research groups should receive the training and resources to analyze the genomic data that they generate.  This “training and collaboration” model is the best solution for efficiently completing projects and advancing skills in a research group.  Over the past ten years, UCLA has significantly invested in this training and collaboration model.  For example, UCLA’s Bioinformatics programs are explicitly organized to connect research groups with core groups across campus and provide infrastructure and training to students, faculty, and staff working in many different fields.

Bioinformatics training programs held at UCLA include:

    1. The Collaboratory. The Collaboratory of postdoctoral fellows, directed by Matteo Pellegrini (UCLA), provides an experimental and empirical research environment for bioscientists and computational scientists to collaboratively design and conduct experiments. Most bioscience laboratories have limited capabilities in large-scale data analysis. The Collaboratory’s main mission is to advance genomic data analysis by connecting UCLA bioscience faculty with QCB faculty and fellows.  The Collaboratory fellows are a select group of postdocs funded by the Collaboratory to engage in collaborative projects that leverage their specific expertise.

      The Collaboratory fellows are also responsible for organizing intensive tutorials designed to train UCLA students and postdocs in the latest next-generation sequence analysis techniques. In addition to providing computational expertise to bioscience researchers at UCLA, the Collaboratory also sets up and maintains a next-generation sequence data analysis server, and participants develop methodologies to process new types of data. The Collaboratory has a year-round schedule of workshops open to the Bioinformatics community.

 

    1. Bruins in Genomics Undergraduate Summer Research Program (B.I.G. Summer). B.I.G. Summer is an integrated undergraduate training and research program in genomics and bioinformatics at UCLA. Participants gain an intensive, practical experience in integrating quantitative and biological knowledge while learning how to pursue graduate degrees in the biological, biomedical or health sciences.  The program begins with two weeks of hands-on tutorial workshops that cover fundamental concepts in genomics critical to participation in today’s research.  The remaining weeks are focused on research.  Students work in pairs under the supervision of UCLA faculty mentors and QCB postdoctoral fellows.

      B.I.G. Summer offers unique opportunities that are often not available to undergraduates, including next-generation sequencing analysis workshops, weekly science talks by senior researchers, a weekly journal club, professional development seminars, social activities, concluding poster sessions, and a GRE test prep course.  In addition, a special NIH-funded curriculum in neurogenomics, directed by Nelson Freimer and Eleazar Eskin, provides B.I.G. Summer participants with an intensive exposure to this rapidly growing field, in which UCLA is among the leading centers worldwide. B.I.G. Summer is organized by Alexander Hoffmann, Hilary Coller, Tracy Johnson, and Eleazar Eskin. This year, B.I.G. Summer is held from June 19th to August 11th, 2017.  The B.I.G. Summer Program is sponsored by the following generous institutions:

      UCOP for a UC-HBCU partnership Program in Genomics and Systems
      NIH NIBIB for NGS Data Analysis Skills for the Biosciences Pipeline R25EB022364
      NIH NIMH for Undergraduate Research Experience in Neuropsychiatric Genomics R25MH109172

 

    1. Undergraduate and MS Research Program. One of the best ways for faculty to provide training to undergraduate and graduate students is through mentorship in research labs. A substantial challenge to this approach is the increasing number of undergraduate students who want to get involved in research.  For example, there are many more Computer Science majors interested in research than can be absorbed by the number of faculty presently in the Department of Computer Science.  In order to meet rising undergraduate demand for research opportunities, we created an Undergraduate and Master’s student research program.

      This program connects researchers across campus with interested students from a variety of majors.  In doing so, we leverage UCLA’s strength in Bioinformatics to offer a greater number of research opportunities to undergraduates within and outside of the Department of Computer Science.  Each research opportunity posted on the webpage has a list of requirements, ranging from “one course in Bioinformatics or programming” to “a full year of coursework in programming.”  For students who have completed relevant coursework or are planning their academic schedule, this program provides a clearly defined path to become involved in research projects on campus.

 

    1. Informatics Center for Neurogenetics and Neurogenomics (ICNN). As with other areas of biomedical science, the post-genome era raises the prospect of transformational advances in neuroscience research. However, neuroscience faces special challenges in analysis, interpretation, and management of the vast quantities of information generated by genetic and genomic technologies. The phenotypic and organizational complexity of the nervous system calls for distinct analytical and informatics strategies and expertise.

      The ICNN, directed by Nelson Freimer and Giovanni Coppola, provides advanced analysis and informatics support to a highly interactive group of neuroscientists at UCLA who conduct basic, clinical, and translational research.  Generally, today’s lack of corresponding resources in analysis and informatics constitutes a bottleneck in their research; ICNN provides these investigators with access to excellent facilities for genetics and genomics experimentation.  ICNN faculty are experts in statistical genetics, gene expression analysis, and bioinformatics, and they oversee the activities of highly trained staff members in accomplishing three goals: (1) providing expert consultation and analyses for neurogenetics and neurogenomics projects; (2) developing and maintaining a shared computing resource that is incorporated within the large campus-wide computational cluster for computation-intensive analyses, web servers, and state-of-the-art software tools for a wide range of applications (including user-friendly versions of public databases, as well as workstations on which ICNN users will be trained to employ these tools); and (3) providing hands-on training in analysis and informatics to group users.

 

  1. Computational Genomics Summer Institute (CGSI). In 2015, Profs. Eleazar Eskin (UCLA), Eran Halperin (UCLA), John Novembre (The University of Chicago), and Ben Raphael (Princeton University) created CGSI. A collaboration with the Institute for Pure and Applied Mathematics (IPAM), led by Russ Caflisch, CGSI is developing a flexible program for improving education and enhancing collaboration in Bioinformatics research. The goal of this summer research program is to bring together mathematical and computational scientists, sequencing technology developers in both industry and academia, and the biologists who use the instruments for particular research applications.

    CGSI is a unique opportunity for junior and senior scholars in Bioinformatics to foster collaborative relationships, accelerate problem-solving, and unleash the full potential of their projects.  The program facilitates interdisciplinary collaboration and training with a mix of formal and informal events. For example, senior scholars present traditional research talks and tutorials, while junior scholars present mini-presentations and organize journal clubs.  CGSI fosters interactions over an extended period of time and is laying crucial groundwork to advance the mathematical foundations of this exciting field.  This year, CGSI will be held from July 6th-26th, 2017. CGSI is made possible by National Institutes of Health grant GM112625.

 

“Give a Man a Fish, and You Feed Him for a Day. Teach a Man to Fish, and You Feed Him for a Lifetime.”

UCLA Bioinformatics: The Philosophy of the Ph.D. Program


(This post is a collaboration between the instructors of the core courses for the UCLA Bioinformatics programs: Eleazar Eskin, Chris Lee, Wei Wang, Bogdan Pasaniuc, Jason Ernst, Sriram Sankararaman, and Jessica Li, along with the current director of the program, Yi Xing.)

Bioinformatics is an interdisciplinary field that combines different aspects of quantitative sciences, such as Computer Science, Statistics, and Mathematics, with biological sciences, such as Molecular Biology and Genetics.  Training programs in quantitative sciences and biomedical sciences have very different cultures and structures, particularly at the doctoral level.  At UCLA, we aim to combine the best of both worlds with the Interdepartmental Bioinformatics Ph.D. program.

We established our Ph.D. program in 2008, and we enroll 6 to 10 Ph.D. students each year. Over 45 faculty specializing in computational and experimental biology are associated with the Bioinformatics Ph.D. program, with active research and education programs spanning biology, mathematics, engineering, and medicine. The program encompasses the breadth of the growing Bioinformatics field by offering courses from over 12 departments.  The Bioinformatics Ph.D. is not housed in any one department but is an Interdepartmental Program (IDP) whose faculty are members of 17 UCLA departments.  The IDP is an administrative unit designed for multidisciplinary academic programs.   This unit also administers the Biomedical Informatics Ph.D. program and will administer the planned Ph.D. program in Systems Biology.

For many aspects of the UCLA Bioinformatics Ph.D. program, we draw upon different ideas from the cultures of Quantitative and Biomedical training programs.

In traditional Biomedical science Ph.D. programs, the majority of a student’s training in applied sciences takes place through mentorship in the laboratory.  Students do take some courses during their first year, but these courses mainly cover recent research in the field and are often team-taught by multiple faculty.  These courses typically require only minimal work outside of class.  During the first year of the Ph.D. program, these students focus on identifying a research lab to join by completing rotations in three labs.  Starting with their second year, students become members of their chosen lab and perform research full time.

On the other hand, in traditional quantitative science Ph.D. programs, the majority of a student’s training takes place didactically through challenging coursework.  In these programs, coursework consumes at least 50% of the student’s time during their first two years.  These intensive courses are usually taught by a single instructor (or sometimes a team of two) and require substantial homework assignments, course projects, and exams.  However, the courses lay a foundation for the technical skills that will become the basis of a student’s future research.  Students admitted to these types of programs are encouraged to join the research lab of a specific professor and start research right away.

Here we describe how we combine these two cultures with the principles and philosophy that guided the design of our Ph.D. Program.

  1. Training in Methodology Development. The UCLA Bioinformatics program is uniquely focused on preparing our students to develop novel methodologies that can contribute to important biological problems.  Students who are interested in methodology development are a great fit for our program.  Our program is able to maintain this focus because UCLA hosts many other Ph.D. programs that can accommodate students who are interested in Bioinformatics but prefer a program with a different, sometimes more traditional, focus. These include the recently established Genetics and Genomics Ph.D. program, which focuses less on methodology development and prioritizes biological discovery.

    UCLA also has a broad set of other Ph.D. programs in quantitative sciences, such as Statistics and Computer Science, which accommodate students who want to pursue Ph.D. research in Bioinformatics but are primarily interested in a quantitative sciences training program.  UCLA also offers Ph.D. programs in Biomathematics, Biomedical Informatics, and Biostatistics for students interested in other areas of Computational Biology.  In addition, a new graduate program in Systems Biology is being developed in conjunction with the Bioinformatics IDP.  The multitude of programs at UCLA enables students to join a program whose training goals match their own, which in turn allows each program to be organized around those goals.

  2. Our Core Curriculum Provides Rigorous Computational Training. Our core courses are structured in the style of a quantitative Ph.D. program, complete with rigorous training requirements that are met through homework assignments, exams, and course projects. The philosophy behind our courses is to teach fundamental concepts in computation and use Bioinformatics to explore these concepts.

    For this reason, our core courses are rigorous enough to satisfy course requirements in quantitative Ph.D. programs at UCLA, including those for the Computer Science and Statistics graduate programs.  Bioinformatics core courses are taught and administered by faculty who have appointments in these quantitative departments.  Six of the courses are administered by the Computer Science Department, and one by the Statistics Department. Our rigorous core curriculum appeals to students in these programs as well as students in the Bioinformatics Ph.D. program. In fact, the majority of students enrolled in our core courses are from quantitative graduate programs. This diversity of academic disciplines brings to these courses a high level of engagement and creativity.

  3. Substantial Didactic Training in Bioinformatics. Similar to a traditional quantitative sciences training program, our program offers a full load of Bioinformatics courses. Our program includes five core courses that we strongly recommend students take during their first year.  These courses are: Introduction to Bioinformatics (Chris Lee), Algorithms in Bioinformatics (Eleazar Eskin), Methods in Computational Genomics (Jason Ernst and Bogdan Pasaniuc), Statistical Methods in Bioinformatics (Jessica Li), and Computational Genetics (Eleazar Eskin).

    In addition, students are encouraged to take during their second year Machine Learning in Bioinformatics (Sriram Sankararaman) as well as the multiple offerings of Current Topics in Bioinformatics (rotating faculty).  The Current Topics courses cover relevant issues such as Data Mining in Bioinformatics or Advanced Computational Genetics.  We designed the coursework for the UCLA Bioinformatics Ph.D. program so that students can take many skills-building courses comparable to those offered by a traditional quantitative science program.

  4. Rotation Program. Upon entering a Ph.D. program, students typically do not yet know whose lab they want to join. For this reason, we adopt a rotation program styled after typical Biomedical training programs.  Here, students undertake three 10-week rotations, one during each of the three academic quarters of their first year.  Students use a rotation to try out a lab, and decide on a lab to join by the end of their first year in graduate school. Secondary but important goals of the Rotation Program are to develop diverse research skills and to build a collaborative network that may benefit the doctoral research project and career development.

  5. Seminar Program. An important aspect of Biomedical training programs is the informal training provided during seminars and journal clubs. The UCLA Bioinformatics Ph.D. program leverages informal training with a seminar that students are required to attend for the first two years of the program.  In fact, the weekly Bioinformatics Seminar series has become a key focal point of the UCLA Bioinformatics community.  Students also organize an annual overnight retreat where they share and get feedback on their research.

  6. Research-Oriented Written Qualifier. Every Ph.D. program requires completion of a written qualifying exam, which typically occurs after coursework is completed. In traditional biomedical science programs, this exam often takes the form of a grant proposal on a topic of the student’s choice.  In traditional quantitative science Ph.D. programs, this requirement is often a challenging written exam covering topics in coursework. More recently, quantitative Ph.D. programs have abandoned the written qualifier and replaced it with an exam in which students write a paper demonstrating their research skills.

    In the UCLA Bioinformatics Ph.D. program, we have adopted such an exam.  After completion of first-year courses and faculty approval of their project proposal, students are given a one-month period to work independently on the project and to submit a written research paper reporting their results. Faculty in the program review the resulting papers. Although these projects are often small in scope because of the exam’s time constraints, the resulting papers are required to exhibit: 1) high-quality writing, 2) contextualization of the project within existing research, 3) conclusions supported by the chosen experiments, and 4) a logical flow of arguments.  The idea behind the exam is not to weed out students who cannot pass it, but to set an objective bar for achievement that the students can attain.

Just as Bioinformatics is an interdisciplinary field that combines methods, data, and theories from different academic traditions, the UCLA Bioinformatics Ph.D. is earned in an interdisciplinary program that combines aspects of the training cultures of quantitative and biological sciences. Our unit is a new kind of program that has been specifically designed to administer a rigorous, cross-sectional training in methodology development.