Biological Data Analysis: The Right Way
“Biological Data Analysis: The Right Way” was a five-day boot camp sponsored by the NIH-funded Computation, Bioinformatics and Statistics (CBIOS) Predoctoral Training Program, held June 6-10, 2016. The boot camp was open to post-candidacy graduate students, postdoctoral researchers, staff and faculty members. Familiarity with the topics covered in Applied Bioinformatics (BMMB 852), Statistical Analysis of Genomics Data (STAT 555) and Foundations in Data Driven Life Sciences (MCIBS 554) was desirable. Participants were required to bring their own laptops (Mac preferred) with the applications and programs on a list that was to be provided already installed. Class size was limited to 25-30 participants. The application period for the boot camp was in April.
There are growing concerns about the ability of our scientific community to generate rigorous, transparently documented research that can be meaningfully reproduced and validated by different laboratories. The goal of the boot camp was to give researchers a fuller appreciation of the issues contributing to the ‘reproducibility crisis’, a greater familiarity with computational and statistical techniques, and exposure to innovative tools specifically devised to improve transparency and reproducibility.
The boot camp comprised three modules, each coordinated by a CBIOS training faculty member, along with a series of talks and hands-on exercises. The first module, Computational Statistics and Simulation, instructed by Dr. Altman, taught simulation-based statistical methods critical for rigorous and reproducible research. Participants learned key concepts for comparing statistical methods and for designing realistic simulation studies that mimic important features of the complex data sets they encounter in their research, as an aid for discovering systematic signals and patterns.
The second module, Python/Software Carpentry, instructed by Dr. Albert, focused on the computing and programming aspects of analyzing ‘omic’ data. Most scientists are never taught how to build, use, validate and share software well. As a result, many spend their effort doing things inefficiently. In this module, participants were introduced to the principles of efficient data analysis, computer programming and software engineering, demonstrated via the Python programming language. The goal of the module was to help participants spend less time wrestling with software and more time doing useful research.
The third module, A Computational Platform for Transparent and Reproducible Research, instructed by Dr. Nekrutenko, trained participants in the use of Galaxy, an innovative data integration and analysis platform for biomedical research. This module provided participants with core bioinformatics skills and experience with a widely deployed platform designed to maximize the transparency and reproducibility of research.
Additionally, there were talks by Drs. James Broach, Ross Hardison, Qunhua Li, George Perry and others on issues pertaining to reproducible research in diverse biological contexts.