HAO WU - Emory University

"Modeling intensity data from ABI/SOLiD second-generation sequencing"
27 October 2011 from 4:00 PM to 5:00 PM
111 Tyson Bldg.
Capable of sequencing millions of short DNA fragments in parallel, second-generation sequencing technologies have rapidly revolutionized genomic research. The applications of this technology range from genotyping and structural variation analysis in whole-genome studies to transcriptome quantification and reconstruction. Among the several available platforms, the SOLiD system from Applied Biosystems Inc. (ABI) takes a unique approach, translating each pair of adjacent nucleotides into one of four colors during sequencing. The colors reported by the SOLiD system (color-calls) are the result of a complicated statistical manipulation of noisy fluorescence intensity measurements, which introduces systematic biases that may mislead downstream analysis.
In this talk, I will first present a simple intensity pre-processing method for correcting these biases. A version of quantile normalization was developed that substantially improves the yield and accuracy of color-calls at a small computational cost. In the second part of the talk, I will present a model-based quality assessment of the color reads. A simple linear model was applied to the intensity measurements to capture the uncertainty arising in the base-calling procedure. Compared to the factory-provided quality scores, the results from our model give more informative measures of read quality.

