JINGYI (JESSICA) LI - University of California, Los Angeles

Measures of Complex Data
22 January 2015 from 4:00 PM to 5:00 PM
201 Thomas Bldg.

In the era of big data, new statistical measures are needed for describing various relationships embedded in complex data structures. In this talk, I will introduce three new statistical measures motivated by the burgeoning availability of high-throughput genomic data. These measures can serve as useful tools for uncovering hidden information and answering biological questions from large-scale genomic data.

The first measure, used in an unpublished method called “jSLIDE”, was motivated by the availability of multiple replicate next-generation RNA sequencing (RNA-seq) datasets of varying quality. It answers an important question: how can the quality of each replicate be measured? With this measure, replicate data can be pooled more effectively to strengthen signals and reduce noise.
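
The abstract does not say how the jSLIDE quality measure is defined, so the sketch below only illustrates the general idea of quality-weighted pooling of replicates. It uses a hypothetical quality proxy (the Pearson correlation of each replicate's log-expression profile with the average of the other replicates), and the function names replicate_weights and pool_replicates are also hypothetical. The real jSLIDE measure is very likely different.

```python
import numpy as np

def replicate_weights(log_expr):
    """Hypothetical quality proxy, NOT the jSLIDE measure.

    log_expr: array of shape (replicates, genes) with log-scale expression.
    Each replicate is scored by its correlation with the mean of the others;
    negative scores are clipped to 0 and the scores are normalised to sum to 1.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    n_rep = log_expr.shape[0]
    scores = np.empty(n_rep)
    for i in range(n_rep):
        others = np.delete(log_expr, i, axis=0).mean(axis=0)
        scores[i] = np.corrcoef(log_expr[i], others)[0, 1]
    w = np.clip(scores, 0.0, None)
    if w.sum() == 0:
        # Fall back to equal weights if no replicate agrees with the others.
        return np.full(n_rep, 1.0 / n_rep)
    return w / w.sum()

def pool_replicates(log_expr, weights):
    """Weighted average across replicates (one pooled value per gene)."""
    return weights @ np.asarray(log_expr, dtype=float)

# Simulated example: two clean replicates and one noisy replicate.
rng = np.random.default_rng(1)
truth = rng.normal(5.0, 2.0, 2000)
reps = np.vstack([
    truth + rng.normal(0.0, 0.3, 2000),
    truth + rng.normal(0.0, 0.3, 2000),
    truth + rng.normal(0.0, 2.0, 2000),  # low-quality replicate
])
w = replicate_weights(reps)
pooled = pool_replicates(reps, w)
print(w)
```

In this simulation the noisy third replicate receives a smaller weight than the two clean ones, so it contributes less to the pooled profile.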

The second measure, the “Transcriptome Overlap Measure” (TROM), was developed to answer a biological question: is there any similarity between the developmental stages of two model organisms, D. melanogaster (fly) and C. elegans (worm), which are evolutionarily very distant? TROM measures the similarity of developmental stages in terms of the overlap of “stage-associated genes”, i.e., the genes that capture the specific transcriptional activities of a stage. In our published work, TROM revealed previously unknown conservation in the developmental programs of fly and worm, providing new insights into their developmental biology.
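
The abstract describes TROM only as a measure based on the overlap of stage-associated gene sets. One standard way to ask whether two gene sets overlap more than expected by chance is a one-sided hypergeometric test, and the sketch below scores an overlap as -log10 of that p-value. This is an assumed stand-in: how TROM actually scores overlaps, how stage-associated genes are selected, and how fly and worm genes are matched (here assumed to be already mapped to shared ortholog identifiers) are not specified above. The function name overlap_score is hypothetical.

```python
import math

def overlap_score(genes_a, genes_b, universe):
    """Return -log10 P(X >= |A & B|) under a hypergeometric null.

    Assumes all arguments use the same gene identifiers (e.g. after mapping
    fly and worm genes to shared orthologs). Larger scores mean the two
    stage-associated gene sets share more genes than expected by chance.
    """
    universe = set(universe)
    a = set(genes_a) & universe
    b = set(genes_b) & universe
    m, n_a, n_b = len(universe), len(a), len(b)
    k = len(a & b)
    # Count draws of |b| genes that contain at least k genes from a.
    tail = sum(math.comb(n_a, i) * math.comb(m - n_a, n_b - i)
               for i in range(k, min(n_a, n_b) + 1))
    total = math.comb(m, n_b)
    # Stay with exact integers and take logs at the end to avoid underflow.
    return math.log10(total) - math.log10(tail)

# Toy example with 1000 shared gene IDs.
universe = {f"g{i}" for i in range(1000)}
stage_fly = {f"g{i}" for i in range(0, 50)}
stage_worm = {f"g{i}" for i in range(25, 75)}   # shares 25 genes with stage_fly
unrelated = {f"g{i}" for i in range(50, 100)}   # shares no genes with stage_fly
s_similar = overlap_score(stage_fly, stage_worm, universe)
s_disjoint = overlap_score(stage_fly, unrelated, universe)
print(s_similar, s_disjoint)
```

With 50 genes in each set, only about 2.5 shared genes are expected by chance, so an overlap of 25 yields a large score, while disjoint sets score 0.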

The third measure is an association measure called “new R2”, which aims to identify low-complexity non-functional relationships between pairs of variables. This measure is based on a generalized definition of conditional expectation and can be regarded as an extension of the classic coefficient of determination. We propose an estimator of this new measure that combines local regression and clustering techniques. The talk will demonstrate the effectiveness of this estimator in identifying different types of non-functional relationships that other measures may miss. The new R2 is intended as a useful tool for screening pairs of variables, e.g., gene-gene interactions, in genomic research.
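
The abstract does not give the formula for the new R2. For context, the sketch below estimates a nonparametric version of the classic coefficient of determination, R2 = 1 - E[(Y - E[Y|X])^2] / Var(Y), using a leave-one-out k-nearest-neighbour smoother. This smoother is a stand-in chosen here for illustration, not the local-regression-plus-clustering estimator described above, and the function name knn_r2 and parameter k are hypothetical. The example shows the kind of gap the new measure targets: a noisy parabola scores close to 1, but points on a circle, where Y is not a function of X, score close to 0 because E[Y|X] = 0 by symmetry.

```python
import numpy as np

def knn_r2(x, y, k=30):
    """Classic (functional) R2 with E[Y|X] estimated by k-nearest neighbours in x.

    This is NOT the "new R2"; it only illustrates the baseline it extends.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise distances in x; exclude each point from its own neighbourhood.
    dist = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argpartition(dist, k, axis=1)[:, :k]  # indices of the k nearest
    fitted = y[nbrs].mean(axis=1)                    # local estimate of E[Y | X = x_i]
    resid_var = np.mean((y - fitted) ** 2)
    return 1.0 - resid_var / np.var(y)

rng = np.random.default_rng(0)
n = 1000

# Functional relationship: y is a noisy function of x.
x_par = rng.uniform(-1.0, 1.0, n)
y_par = x_par ** 2 + rng.normal(0.0, 0.05, n)

# Non-functional relationship: points on the unit circle.
t = rng.uniform(0.0, 2.0 * np.pi, n)
x_circ, y_circ = np.cos(t), np.sin(t)

r2_parabola = knn_r2(x_par, y_par)
r2_circle = knn_r2(x_circ, y_circ)
print(r2_parabola, r2_circle)
```

Because each x on the circle corresponds to two y values of opposite sign, the conditional mean carries no information, so this classic measure misses a relationship that is in fact deterministic up to a sign.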