Haiyan Wang, Kansas State University

"Nonparametric Variable Selection in High Dimensional Data for Classification"
09 August 2011 from 3:00 PM to 5:00 PM
216 Thomas Building

The selection of relevant genes for classifying disease phenotypes from gene expression data has been extensively studied. Previously, most relevant-gene selection was conducted on individual genes with limited sample sizes. Modern technology makes it possible to obtain microarray data with higher resolution of the chromosomes. Considering gene sets on an entire block of a chromosome, rather than individual genes, could help to reveal important connections between relevant genes and the disease phenotypes. In this talk I will present two methods for feature selection that take into account possible interactions among probe sets in the classification of several cancers. The first method applies to densely observed genomic data, where blocks of probe sets are considered as the unit of operation. A multiple-comparison procedure particularly suited to this data structure was proposed to identify truly relevant probe sets while preserving the neighborhood location information of the probe sets. The second method applies to high-dimensional data where neighborhood information is not available. A support vector machine-based multi-step screening procedure, along with hypothesis testing, will be presented to find the most parsimonious set of features that yields high classification accuracy. Applied to several classic data sets, as well as to two new large data sets each containing more than 50,000 features, the methods achieved excellent leave-one-out classification accuracy.
