LONGHAI LI - University of Saskatchewan

High-dimensional Classification Using Hierarchical Bayesian Polychotomous Logistic Regression Models

Class prediction and feature selection with high-dimensional data (e.g., with thousands of features) arise in many areas of modern science. A typical example is that biologists want to use gene expression data (measured by microarray or sequencing technologies) to classify tissue types, as well as to find the genes most relevant to the classification. In analyzing high-dimensional data, two sparsity assumptions are important for achieving good performance: both the relevant features (signals) and the correlations among features are very sparse. (Polychotomous) logistic regression can account for correlations among features without modeling a high-dimensional covariance matrix, and it is also robust to non-Gaussian outliers. However, the coefficients need to be shrunk towards 0. One approach is L1-penalized logistic regression (LASSO), which can be interpreted as assigning Laplace priors to the coefficients. We have found, however, that the tails of the Laplace distribution may not be heavy enough to model the very sparse coefficients in high-dimensional problems. In this paper, we develop a hierarchical Bayesian polychotomous logistic regression method (abbreviated HBPLR) that uses t distributions with small degrees of freedom as priors for the coefficients. We train the posterior of HBPLR with efficient Hamiltonian Monte Carlo (HMC), a variant of Metropolis sampling methods. Our tests with simulated data sets and the Prostate gene expression data show that this method better handles the sparsity of the coefficients, and therefore achieves better classification than LASSO logistic regression and DLDA.
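The tail-heaviness argument in the abstract can be illustrated numerically. The sketch below compares log densities of a Student-t prior with one degree of freedom against a unit-scale Laplace prior at increasingly large coefficient values; the scales and evaluation points are illustrative assumptions, not settings from the talk.

```python
import numpy as np
from scipy.stats import t as student_t, laplace

# Unit-scale priors centered at 0 (illustrative scales, not the talk's settings)
nu = 1  # small degrees of freedom -> very heavy tails

for b in (2.0, 5.0, 10.0, 20.0):
    lt = student_t.logpdf(b, df=nu)  # log density under the t prior
    ll = laplace.logpdf(b)           # log density under the Laplace prior
    print(f"beta={b:5.1f}  t: {lt:8.3f}  Laplace: {ll:8.3f}")

# The Laplace log density falls linearly in |beta| (it equals -|beta| - log 2),
# while the t log density falls only logarithmically, so a large true
# coefficient is penalized far less under a small-df t prior.
```

This is the qualitative point behind preferring small-degrees-of-freedom t priors: they leave large signal coefficients nearly unshrunk while still concentrating mass near zero.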