Prabhani Kuruppumullage Don - Currently working on her dissertation.
Summary of your research experiences at PSU:
(Advisors: Dr. Bruce Lindsay and Dr. Francesca Chiaromonte)
For my dissertation, I am developing a bi-clustering approach that uses mixture models and the EM algorithm. Evaluating the mixture-based likelihood in the bi-clustering setting is computationally challenging, so we propose a composite likelihood approximation to overcome this difficulty.
Bi-clustering is of great utility in the analysis of large data sets from various kinds of genomics applications, text mining studies, etc. It identifies groups of observations and features simultaneously, permuting the rows (observations) and columns (features) of a data matrix according to the discovered structure (see the heatmaps in Figure 1).
(a) Original data; (b) Rearranged data
Figure 1: Heatmaps of a simulated dataset. Figure 1a shows the original data, and Figure 1b shows the heatmap of the data rearranged according to the results of the proposed bi-clustering method.
We also address the statistical issues, such as parameter estimation, labeling, and model selection, that arise with the approximate likelihood.
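As a small illustration of the row/column permutation described above (not the proposed bi-clustering method itself), the sketch below assumes that row and column cluster labels have already been estimated, for example by a fitted bi-clustering model, and simply reorders the data matrix so that rows and columns in the same cluster become contiguous. This reordering is what turns a scattered heatmap like Figure 1a into a block pattern like Figure 1b. The function name `rearrange` and the toy matrix are hypothetical.

```python
def rearrange(matrix, row_labels, col_labels):
    """Reorder the rows and columns of `matrix` so that rows (and columns)
    sharing a cluster label become contiguous, as in a bi-clustered heatmap.

    The labels are assumed to be given, e.g. estimated by a fitted model.
    Returns the rearranged matrix and the row and column orderings used.
    """
    # Stable sort of indices by cluster label: within a cluster, the
    # original order of rows/columns is preserved.
    row_order = sorted(range(len(row_labels)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(col_labels)), key=lambda j: col_labels[j])
    rearranged = [[matrix[i][j] for j in col_order] for i in row_order]
    return rearranged, row_order, col_order


if __name__ == "__main__":
    # Hypothetical 4x4 data with two row clusters and two column clusters
    # whose members are interleaved in the original order.
    data = [
        [1, 9, 1, 9],
        [5, 0, 5, 0],
        [1, 9, 1, 9],
        [5, 0, 5, 0],
    ]
    block, rows, cols = rearrange(data, [0, 1, 0, 1], [0, 1, 0, 1])
    for line in block:
        print(line)  # rows print as [1, 1, 9, 9] twice, then [5, 5, 0, 0] twice
```

In a real analysis the hard part is estimating the labels; the permutation itself is only a display step.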
Other research work
In addition to my PhD dissertation, I am involved in several other research projects.
(1). Segmenting the human genome based on mutation rates
(Collaborative work with Dr. Francesca Chiaromonte, Dept. of Statistics, and Guru Ananda and Dr. Kateryna Makova, Dept. of Biology)
In this study, we fit Multivariate Gaussian Hidden Markov Models (MG-HMMs) to the rates of four mutation types (small insertions, small deletions, substitutions, and microsatellite repeat number alterations) inferred at a broad (1-Mb) scale from human-orangutan divergence across neutrally evolving regions of the human genome (e.g., ancestral repeat (AR) sequences).
We identify six states with various combinations of elevated or depressed mutation rates. Four states capture profiles ranging from “hot” to “cold” for nucleotide substitutions, insertions, and deletions along the autosomes, one state captures very “cold” indels and substitutions on chromosome X, and an additional state captures elevated microsatellite mutability (see Figure 2).
Figure 2: Heatmap representing mutation profiles of the six states of neutral variation.
The states differ from each other in their prevalence, genomic locations (see Figure 3), segment lengths, and associations with genomic landscape features.
Figure 3: Genomic locations of segments belonging to the six states of neutral variation.
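To illustrate how an HMM assigns consecutive genomic windows to states, here is a minimal log-space Viterbi decoder for a Gaussian HMM. It is a sketch under several assumptions not taken from the study: emission covariances are diagonal, model parameters are treated as known (in practice they would be estimated, for example with the EM/Baum-Welch algorithm), and the two-state toy data (a “cold” and a “hot” state over two rate features) are invented for illustration. The function names `log_gauss_diag` and `viterbi` are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def viterbi(obs, start, trans, means, variances):
    """Most likely hidden-state path for a Gaussian HMM (log-space Viterbi).

    obs: list of feature vectors, one per window along the sequence.
    start[k], trans[j][k]: initial and transition probabilities.
    means[k], variances[k]: per-state emission means and diagonal variances.
    """
    n_states = len(start)
    # delta[k]: best log-probability of any state path ending in state k.
    delta = [math.log(start[k]) + log_gauss_diag(obs[0], means[k], variances[k])
             for k in range(n_states)]
    backptr = []
    for x in obs[1:]:
        new_delta, ptr = [], []
        for k in range(n_states):
            # Best predecessor state j for entering state k at this window.
            best_prev = max(range(n_states),
                            key=lambda j: delta[j] + math.log(trans[j][k]))
            ptr.append(best_prev)
            new_delta.append(delta[best_prev] + math.log(trans[best_prev][k])
                             + log_gauss_diag(x, means[k], variances[k]))
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical windows with two rate features each: cold, cold, hot, hot, cold.
    windows = [[0.4, 0.6], [0.5, 0.4], [2.1, 1.9], [2.0, 2.2], [0.6, 0.5]]
    states = viterbi(
        windows,
        start=[0.5, 0.5],
        trans=[[0.9, 0.1], [0.1, 0.9]],        # "sticky" transitions favor long segments
        means=[[0.5, 0.5], [2.0, 2.0]],        # state 0 = "cold", state 1 = "hot"
        variances=[[0.1, 0.1], [0.1, 0.1]],
    )
    print(states)  # → [0, 0, 1, 1, 0]
```

With these sticky transitions, each switch between states costs about log(0.9/0.1) ≈ 2.2 nats, so small fluctuations are smoothed into longer segments, while clear changes in the rates still start a new segment.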
We are currently working on a new project that aims to characterize the variation of major substitution patterns along the human genome.
(2). Evolution characteristics of ensemble forecasts through clustering
(Collaborative work with Dr. Francesca Chiaromonte, Dept. of Statistics, and Dr. Jenni Evans, Dept. of Meteorology)
In this study, we apply model-based point and path clustering methodologies to the European Centre for Medium-Range Weather Forecasts (ECMWF) 51-member ensemble forecasts of Typhoon Sinlaku (2008) with two aims: (i) to investigate the structural evolution of the storm, and (ii) to explore any structure in the variation of the ensemble forecasts that might aid in prediction.
For aim (i), we cluster points representing Typhoon Sinlaku's progression in the 3-dimensional cyclone phase space (CPS), spanned by lower-tropospheric thermal winds, upper-tropospheric thermal winds, and lower-tropospheric thermal asymmetry, which is used to define the evolving structure of a storm (see Figure 4).
Figure 4: Typhoon Sinlaku clusters and intensity-based storm structures
For aim (ii), we cluster entire ensemble forecast paths for Typhoon Sinlaku, both in physical space, i.e., the 2-dimensional space spanned by latitude and longitude, and in the 2-dimensional subspace of the CPS spanned by lower-tropospheric thermal winds and lower-tropospheric thermal asymmetry (see Figure 5).
(a) Five clusters of ensemble paths; (b) Five estimated mean paths
Figure 5: Clusters for Typhoon Sinlaku's 50 ensemble paths initiated on September 19, 2008 at 00 UTC in the physical space
(3). Case-based reasoning in comparative effectiveness research
(Collaborative work with Dr. Marianthi Markatou, IBM T.J. Watson Research Center)
We combine the concept of a “medical case” with indirect evidence, derived by “borrowing strength” from similar patients, to enable comparative effectiveness research (CER) treatment comparisons for personalized medical care. Given a specific patient, we construct a statistical algorithm for identifying the number of subjects similar to that individual to use in CER-type comparisons, with the aim of informing medical decision-making.
Why you chose Penn State:
When I first applied to grad schools, I didn’t have much of an idea about how to select a school or what factors I should consider. I talked to a few of my lecturers back in Sri Lanka, and out of the schools they recommended, I chose to apply to Penn State and one other school.
When I had to choose between the two offers, I compared the two programs and chose the one I thought was more diverse. Given that Penn State offered a wider variety of courses and was the more reputed program, choosing Penn State over the other school was not hard.
But after being here for some time, I realized it was one of the best things that has happened to me. If I had to go through the process of choosing a school again, I would select Penn State with much more confidence, for many reasons I didn’t even think about while I was looking at grad schools.
Apart from the obvious reasons, like having a well-balanced PhD program and diverse research opportunities, there are other reasons why I would choose Penn State all over again. The Statistics Department at Penn State is one of the friendliest places I have seen in my life. The students, faculty, and staff were really helpful as I adjusted to a new place in a new country. The weather, the surroundings of the small college town, and the school spirit are some of the other reasons at the top of my long list of why I would choose Penn State.