Marzia A Cremona, Penn State University
Functional Data Analysis (FDA) offers a broad and effective way to exploit the heterogeneous, high-dimensional and complex “Omics” data generated by next-generation sequencing technologies. The key idea is to consider “Omics” data at high resolution, treating them as “curves” of measurements along the DNA sequence.
I will demonstrate the effectiveness of FDA in this setting with two applications. In the first, I will show how the Interval Testing Procedure (ITP), a recently developed functional inferential procedure, its extension Interval-Wise Testing (IWT), and multiple functional logistic regression can be used to elucidate the effects of genomic landscape features on the integration and fixation of endogenous retroviruses. In the second, I will present Functional Motif Discovery, a novel algorithm that identifies functional motifs in a set of curves (i.e., similar curve pieces that may recur and be shared across curves). The core of the algorithm is a probabilistic K-means with local alignment, and it is employed to explore high-resolution mutation rate profiles in regions of the human genome where these rates are globally elevated.
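To give a rough feel for what "probabilistic K-means with local alignment" can mean, the following is a minimal illustrative sketch, not the algorithm presented in the talk. It assumes curves are sampled on a regular grid, measures dissimilarity by mean squared difference, aligns a motif to a curve by trying every shift, and turns distances into soft memberships with an exponential weighting. The function names, the distance, and the `temperature` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def best_local_alignment(curve, motif):
    """Return (shift, distance) for the curve window closest to the motif.

    Assumption: distance is the mean squared difference between the motif
    and a contiguous window of the curve with the same length.
    """
    curve = np.asarray(curve, dtype=float)
    motif = np.asarray(motif, dtype=float)
    m = len(motif)
    if m > len(curve):
        raise ValueError("motif is longer than curve")
    # All contiguous windows of length m, shape (len(curve) - m + 1, m).
    windows = np.lib.stride_tricks.sliding_window_view(curve, m)
    dists = np.mean((windows - motif) ** 2, axis=1)
    shift = int(np.argmin(dists))
    return shift, float(dists[shift])

def soft_memberships(curves, motifs, temperature=1.0):
    """Soft (probabilistic) assignment of each curve to each motif.

    Rows index curves, columns index motifs, and each row sums to 1.
    The exponential weighting is an illustrative assumption.
    """
    d = np.array([[best_local_alignment(c, v)[1] for v in motifs]
                  for c in curves])
    # Subtract the row minimum before exponentiating for numerical stability.
    w = np.exp(-(d - d.min(axis=1, keepdims=True)) / temperature)
    return w / w.sum(axis=1, keepdims=True)

# Toy data: a single "peak" embedded in an otherwise flat curve.
curve = [0, 0, 1, 3, 1, 0, 0]
peak = [1, 3, 1]
flat = [0, 0, 0]
shift, dist = best_local_alignment(curve, peak)
memberships = soft_memberships([curve], [peak, flat])
```

In a full clustering loop one would alternate between this alignment-and-membership step and an update of the motif shapes; that update is omitted here because the abstract does not specify it.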