Cedric Neumann, South Dakota State University
Many modern problems in statistics involve the comparison of multiple objects to determine if they are "the same". In the forensic context, scientists are interested in finding out whether traces (some recovered at crime scenes and some collected under controlled conditions) originate from the same object or individual. In other areas, such as chemistry, biology, or computer vision, "the same" may take alternative meanings; however, these fields rely on similar inference processes based on multiple objects with very high-dimensional measurements.
In order to develop and study an inference framework for multiple objects characterized by high-dimensional measurements, some dimension reduction is necessary. A common technique, popular in machine learning, consists of using kernels to measure the level of similarity between pairs of objects, expressed as inner products or "scores". Unfortunately, the use of kernels creates an inherent dependency between the scores resulting from the comparison of multiple pairs of objects. Thus, these scores cannot be directly used as a test statistic.
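To make the dependency concrete, the following sketch (not from the talk; the RBF kernel and all names here are illustrative assumptions) computes pairwise kernel scores for three hypothetical objects. Each comparison collapses two high-dimensional measurements into a single scalar, but scores that share an object are not independent of one another:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.01):
    """Gaussian (RBF) kernel: a scalar similarity score between two
    high-dimensional measurement vectors (one common kernel choice)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

rng = np.random.default_rng(0)
# Three "objects", each measured as a 100-dimensional feature vector.
objects = [rng.normal(size=100) for _ in range(3)]

# Pairwise scores: dimension reduction from 100-d measurements to
# one scalar per comparison.
scores = {(i, j): rbf_kernel(objects[i], objects[j])
          for i in range(3) for j in range(i + 1, 3)}

# Note: scores (0, 1) and (0, 2) both involve object 0, so they are
# inherently dependent -- the reason a vector of such scores cannot
# be treated as independent draws when building a test statistic.
```

The three scores form exactly the kind of score vector the talk takes as its test statistic; modeling the joint distribution of that vector, rather than treating each score separately, is what the proposed method addresses.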
During this talk, we will describe a method that uses a vector of scores as a test statistic, and we will develop a model for the distribution of that statistic. The proposed model requires a limited number of assumptions and parameters, and has a closed-form solution. We will present some results on the appropriateness and robustness of the assumptions. We will also exemplify the use of the method in the forensic context, using a dataset of Very Small Particles (VSP) of dust found on carpet fibers. Finally, we will discuss the use of the model in a Bayesian inference framework, and its extension to other problems commonly encountered in forensic science.