# Philip Ernst - University of Pennsylvania, Wharton Statistics

We consider a class of statistical models generalizing the very useful classical capture-recapture model. The classical model involves two independent surveys, and hence (usually) two “collectors”. Our generalizations allow for an arbitrary number of collectors and also generalize the classical model in some other respects.

A simplified way to understand the basic problem is as follows. Consider iTunes listeners and distinct iTunes songs. An iTunes collection contains *k* tunes, where *k* is the unknown parameter. There are *m* listeners (the “collectors”), each playing tunes on an i-shuffle player. Each time the player plays a tune, it chooses, with equal probability, one of the tunes that the listener has not yet heard. The listeners then compare notes to learn which tunes they have heard in common. The goal is to estimate *k*.
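The shuffle mechanism can be simulated directly. The sketch below assumes, which the abstract does not specify, that listener *i* plays a fixed, known number of tunes and that listeners act independently; the function names `shuffle_listen` and `simulate_collectors` are hypothetical. It reports the number of distinct tunes heard overall and the pairwise overlaps that the listeners would compare.

```python
import random
from itertools import combinations


def shuffle_listen(k, n, rng):
    """Return the first n tunes one listener hears from a library of k tunes.

    At every step the player picks uniformly at random among the tunes
    this listener has not heard yet, so no tune is ever repeated.
    """
    if not 0 <= n <= k:
        raise ValueError("need 0 <= n <= k")
    unheard = list(range(k))
    history = []
    for _ in range(n):
        j = rng.randrange(len(unheard))
        # Swap the chosen tune to the end and pop it (O(1) removal).
        unheard[j], unheard[-1] = unheard[-1], unheard[j]
        history.append(unheard.pop())
    return history


def simulate_collectors(k, listens, seed=None):
    """Simulate m = len(listens) independent listeners (hypothetical helper).

    listens[i] is the number of tunes listener i plays; treating it as
    fixed and known is an assumption of this sketch.
    """
    rng = random.Random(seed)
    heard = [set(shuffle_listen(k, n, rng)) for n in listens]
    # Number of tunes heard by at least one listener.
    distinct = len(set().union(*heard))
    # Tunes heard in common by each pair of listeners.
    overlaps = {(i, j): len(heard[i] & heard[j])
                for i, j in combinations(range(len(heard)), 2)}
    return heard, distinct, overlaps


if __name__ == "__main__":
    heard, distinct, overlaps = simulate_collectors(500, [40, 60, 80], seed=1)
    print(distinct, overlaps)
```

Because each draw is uniform over the unheard tunes, each listener's history is a uniformly random subset of the library of the given size; with two listeners, the overlap then follows the hypergeometric distribution of classical capture-recapture.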

We formulate the problem precisely, describe the minimal sufficient statistics, derive the maximum likelihood estimator (MLE), and establish its existence and uniqueness. We then study the asymptotic properties of this MLE in three separate asymptotic regimes. In one of these regimes the MLE turns out, rather surprisingly, not to be asymptotically efficient, and it can be improved upon. In terms of statistical theory, this talk further develops “modern” asymptotic theory for settings in which 1) the dimension of the sample space grows with the number of observations and 2) the data and the parameter space are both discrete. A variant of the Cramér-Rao inequality is derived for such settings and is used in our analysis of the multiple-collector problem.
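To make the estimation step concrete, here is a small sketch under a simplified version of the model. It is our own assumption and not necessarily the formulation in the talk: listener *i* hears a fixed number nᵢ of distinct tunes, listeners are independent, and D is the number of distinct tunes heard by at least one listener. Counting the ways to label the D observed tunes gives, for k ≥ D, a likelihood proportional to `k!/(k−D)! · ∏ᵢ 1/C(k, nᵢ)`, which depends on the data only through D. The hypothetical function `mle_collection_size` maximizes it by scanning k, using a Bonferroni bound to stop once the likelihood must be decreasing.

```python
import math
from itertools import combinations


def mle_collection_size(listens, distinct):
    """Maximum likelihood estimate of k under a simplified model (sketch).

    Assumed model (not necessarily the paper's): listener i hears a uniformly
    random set of n_i = listens[i] distinct tunes out of k, independently
    across listeners, and D = distinct tunes are heard by at least one of them.
    For k >= D the likelihood is, up to a factor free of k,
        L(k) = k! / (k - D)! * prod_i 1 / C(k, n_i),
    so consecutive values satisfy
        L(k) / L(k - 1) = prod_i (1 - n_i / k) / (1 - D / k).
    """
    S = sum(listens)
    if not max(listens) <= distinct <= S:
        raise ValueError("need max(listens) <= distinct <= sum(listens)")
    if distinct == S:
        # No tune was heard twice: L(k) increases forever, no finite maximizer.
        return None
    # Bonferroni: prod_i (1 - x_i) <= 1 - sum x_i + sum_{i<j} x_i x_j, so the
    # ratio above is < 1 (L strictly decreasing) once k > P / (S - D).
    P = sum(a * b for a, b in combinations(listens, 2))
    upper = P // (S - distinct)
    best_k, best_ll, ll = distinct, 0.0, 0.0  # log L(D) taken as reference 0
    for k in range(distinct + 1, upper + 1):
        # log L(k) - log L(D), accumulated one likelihood ratio at a time.
        ll += sum(math.log(1 - n / k) for n in listens) - math.log(1 - distinct / k)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k
```

For two listeners with overlap r (so D = n₁ + n₂ − r), the scan bound is n₁n₂/r and the estimate works out to ⌊n₁n₂/r⌋, the classical capture-recapture MLE (when n₁n₂/r is an integer, it ties with the value one below it).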