Alessandro Rinaldo, Carnegie Mellon University
Several novel methods have recently been proposed for performing valid inference after model selection. In this talk we will revisit sample splitting, the oldest and simplest of such methods, in which part of the data is used for model selection and the rest for inference. We show that sample splitting leads to a simple, valid, flexible nonparametric approach to inference in regression problems under minimal assumptions. We establish results on the accuracy of sample splitting in high dimensions. We propose a new class of parameters that measure variable importance and demonstrate that they can be inferred with greater accuracy than the usual regression coefficients. We show that there is an inference-prediction tradeoff: splitting increases the accuracy of the inference but can decrease the accuracy of the predictions.
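The basic sample-splitting recipe described above can be sketched in a few lines. This is an illustrative toy, not the authors' procedure: the marginal-correlation screen, the choice of k, and the simulated data are all assumptions made here for concreteness. The key point is that the inference half never influences which variables are selected, so the usual least-squares standard errors on that half remain valid for the selected submodel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative): only the first 2 of 10 covariates matter.
n, p = 200, 10
X = rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + rng.normal(size=n)

# Split the sample: first half for model selection, second half for inference.
half = n // 2
X_sel, y_sel = X[:half], y[:half]
X_inf, y_inf = X[half:], y[half:]

# Selection step (a simple marginal-correlation screen, chosen for
# illustration): keep the k covariates most correlated with the response,
# computed only on the selection half.
k = 3
corr = np.abs([np.corrcoef(X_sel[:, j], y_sel)[0, 1] for j in range(p)])
selected = np.sort(np.argsort(corr)[-k:])

# Inference step: ordinary least squares on the held-out half, using only the
# selected covariates; the standard errors are not distorted by selection
# because the selection step never saw this half of the data.
Z = np.column_stack([np.ones(n - half), X_inf[:, selected]])
beta, *_ = np.linalg.lstsq(Z, y_inf, rcond=None)
resid = y_inf - Z @ beta
sigma2 = resid @ resid / (Z.shape[0] - Z.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))

# 95% normal-approximation confidence intervals for the selected coefficients.
for j, b, s in zip(selected, beta[1:], se[1:]):
    print(f"x{j}: {b:+.2f} +/- {1.96 * s:.2f}")
```

The inference-prediction tradeoff mentioned in the abstract is visible even here: only half of the observations contribute to the fitted model, so its predictions are noisier than those of a model fit to all the data.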
Joint work with Larry Wasserman, Max G’Sell, Jing Lei, and Ryan Tibshirani.