ROGER W. HOERL - Union College

"Big Data: A Challenge for Statistical Leadership"
21 November 2013 from 4:00 PM to 5:00 PM
201 Thomas Bldg.

The Wall Street Journal, New York Times and other respected publications have recently run major features on Big Data - the massive data sets that are becoming commonplace - and on the new, "sexy" data mining methods developed to analyze them. These articles, as well as much of the professional data mining and Big Data literature, may give casual users the impression that if one has a powerful enough algorithm and a lot of data, good models and good results are guaranteed at the push of a button. Obviously, this is not the case. The leadership challenge to the statistical profession is to ensure that Big Data projects are built upon a sound foundation of good modeling, and not upon the sandy foundation of hype and unstated assumptions. Further, we need to accomplish this without giving the impression that we are "against" Big Data or newer methods. I feel that the principles of statistical engineering (see Anderson-Cook and Lu 2012) can provide a path to do just this. Three statistical engineering principles that are often overlooked or underemphasized by Big Data enthusiasts are the importance of data quality - knowing the "pedigree" of the data; the need to view statistical studies as part of the sequential process of scientific discovery - versus the "one-shot study" so common in textbooks; and the criticality of using subject-matter knowledge when developing models. I will present examples of the severe problems that can arise in Big Data studies when these principles are misunderstood or ignored. In summary, I argue that the development of Big Data analytics provides significant opportunities for the profession, but at the same time requires a more proactive role from us if we are to provide true leadership in the Big Data phenomenon.