A Simple Model for Undergraduate Research in Statistics

Our statistics majors at St. John Fisher College are required to complete a research project as the capstone element of their bachelor’s degree. As much as possible, we let the research focus be student-driven so that it reflects the student’s abilities and interests. Over the last several years, the research projects have clustered fairly naturally into two types: statistical research and data scientific research.

Statistical Research

Projects of this type involve basic statistical research and can take several forms:

  • Explore and test properties of a statistic under manipulated conditions 
  • Compare the performance of a family of statistics that differ in mathematically subtle but theoretically interesting ways 
  • Develop an adjusted statistic that addresses a theoretical or empirical deficiency and evaluate it in a Monte Carlo study against the original statistic
  • Develop and test a new statistic or statistical method to meet an analytic need in some field or in some type of data
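As an illustration of the third form above, an adjusted statistic evaluated against the original in a Monte Carlo study, here is a minimal sketch of the kind of simulation such a project might start from. It is written in Python rather than R purely for portability. The textbook example (the divide-by-n sample variance versus the divide-by-(n − 1) adjusted version), the normal population, the sample sizes, and the replication count are all illustrative assumptions, not drawn from any particular student project.

```python
import random

def var_original(xs):
    """Original plug-in variance estimator: divides by n (biased downward)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_adjusted(xs):
    """Adjusted estimator: divides by n - 1 (Bessel's correction)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def monte_carlo_bias(estimator, n, sigma=1.0, reps=20000, seed=1):
    """Estimate E[estimator] - sigma^2 by repeated sampling from N(0, sigma^2).

    The population, sigma, reps, and seed are illustrative choices.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += estimator(sample)
    return total / reps - sigma ** 2

if __name__ == "__main__":
    # Manipulated condition: sample size.
    for n in (5, 10, 30):
        b_orig = monte_carlo_bias(var_original, n)
        b_adj = monte_carlo_bias(var_adjusted, n)
        print(f"n={n:>2}  bias(original)={b_orig:+.4f}  bias(adjusted)={b_adj:+.4f}")
```

Because each call reuses the same seed, both estimators see identical samples, so the comparison isolates the effect of the adjustment. Theory predicts a bias of −σ²/n for the divide-by-n estimator (−0.2 at n = 5) and zero for the adjusted one, so the simulated values give a quick check that the simulation itself is working before students move on to less well-understood statistics.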

Statistical research draws on several content areas of the undergraduate’s major studies, including the mathematical underpinnings of statistics, theoretical aspects of statistics, and Monte Carlo methods. Research problems of this type should be grounded in past work and therefore require a substantial literature review. Assessments of statistical research projects generally target outcomes such as problem statement development, literature review quality, mastery of Monte Carlo or other appropriate simulation methods, general coding skill, and quality of the final paper and/or presentation.

From our experience, students who are a good fit for basic statistical research:

  • like the mathematical/theoretical aspects of statistics
  • have less well-developed disciplinary interests
  • are interested in deductive/confirmatory/experimental work
  • are comfortable with the “tidiness” of data created through simulation methods

[Figure: Sample R Environment]

Data Scientific Research

Projects of this type involve applying statistical methods to disciplinary questions using disciplinary data and can take several forms:

  • Data wrangling, including scraping websites for data
  • Statistical learning or modeling
  • Data mining/visualization 

Data scientific research draws on several content areas of undergraduate statistics, including database operations, data wrangling, statistics and models for prediction and classification, literacy in disciplinary variables and measures, and graphical data summaries. 

Research of this type requires both an interest in disciplinary questions and the computing skills to find useful data and render it into an analyzable form.
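To make "render it into an analyzable form" concrete, here is a minimal sketch of one small cleaning step a data scientific project might involve. Everything in it is hypothetical: the column of weights, the missing-value codes, and the rule that a comma is a decimal separator are assumptions chosen for illustration, and it uses Python’s standard library rather than the R tooling students would typically use.

```python
import re
from statistics import mean

# Hypothetical raw values, as they might arrive from a scraped table or a
# hand-entered spreadsheet column: stray whitespace, units, missing codes.
RAW_WEIGHTS = ["12.5 kg", " 13 ", "n/a", "", "14.0kg", "NA", "11,5 kg"]

# Assumed set of codes that mean "missing" in this hypothetical source.
MISSING_CODES = {"", "na", "n/a", "null", "-"}
# A number with optional decimal part, optionally followed by "kg".
NUMBER = re.compile(r"^(\d+(?:[.,]\d+)?)\s*(?:kg)?$")

def clean_weight(raw):
    """Return the value as a float in kg, or None if missing or unparseable."""
    text = raw.strip().lower()
    if text in MISSING_CODES:
        return None
    match = NUMBER.match(text)
    if match is None:
        return None
    # Assumption: a comma is a decimal separator (e.g. "11,5" -> 11.5).
    return float(match.group(1).replace(",", "."))

def clean_column(values):
    """Clean every entry; return the usable values and a count of dropped ones."""
    cleaned = [clean_weight(v) for v in values]
    kept = [v for v in cleaned if v is not None]
    return kept, len(cleaned) - len(kept)

if __name__ == "__main__":
    kept, n_dropped = clean_column(RAW_WEIGHTS)
    print(kept, "dropped:", n_dropped, "mean:", round(mean(kept), 3))
```

Reporting how many entries were dropped, rather than discarding them silently, keeps the cleaning decisions visible, which matters when the quality of the wrangling work is itself one of the assessed outcomes.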

Assessments of data scientific research projects focus on outcomes such as the appropriateness and innovativeness of the data source; the quality of the data wrangling, visualization, or modeling work; and literacy in the disciplinary concepts relevant to the project.

From our experience, students who are a good fit for data scientific research:

  • have a keen disciplinary interest
  • are comfortable with finding, retrieving, formatting, and cleaning messy data
  • have an interest in modeling
  • are good problem solvers with R

This simple model for classifying projects has helped our undergraduate program standardize expectations for undergraduate research, create equivalence between basic statistical research and data science projects, match students to project types that fit their skills and interests, guide students’ research proposals to incorporate multiple content areas, and assess the resulting projects. Other types of undergraduate research experiences in statistics certainly exist, and our system has undoubtedly been shaped by the types of students who enter our Statistics and Data Science programs and by the faculty at St. John Fisher College. For other ideas in developing undergraduate research projects in statistics, I recommend these excellent papers:


Bruce Blaine, Ph.D., PStat® is an applied statistician and Professor of Statistics and Data Sciences in the Department of Mathematics, Computer Science, and Statistics at St. John Fisher College. He currently serves as Vice-Chair of the MCS Division. Dr. Blaine’s research and professional interests include variance heterogeneity, meta-analysis, robust and nonparametric statistical methods, statistical consulting, and statistical computing with R.