Data Science in Education Group - Bay Area Kickoff Meeting

Santa Clara, CA - Nov 17, 6:00 pm to 8:00 pm


For the kickoff event of the Data Science in Education Group (DEGREE), we are honored to host Prof. Mehran Sahami of Stanford University.


Statistical models can give us insight into the dynamics of student populations in CS education. This talk presents two studies in this vein. The first analyzes the evolution of gender balance in a college computer science program and shows that focusing on the percentage of underrepresented groups in the overall population may not accurately portray the impact of program changes. We propose a new statistical model based on Fisher's noncentral hypergeometric distribution that better captures how program changes affect the dynamics of gender balance in a population, especially when the overall population is growing rapidly (as it has in CS in recent years).
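The abstract does not spell out the model, so the following is only a minimal sketch of the named distribution itself, not the speaker's formulation. It assumes a single cohort in which m1 and m2 are hypothetical counts of two groups in the pool of students who could enroll, n is the number who enroll, and x is how many of them come from group 1. Under Fisher's noncentral hypergeometric distribution, P(X = x) is proportional to C(m1, x) * C(m2, n - x) * omega^x, where omega is an odds ratio (omega = 1 gives the ordinary hypergeometric). The helper names nchg_pmf, nchg_mean, and fit_omega are our own.

```python
from math import comb, exp, log


def _probabilities(m1, m2, n, omega):
    """Fisher noncentral hypergeometric probabilities over the support of X.

    m1, m2: sizes of the two groups in the pool; n: number selected;
    omega > 0: odds ratio favoring selection of a group-1 member.
    Weights are computed in log space so large omega does not overflow.
    """
    lo, hi = max(0, n - m2), min(n, m1)
    theta = log(omega)
    log_w = {y: log(comb(m1, y)) + log(comb(m2, n - y)) + y * theta
             for y in range(lo, hi + 1)}
    top = max(log_w.values())
    w = {y: exp(v - top) for y, v in log_w.items()}
    z = sum(w.values())
    return {y: v / z for y, v in w.items()}


def nchg_pmf(x, m1, m2, n, omega):
    """P(X = x); zero outside the support."""
    return _probabilities(m1, m2, n, omega).get(x, 0.0)


def nchg_mean(m1, m2, n, omega):
    """E[X], which increases strictly with omega."""
    return sum(y * p for y, p in _probabilities(m1, m2, n, omega).items())


def fit_omega(x, m1, m2, n, tol=1e-10):
    """Maximum-likelihood omega from one observed count x.

    The distribution is an exponential family in theta = log(omega), so the
    MLE solves E[X] = x; we find that root by bisection on theta.
    """
    lo, hi = max(0, n - m2), min(n, m1)
    if not lo < x < hi:
        raise ValueError("a finite estimate needs x strictly inside the support")
    a, b = -50.0, 50.0  # bracket for theta = log(omega)
    while b - a > tol:
        mid = 0.5 * (a + b)
        if nchg_mean(m1, m2, n, exp(mid)) < x:
            a = mid
        else:
            b = mid
    return exp(0.5 * (a + b))
```

Because omega is an odds ratio conditional on the group sizes, estimating it separately for each cohort separates the selection effect from changes in the size and composition of the pool. That is one plausible reading of why such a model can be more informative than raw percentages when enrollment grows quickly.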

Second, we present a methodology that uses machine learning techniques to autonomously create a graphical model of how students in an introductory programming course progress through a programming assignment. We then show that this model predicts which students will struggle with material presented later in the class. Our eventual goal is to better understand students' learning and the conceptual difficulties they may encounter as novice programmers, so that we can provide them with better, more personalized guidance during their learning process and ultimately improve education in software engineering.

This talk includes work done jointly with Sarah Evans, Katie Redmond, Chris Piech, Daphne Koller, Steve Cooper, and Paulo Blikstein.


Mehran Sahami is a Professor and Associate Chair for Education in the Computer Science Department at Stanford University. He is also the Robert and Ruth Halperin University Fellow in Undergraduate Education at Stanford. He co-founded the ACM Conference on Learning at Scale, which focuses on interdisciplinary research at the intersection of the learning sciences and computer science. Mehran has published over 50 technical papers and has over 20 patent filings on a variety of topics, including machine learning, web search, recommendation engines in social networks, and email spam filtering, and this work has been deployed in several commercial applications.