Averages Don’t Matter...and Other Common Mistakes in Data Analysis

Educators receive formal training in many things: classroom management, curriculum design, technology, diversity and special needs, lesson planning, and more. Yet few colleges of education provide training in assessment, and rarely does an aspiring educator receive any formal guidance in data analysis. It’s no wonder that our field is uncomfortable analyzing learning data. While much could be written about the art of analysis, what follows is advice from the data trenches--five lessons learned through practical experience.

1. If the data surprise you, check the data.

There is a tendency when doing data work to seek surprises. Data that confirm what we already know are underwhelming, but unearthing insights that challenge conventional thinking is truly gratifying. We actively search for unanticipated trends, notable differences, and noteworthy aberrations in our data.

The problem is that the longer you utilize data (particularly the same kinds of data), the fewer surprises you find in the numbers. A counterintuitive finding may be important (“prior teaching experience does not necessarily translate to higher outcomes for students”) or spurious (“listening to Mozart makes you smarter”). Approach surprises in your data with an appropriate level of critical discernment. If the data surprise you, check the data. Be sure that the significant insight you find is not a result of a bad query, poor matching, faulty assumptions, or analytical mistakes.
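One way to act on this advice, sketched in Python: before trusting a surprising result, check the match that produced it. The student IDs and the check_match helper below are hypothetical, invented for illustration; your own checks will depend on how your roster and score files are keyed.

```python
from collections import Counter

def check_match(roster_ids, score_ids):
    """Report common matching problems between a roster and a score file."""
    roster_counts = Counter(roster_ids)
    score_counts = Counter(score_ids)
    return {
        # IDs that appear more than once can silently inflate counts after a join.
        "duplicate_roster_ids": sorted(i for i, n in roster_counts.items() if n > 1),
        "duplicate_score_ids": sorted(i for i, n in score_counts.items() if n > 1),
        # IDs present in only one file drop out of (or pollute) the analysis.
        "students_without_scores": sorted(set(roster_counts) - set(score_counts)),
        "scores_without_students": sorted(set(score_counts) - set(roster_counts)),
    }

# Hypothetical IDs, invented for illustration.
roster = ["S01", "S02", "S03", "S04"]
scores = ["S01", "S02", "S02", "S05"]

report = check_match(roster, scores)
for problem, ids in report.items():
    print(problem, ids)
```

If any of these lists is not empty, the surprise in your results may come from the matching rather than from the students.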

2. Don’t manage by the means.

Most professionals, including educators, are enchanted by managing through means. Anytime we see data--attendance, behavior, growth, proficiency, or diagnostic--we ask, “What’s the average?” A mean can be a helpful measure, but it alone cannot tell the entire story of the data. Means provide an estimate for the middle of a range of data; however, a mean masks the variance within those data. As Lawrence Dworsky wrote, “The average of an elephant and a mouse is a cow, but you won’t learn much about either elephants or mice by studying cows.”
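A small Python sketch makes the point concrete. The two sets of scores are made up for illustration: both classes have a mean of 70, yet they describe very different classrooms.

```python
from statistics import mean, pstdev

# Hypothetical scores for two classes, invented for illustration.
class_a = [68, 70, 71, 69, 72]
class_b = [40, 95, 55, 100, 60]

for name, scores in [("Class A", class_a), ("Class B", class_b)]:
    # The mean is identical; the standard deviation and range are not.
    print(
        name,
        "mean:", mean(scores),
        "std dev:", round(pstdev(scores), 1),
        "range:", min(scores), "to", max(scores),
    )
```

Reporting only the mean would treat these two classes as the same; the spread shows that Class B includes students who need very different instruction.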

Remember that each data point represents something about a specific student. While we use statistics to summarize data, we educate individual students--not the average of students.

Variation in learning data is often more important than the mean. Educators should identify how much variation would change how or what they teach students. You may use the results of your diagnostic measures to determine which students have needs that vary from on-grade-level instruction, or analyze the variation in state test performance by bucketing students into performance categories that tie to key actions (such as summer school, intervention, or acceleration). It is more important to identify the 40 eighth graders who are drop-out risks based on their reading proficiency than to calculate the mean proficiency for the grade and plot it on a five-year trend line.
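Bucketing students into performance categories that tie to actions could look something like the Python sketch below. The cut scores, category labels, pairing of categories with actions, and student names are all assumptions made for illustration; real cut scores come from your state or district.

```python
from collections import defaultdict

# Hypothetical bands: (lowest score in band, label, action tied to the band).
BANDS = [
    (0, "Below Basic", "intervention"),
    (60, "Basic", "summer school"),
    (75, "Proficient", "on-grade-level instruction"),
    (90, "Advanced", "acceleration"),
]

def band_for(score):
    """Return (label, action) for the highest band whose floor the score meets."""
    label, action = BANDS[0][1], BANDS[0][2]
    for floor, band_label, band_action in BANDS:
        if score >= floor:
            label, action = band_label, band_action
    return label, action

def group_by_action(scores_by_student):
    """Group student names under the action their score points to."""
    groups = defaultdict(list)
    for student, score in scores_by_student.items():
        _, action = band_for(score)
        groups[action].append(student)
    return dict(groups)

# Hypothetical students and scores, invented for illustration.
students = {"Ana": 52, "Ben": 67, "Cam": 81, "Dee": 94, "Eli": 58}

groups = group_by_action(students)
for action, names in groups.items():
    print(action, "->", names)
```

The output is a list of names per action rather than a single number, which keeps the focus on individual students.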

3. Know your models.

This equation doesn’t look like it has much to do with your 401k, does it?

And yet, it did. This function, a Gaussian copula, was created by Dr. David Li to solve the thorny problem of how to estimate default risk for diverse pools of debt (bonds, bank loans, mortgages, etc.). Li’s model was widely adopted by the financial sector because it took something very complicated (pricing the risk of diverse forms of debt) and made it seemingly simple. What was lost in the enthusiasm for Li’s approach were the assumptions about how markets work that he employed when building the model. Wall Street traders used the model with abandon, though few understood it. When its assumptions were violated, the model failed and revealed the tremendously bad investments that had been made based on its guidance. Li’s model became the formula that “killed” Wall Street, an important contributing factor to the 2008 financial crisis.

The moral of the story for educators is simple: know your models. Our field is moving into a more sophisticated quantitative era, with advanced data models influencing core activities like school and teacher evaluation, prediction of student outcomes, program evaluation, and personalized learning. We must become more sophisticated consumers of data models. If your district uses data to make important decisions (such as evaluating staff), you must possess a basic understanding of how the data model is constructed and the assumptions it makes about teaching and learning. Otherwise, you’ll lack the context to use the data effectively, just as the Wall Street traders did.

4. All analysis is comparative analysis.

Edward Tufte, master of information design, once wrote, “The deep, fundamental question in statistical analysis is Compared with what?” Meaning in quantitative analysis is created through comparisons because comparisons are a natural mechanism for making judgments and causal inferences. Finding that 65% of your students met a growth target means much more if you know the national average is 45%, just as how much time a student spends in personalized learning can be contrasted with learning gains. There are many forms of comparison available: criteria, norms, pre-test/post-test, treatment/control, and data models. By making quality comparisons, your analysis will offer context to its results and clarity in its implications.
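The growth-target example can be worked through in a few lines of Python. This is only a sketch of a comparison against a norm; the compare_to_norm helper is a hypothetical name, and the two percentages are the ones used above.

```python
def compare_to_norm(observed_pct, norm_pct):
    """Return the gap in percentage points and the ratio to the norm."""
    diff = observed_pct - norm_pct
    ratio = observed_pct / norm_pct
    return diff, ratio

# 65% of your students met the growth target; the national average is 45%.
diff, ratio = compare_to_norm(65, 45)
print(f"{diff} percentage points above the norm ({ratio:.2f} times the national rate)")
```

On its own, 65% is just a number; set against the norm, it becomes a 20-point advantage, which is a claim someone can act on or question.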

5. Data work is like an iceberg.

Data work is like an iceberg: what people end up seeing is a small part of the total analysis. This has two implications. First, data work is always selective in what it presents. Data analysis is fraught with false starts, dead ends, and inconclusive results. Because the numbers speak, but softly, it takes a patient hand to develop a complete data story. Second, time is a scarce resource for learning analysts. When analyzing learning data, be sure to budget your time for the entire iceberg: data preparation, data exploration, analysis, model comparison, and communicating results. My experience is that learning analysts often spend 90% of their time working on the analysis and only 10% communicating results. Too often, good insights are buried in poor presentation.

Generally, it is as much work to craft a poor analysis as it is a good one. Adhere to these lessons and you will save countless headaches in drawing value from your data.