Is It Time to Rethink the Traditional Grading System?

Robert Talbert is a math professor, so numbers are his thing. And the way the grading system in education works has long bothered him.

That became clear a few years ago, when a particularly bright student in a calculus class Talbert was teaching bombed the first exam. The student knew the material, but she just wasn’t a good test-taker. Her score on that exam was so low, in fact, that she realized she had no chance to get an A in the course, no matter how well she might do on future tests and assignments. The same thing happened on the second exam, and now the student had no way to do any better in the class than a C.

“Gradually these quizzes and tests and timed assessments and the way it all fits together with points and averaging, it just wore her down,” Talbert says. “Soon she was acting out in class and saying out loud, ‘I don’t see why any of this matters.’”

She ended up dropping out of the class, and Talbert never saw her again.

After that, this professor vowed never to use traditional grades on tests again. But he wasn’t quite sure what to replace them with.

As Talbert soon discovered, there’s a whole world of so-called alternative grading systems. So many, in fact, that he ended up co-writing an entire book about them with a colleague at his university, David Clark. The book, which is due out this summer, is called “Grading for Growth: A Guide to Alternative Grading Practices that Promote Authentic Learning and Student Engagement in Higher Education.”

EdSurge connected with Talbert to hear what he uses in his classes now, and why he argues that reforming how grading works is key to increasing student engagement.

Read a partial transcript below, lightly edited for clarity.

EdSurge: Our grading system seems pretty fundamental to the way schools and colleges work, but you note in the book that letter grades haven’t always been in place. What did educators do before the current system?

Robert Talbert: For the first 600 or 700 years of higher education — about the first 70 percent of higher education's lifespan — there was really no such thing as a grade. You would go to a university and you would study for four years, and you'd go to lectures and go to your discussion groups. And then at the end of the four years, you would just have a giant oral exam over everything — very much like the Ph.D. dissertation defenses we have now. In fact, those are holdovers from those days.

The whole idea of having exams per course didn't happen until the 18th century. The first known example of that was at Yale University, I believe it was in the 1780s. And the mark wasn't a letter, it was just descriptive adjectives, like ‘very good,’ ‘not quite so great, but still OK’ and ‘not OK,’ basically. And it still wasn't points. And it still wasn't averaging because you can't average adjectives together. And it just kind of evolved over time.

We did not really arrive at our current conception of points-based A, B, C, D, F style grading until almost the beginning of the 20th century. It's really only about 100 years old.

Why do you think that stuck versus other systems that might have evolved?

It has to do with standardizing the reporting of students’ academic progress. And this really began to take hold in the mid-19th century in America, because at that time you had a lot of immigration and a lot of mobility because of westward expansion. … As families became more mobile in the 19th century, you might have a kid that grows up and immigrates to New York and then moves to Missouri and then moves to California.

And so in that case, you really need a standardized way to say, ‘This student has done excellent work or just good work or average work.’ And that's kind of where the standardized idea of a grade came from.

And another factor that pushed this was the early 20th-century industrial revolution obsession with scientific measurement of everything. This is when we first began to see IQ scores, for example, begin to emerge. It has the appearance of a scientific measurement. And that was good enough for that time.

There's lots of discussion of equity and education these days. And so I was interested to note that you argue in the book that “traditional grading violates any reasonable standard of equity.” What makes you say that?

We're referencing a book there by Joe Feldman called “Grading for Equity.” In that book, Feldman lays out a number of criteria for what might constitute equity. I would boil it down to saying grading rewards assessment-taking and rewards test-taking. Grades are not a measure of intelligence. They're a measure of your ability to take an assessment about something. And so who benefits from this? I mean, who is best situated to take high-pressure tests? Well, it's kids often from highly-resourced educational systems. It's kids who can afford the assistance they need to take these tests. It's typically a particular group of students who are better situated than anybody else to take these assessments and get these grades.

We firmly believe that every student can and should grow. But the way that we set up our grading isn't about growth. It's about a snapshot in time of your ability to take a single one-and-done assessment on several different occasions and then average them all together.

So what do you do now if you're not giving letter grades on assignments?

Before I spell out any details, I would just say that all you have to do is just look outside school and you'll see it everywhere. When my son, who's 14 now, was 6 or 7 years old, he was taking the swim class from my university, and he got a report card. And it had no points on it. It had no grades on it. It just had levels on it — the instructor would circle the level that he had completed and use some highlighters to show what skills he's good at doing, what skills he needs to continue to work on. And I saw that and I thought, that's just brilliant.

I mean, everywhere in life other than school, if there is an assessment to be done — whether you're in a job and you're getting an annual performance review, or if you're a professor and you're up for tenure and you're getting a portfolio review, or you're a musician and you're trying to learn a song — you don't get a point attached to your performance. You do something, you give it a try, you get some feedback relative to appropriately scaled professional standards from a trusted third party, and then you try to make sense of that feedback and you incorporate all that into a next iteration. And then that loop just keeps looping until what you have produced is good enough.

All human learning that's significant is based on feedback loops, except in school.

I guess my favorite alternative form of grading is called specifications grading. This was invented by Linda Nilson, who's a legendary faculty developer and thought leader about teaching and learning. You set up a list of learning objectives, things that students should be able to do by the time they finish the course, and you tie the grade in the course to just simply how many of those things they have accomplished.

So perhaps you're teaching a writing class and you might have different what Linda would call bundles of work set up. Like, you need to do a research paper and you need to do an expository paper, and you need to have the ability to do a creative-writing type of assignment, or something like that. And each of those three bundles would consist of several specific items of work, maybe a paper and an outline of that paper, or an oral presentation of that paper. And if you complete all the things in the bundle at a satisfactory level relative to some standards that you set up, which we call specifications, then you have earned full credit on that bundle. And to get an A in the class, you would need to complete all three bundles. To get a B in the class, you need to complete two out of the three. To get a C, you need to complete one out of the three.

But the trick of it is everything that you do can be redone if you're not happy with the result.

And so what this allows you to do as a student is essentially select the grade that you want to earn. So you can come in and say, ‘I really think that I could get an A in the class if I just put in the effort for it.’ Then you know exactly what you have to do. You're getting feedback on your progress the entire way, and you're supported by the professor. On the other hand, maybe you're perfectly happy with a B in the class. We try to encourage people to shoot high, but maybe that's all you want. And if that's the case, you can pick two of those three bundles to do and just ignore the other one. And so it puts the student firmly in control of their own destiny in that course. And the professor is there as a guide to give feedback to the students and to set up just the environment and the opportunities to just keep on trying until they're happy.
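For readers who think in code, the bundle scheme Talbert describes can be sketched roughly as follows. This is an illustration only, not something from the book: the bundle and item names, the `course_grade` helper, and the choice to return `None` when no bundle is finished are all assumptions made here. Only the three-bundles-for-an-A, two-for-a-B, one-for-a-C mapping comes from his example.

```python
# Mapping from the interview: 3 bundles -> A, 2 -> B, 1 -> C.
GRADE_BY_BUNDLES = {3: "A", 2: "B", 1: "C"}


def bundle_complete(items):
    """A bundle counts only when every item in it meets the specifications."""
    return bool(items) and all(items.values())


def course_grade(work):
    """Return the letter grade for a dict of bundle -> {item: satisfactory?}.

    Returns None when no bundle is complete (an assumption; the interview
    does not say what happens in that case).
    """
    completed = sum(1 for items in work.values() if bundle_complete(items))
    return GRADE_BY_BUNDLES.get(completed)


# Hypothetical student record for the writing-class example.
work = {
    "research": {"paper": True, "outline": True},
    "expository": {"paper": True, "outline": False},
    "creative": {"paper": False, "outline": False},
}
print(course_grade(work))  # C

# The student revises the expository outline and it now meets the specs.
work["expository"]["outline"] = True
print(course_grade(work))  # B
```

Notice that nothing is averaged. A resubmission simply flips an item from unsatisfactory to satisfactory, and the grade follows from how many bundles are finished.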

This might make some professors feel like the course is less rigorous.

If that means just the overall legitimacy academically of a course, I feel like it makes a course more rigorous because you're just getting better data. You're getting direct observations of student work. And that's the whole reason I switched. I was tired of getting crappy data about student learning.

What is at stake here broadly? Why does the grading system you use matter?

This matters because we want education to mean something. Education kind of hinges at this point on certification. How do you know if a person is truly educated when they have a college degree?

If we think about traditional grading, we have to say that we have no idea what this information is conveying. This is a serious problem.

Let's say you're in a class and the class is hyper-traditional. So the entire course grade is based on three 100-point tests that are all averaged together. And you have one student that gets a zero on the first one, an 80 on the second one, and a 100 on the third one.

On the other hand, you've got another student that scores 60, 60 and 60. Both of those students have 180 points out of 300. That's a 60 percent. That's a D-minus. What story, though, is told about these students? They both look exactly the same.

And the first student has a much different journey than the second student does. With the first student, who knows why she got the zero? Maybe it was because she legit didn't know the material at all. But maybe it was because she had COVID or maybe she had to miss class because she was taking care of a family member or had a job or something. That zero tells you literally nothing about her skill just looking at the number.

And yet it has to be averaged in with these two other grades that are actually really good. But she gets a D-minus for the class, whereas the other guy, the one with 60, 60, 60, never really accomplishes anything. But they both get the same course grade.

What's at stake is whether that course grade, which we take such pains to create and file away and curate, actually conveys any information at all about the student. Or is it just like some random number average, like taking a bunch of ZIP codes and averaging them together? I mean, those are numbers, but you average them together and it means nothing.

So what's at stake is the epistemological basis of a course transcript, which is the currency of the modern workplace.
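The arithmetic in Talbert's three-test example is easy to check. The short sketch below uses the scores he gives; the `percent` helper and the variable names are made up here for illustration.

```python
def percent(scores, points_per_test=100):
    """Course percentage when equally weighted tests are averaged together."""
    return 100 * sum(scores) / (points_per_test * len(scores))


first_student = [0, 80, 100]   # a zero, then strong results
second_student = [60, 60, 60]  # the same score every time

print(percent(first_student))   # 60.0
print(percent(second_student))  # 60.0

# The average hides what differs between the two records.
print(first_student[-1], second_student[-1])  # 100 60
print(max(first_student) - min(first_student))    # 100
print(max(second_student) - min(second_student))  # 0
```

Both records collapse to the same 60 percent, even though the final test scores and the spread of results are nothing alike, which is the information the averaged course grade throws away.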

To hear the entire conversation, listen to the episode.