Where Does Automated Essay Scoring Belong in K-12 Education?

Automated essay scoring is one of the most controversial applications of “big data” in edtech research. Writing is a deeply creative, emotive and personal endeavor. The idea that an objective, calculated algorithm is able to “grade” a student’s composition understandably makes people nervous.

Ever since a 2012 competition on Kaggle.com showed that most automated scoring systems are as reliable as humans when scoring timed standardized tests, reactions have ranged from cheerful optimism to existential fear. The technology clearly has a lot to prove before being accepted alongside common machine learning tools, like Netflix’s movie recommendations or Google Maps GPS navigation.

The leading critic of automated writing assessment, Les Perelman from MIT, is infamous for fooling automated scorers with artfully crafted, nonsensical essays. His thesis--that computers cannot robustly react to maliciously trained students--casts an uncomfortable spotlight on some of the assumptions made by the standardized testing industry.

Our internal results, across many clients, tend to defy Perelman’s concerns; real-world attempts from students do not look like his carefully constructed examples. The problem with this debate, though, is not reliability; the mistake is in limiting the discussion to testing. I believe that focusing only on high-stakes assessment pigeonholes an exciting new application of computer science. Both critics and supporters end up disconnecting the technology from the classroom context and from its potential benefits to teachers and students.

Innovating in the English Language Arts Classroom

No amount of quantitative research on scoring reliability is going to result in tools that English teachers can take to their students. Elsewhere, some tools are starting to make real headway. Newsela is a great example, providing reading content that is Lexile level-adjusted to each student, based on reading ability. Kidblog is a writing platform that fits an online medium that students value, while maintaining a safe environment.

But when it comes to sophisticated, personalized learning tools for literacy, there’s nowhere near the breadth of support that teachers of quantitative subjects, like math, have had for years.

Automated essay scoring has the potential to jump-start this field--if companies can better understand what it offers. That requires more than a conversation about reliability in standardized testing; the conversation should instead focus on what teachers and students really need when collaborating on composition in classrooms.

For several months in spring 2014, my team at LightSide Labs and I immersed ourselves in English classrooms across western Pennsylvania and New York City. Our field study let us observe 30 teachers as they used our prototype automated essay scoring tool on their own terms. After that experience, I’m convinced that there’s a smart role for this type of tool.

My Proposal: Give the Grader to Students

The best place for an automated scoring tool is in students’ hands. After all, having a piece of writing evaluated by a teacher can be terrifying and stressful, especially for marginalized students. Research has shown that on-demand, low-barrier online resources create a “disclosure miracle” for people in these hard situations: they write more, and more often.

Automated assessment can change the locus of control, making essay revision a choice led by a student at his or her own pace, instead of a punitive requirement from a teacher. This change has two big impacts for teachers.

First, the negativity associated with assessment is suddenly under the student’s control. This drastically lowers the barrier to calling on a teacher for help. When students have their assessment in hand, the conversation changes. Teachers don’t have to focus on making a snap evaluation; students come into the conversation knowing where they stand. Young writers, even weaker ones, now seek out collaboration with teachers to improve their writing. Their goals are aligned.

Then, as teachers grade final assignments after students turn their essays in, an on-demand automated assessment can pay unexpected dividends. The automated tool will have recorded the student’s writing process, and each draft--with its mistakes and the student’s revisions--is reviewable by teachers. Designed well, this is a gold mine. The ability to let students “show their work” effortlessly through such a revision history gives teachers a better picture of their students’ progress.

Scoring essays for high-stakes exams is a reliable but utilitarian use of machine learning. It is functional, not innovative. Automated scoring alone, as a summative teacher support, is adequate--but incomplete. Teachers deserve a more thoughtful reinvention of the tools used to teach writing.

Technology is iterative, though, and as edtech largely moves towards personalized student learning, there’s a wealth of opportunity for automated essay scoring. It’s a piece of the puzzle in a much broader conversation.

In the tension between assessment and test prep, there will always be an arms race. Open research in the area is critical; we make our machine learning tools for written text open source for this reason. Even more important is that researchers listen to teachers and students, understand what their numbers mean in context, and push forward a collaborative, research-driven dialogue based on those real-world insights.

Entrepreneurs take note: this is an exciting time to be in English Language Arts classrooms.