Voice Tech Has Been Around for Decades. Will It Finally Work for Education?

By Tony Wan     May 20, 2020

Speech-recognition systems date back to the 1950s. Yet the recent emergence of “smart” assistant devices in homes—powered by the likes of Alexa and Siri—has sparked renewed interest in their application as educational tools in the classroom.

However, do not conflate speech technology with smart speakers and other devices, says Satya Nitta, the former head of IBM’s research team on artificial intelligence for learning.

“What Alexa and Siri are are basically consumer assistants built for a specific purpose, which is to bring you news, entertainment, things like that. But the underlying technology can be used for much more specific use cases, especially for learning,” he says.

Patricia Scanlon, CEO of Soapbox Labs, agrees.

“Smart speakers are a great use case of speech recognition. They’ve opened people’s eyes to what’s possible. But they aren’t speech recognition themselves,” says Scanlon, whose company develops speech technology used in learning tools.

Nitta and Scanlon spoke on the current state of speech technology and its potential in education during a recent event originally scheduled for the SXSW EDU conference in March. They were joined by Sunil Gunderia, chief strategy officer at Age of Learning.

In education, speech-recognition technology addresses questions that are fundamentally different from those posed by general consumers, says Nitta. Instead of responding to queries about the weather or how to cook a certain dish, these tools can be tailored to support language learning, for use cases ranging from pronunciation practice to oral fluency. They can also help diagnose dyslexia and speech impairments.

Because the use cases are different, though, the algorithms that inform educational tools must be developed differently.

First and foremost, the data needs to be accurate, says Scanlon. For her team, capturing children’s speech data across different regions and dialects occupied most of the company’s first seven years of operations. That effort involved fine-tuning its systems to recognize the myriad peculiarities in how kids speak in different scenarios.

“Children are very different acoustically,” notes Scanlon. “Their physical vocal tracts are different. And you’ve got to take into account their speaking behaviors. Unlike adults, they don’t always think about what they’re going to say—they just blurt things out. All these things we have to take into account.”

Because most consumer-grade speech tools were trained on adult voice data, Scanlon says, their usefulness with children is limited: “If you try to take something that’s been built for adults and deploy it for kids, or add a bit of kids’ data, you’re never going to break 80 percent accuracy.”
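
Scanlon’s 80 percent figure refers to recognition accuracy, which developers typically estimate by comparing a system’s output against human transcriptions of the same recordings. As a rough illustration only (not Soapbox Labs’ actual evaluation pipeline), the sketch below uses the open-source jiwer Python library to compute word error rate on a couple of invented child-speech transcripts; accuracy is roughly one minus that rate.

```python
# Rough sketch: gauging how well a recognizer handles child speech by
# comparing its output against human transcriptions. The transcripts
# below are invented for illustration.
import jiwer

# Human ("reference") transcriptions of hypothetical child recordings
references = [
    "the cat sat on the mat",
    "i want to read the dinosaur book",
]

# Output a hypothetical adult-trained recognizer produced for the same audio
hypotheses = [
    "the cat sat on a mat",
    "i want to weed the dinosaur book",
]

wer = jiwer.wer(references, hypotheses)  # word error rate across all samples
print(f"Word error rate: {wer:.2%}")
print(f"Approximate word accuracy: {1 - wer:.2%}")
```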

Nitta says he appreciates that someone’s done the hard work to collect that data, so that developers like him don’t have to start from scratch. Soapbox Labs licenses its voice technology to other educational technology developers.

Developers say that reliable speech technology can make it possible to put findings from educational research into practice. Nitta cited a study by Stanford University professor Bruce McCandliss, which suggests that accurately sounding out words may help children better learn and recall vocabulary.

“Imagine if you are collecting audio data from within classrooms, and then you’re correlating them with learning outcomes, and also using them as a training tool for teachers to improve. That’s where interesting applications are possible,” says Nitta.

Scanlon is optimistic that speech-recognition technology can help school officials screen children for reading difficulties like dyslexia, so they can get the support and intervention they need at an earlier age. “It takes much longer to intervene with a child who’s eight, versus a four- or five-year-old,” she says.

The prospect of collecting data as personal as children’s voices naturally raises concerns for parents and privacy advocates alike. Scanlon says it’s imperative for companies to be proactively transparent about how the data is to be used. For Soapbox Labs, that information is used only to “update our models and improve our AI,” she adds.

To further assuage fears, developers ought to be upfront about their business models, which should not involve selling or sharing any data. That also means refraining from capturing or storing data unnecessarily. In other words, Scanlon says, all information collected must have a purpose.

As the processors in computing devices improve, speech-recognition technology can also work offline, adds Scanlon. That means data no longer needs to be sent to the cloud, which limits its exposure. It also lets apps work in places without an internet connection.
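
On-device recognition is already practical with open-source engines. As one illustration (not a description of Soapbox Labs’ own technology), the sketch below runs the open-source Vosk recognizer entirely offline against a local audio file; the model directory and WAV file name are placeholders.

```python
# Minimal sketch of fully offline speech recognition with the open-source
# Vosk engine: no audio leaves the device. "model" and the WAV file are
# placeholders -- download a Vosk model and supply 16 kHz mono PCM audio.
import json
import wave

from vosk import KaldiRecognizer, Model

model = Model("model")                      # local acoustic/language model directory
wf = wave.open("reading_sample.wav", "rb")  # e.g., a child's recorded reading sample
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)                # feed audio chunks to the recognizer

result = json.loads(rec.FinalResult())      # final hypothesis as JSON
print("Recognized text:", result.get("text", ""))
```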

Speech technology can also extend educational opportunities to homes, especially for activities like oral fluency exercises that typically require the presence of a teacher, says Gunderia. “In the environment we are now with the school closures, the ability to use something like speech recognition to serve as a way to provide kids meaningful feedback in terms of how they can improve in their reading, fluency or their pronunciation is really compelling,” he notes.

Adds Nitta: “I believe personally that speech technology can be a force for good to help level the playing field, because not everybody has educated parents at home, or parents who have time to spare for schooling.”

