Technology in School

Assessments Become More Accessible With Speech Synthesis—and an Almost Human Voice

By John Waters     Sep 24, 2018

These are exciting times in the world of assistive learning technology. Recent advances in artificial intelligence (AI) and machine learning (ML) are behind next-generation improvements in a range of software tools for students with sensory or learning disabilities. Speech synthesis in particular is benefiting from deep learning technologies, which are used to generate speech that sounds like a human voice.

For the assessment mavens at Educational Testing Service (ETS), these evolving capabilities present both a challenge and an opportunity, says the organization’s Markku “Mark” Häkkinen.

Häkkinen leads the Accessibility Standards and Inclusive Technology Research Group at ETS, the 60-plus-year-old nonprofit educational testing and assessment organization that administers K-12 state exams along with the well-known GRE and TOEFL, among other tests. His research focuses on advancing the accessibility of computer-based assessments—for all learners—through a standards-based approach.

EdSurge spoke with Häkkinen about the impact of new speech synthesis technologies on student assessments, and how his company is making the most of them.

EdSurge: First, when we say “assistive technologies,” what exactly are we talking about?

Mark Häkkinen: In general, we’re talking about any piece of hardware or software used to increase the functional capabilities of a person with a disability. It’s an umbrella term that covers everything from text-to-speech software to screen magnification.

You’ve said that these technologies present both a challenge and an opportunity. How so?

Advancements on the consumer side—where we see things like accessible mobile devices and the new digital assistants—have raised expectations in the community of individuals with disabilities who use assistive technologies at home in their daily lives. It can be a challenge to meet those expectations. But that technology also has the potential to allow us to expand our reach with our assessments and learning materials, so it’s a change we are embracing.

A lot of my group’s research—applied research really—looks at how we can make more transparent the fact that someone is accessing content non-visually. How do you present mathematics? How do you present science, technology, engineering, and math content? The goal is to ensure that test takers with disabilities are able to demonstrate what they know and what their skills are without concern for the assistive technologies they may be using to do so. We try to make assessments work for everyone, and we are using these technologies to help us fulfill that mission.

Does the speech synthesis tech you’re working with help visually impaired students exclusively?

It’s true that text-to-speech synthesis is essential for communicating what’s on a web page or a screen to someone who can’t see. But this technology actually serves a diverse population that comprises students with sensory impairments—blindness and low vision—as well as cognitive impairments and learning disabilities such as dyslexia.

A good example is a type of assistive technology called Read Aloud, where content that is viewed on a screen is also read aloud via text-to-speech synthesis. This is useful for someone who might have, say, dyslexia, and who needs a multimodal presentation of not just the text, but the audio, to better comprehend what’s being presented.

How are you implementing text-to-speech technology at ETS?

We’re implementing it in a number of ways. We’ve used Amazon’s speech technologies, for example, for a couple of large statewide assessments. One of our subsidiary organizations, Questar, is using the Amazon Polly text-to-speech service for the Read Aloud functionality.
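
As a rough sketch of how a Read Aloud feature might call Polly: the helper names here, and the assumption that long passages must be split to fit Polly's per-request character limit (roughly 3,000 characters per request), are illustrative, not ETS's or Questar's actual implementation.

```python
MAX_CHARS = 3000  # assumed per-request character limit for Polly synthesize_speech


def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split a long passage on sentence boundaries so each chunk fits one request."""
    sentences = [s.strip() for s in text.replace("\n", " ").split(". ") if s.strip()]
    chunks, current = [], ""
    for s in sentences:
        if not s.endswith("."):
            s += "."
        if current and len(current) + 1 + len(s) > limit:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks


def read_aloud(text: str, voice: str = "Joanna") -> bytes:
    """Synthesize a passage chunk by chunk and concatenate the MP3 audio."""
    import boto3  # AWS SDK for Python (third party); requires AWS credentials

    polly = boto3.client("polly")
    audio = b""
    for chunk in chunk_text(text):
        resp = polly.synthesize_speech(Text=chunk, OutputFormat="mp3", VoiceId=voice)
        audio += resp["AudioStream"].read()
    return audio
```

The chunking step is the part worth getting right: splitting mid-sentence produces audible seams in the audio, so the sketch breaks only at sentence boundaries.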

We also used Amazon Polly to address a unique challenge with our Graduate Record Examination (GRE) for blind test candidates. These tests are timed; if you’re a sighted test candidate, you see a countdown timer on your screen and you get prompts that pop up at ten, five, and three minutes. You can, with a glance, always know how much time you have left.

But you can’t do that if you’re blind, which is unfair to the candidates. We were able to solve this problem by applying some earlier research I had done on warning messages and how you alert people who are engaged in a critical task. We used Amazon Polly to create a kind of auditory glance: a short chime that lets you know a message is coming, followed by messages letting you know how many minutes you have left in that section of the test. It’s designed to allow you to keep going with your task without having to shift your attention.
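
A minimal sketch of what the spoken half of such an "auditory glance" could look like in SSML. The function name and wording are hypothetical; Polly's SSML doesn't embed the chime itself, so the assumption here is that the test-delivery app plays the chime and then the synthesized message.

```python
def time_warning_ssml(minutes_left: int) -> str:
    """Build SSML for a spoken time warning; the alerting chime is played
    separately by the delivery app just before this message starts."""
    noun = "minute" if minutes_left == 1 else "minutes"
    return (
        "<speak>"
        '<break time="400ms"/>'  # breathing room after the chime
        f'<prosody rate="95%">{minutes_left} {noun} remaining '
        "in this section.</prosody>"
        "</speak>"
    )
```

Slowing the rate slightly and padding after the chime are the kinds of small SSML adjustments that keep a warning noticeable without demanding the test taker's full attention.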


Ready to try out Amazon Polly? Check out the FAQs, the Getting Started page, and other resources.


What should people look for in a text-to-speech development platform?

I’d consider several factors. Beyond the students’ expectations for lifelike speech, there’s the need for consistency of pronunciation. It’s not unusual, for example, for a large state to have districts with a range of computers in the classrooms—from Macs to Chromebooks. Each of them might have a different voice synthesizer using different pronunciation rules for the terminology in math, science, and history. A classroom teacher might be saying a scientific term using a standard pronunciation, while the students are hearing it six different ways depending on the platform they are using. From a pedagogical perspective, especially if you are assessing student knowledge, you don’t want the technology to introduce confusion.

I think support for Speech Synthesis Markup Language (SSML) is also important. We used SSML to create those timed warning messages I mentioned earlier. SSML also allows us to insert pauses between words and numbers as spoken. It gives us the flexibility we need to ensure the highest quality user experience for test candidates. We’ve gotten feedback that ETS has actually raised the bar for all the assessment organizations in this regard.
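
To illustrate the kind of pause insertion described above, here is a hedged sketch (the function name is hypothetical) that uses SSML `<say-as>` and `<break>` tags to read a number one digit at a time, which can make numeric items easier to follow by ear.

```python
def paced_digits_ssml(number: str, pause: str = "250ms") -> str:
    """Speak a number one digit at a time, pausing briefly between digits."""
    spoken = f'<break time="{pause}"/>'.join(
        f'<say-as interpret-as="digits">{d}</say-as>' for d in number
    )
    return f"<speak>{spoken}</speak>"


# Sending SSML instead of plain text to Polly is the same synthesize_speech
# call with TextType="ssml", e.g. (requires AWS credentials, not run here):
# polly.synthesize_speech(Text=paced_digits_ssml("90210"),
#                         TextType="ssml", OutputFormat="mp3", VoiceId="Joanna")
```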


Want to learn more about how edtech companies, including Cerego and Frontline Education, are starting to build voice-forward experiences for their products? Join EdSurge for a webinar panel discussion on October 17, 2pm ET | 11am PT.


I also think it’s important to find a service provider that embraces the continuous improvement model, which Amazon Web Services (AWS) does. Amazon Polly is based on machine learning technologies, so it’s only going to improve over time. That was one of the things that attracted us two years ago when we started working with the technology.
