Technology in School

Can Individual Tests Really Measure Collaboration?

By Stephen Noonoo     Jul 16, 2018

At the end of next school year, thousands of high school students will sit down at individual workstations, laptops in hand, for an end-of-course exam. But in a rather novel twist, this one is not just about what you know, but also about what you can figure out.

That’s the idea, at least, behind the latest summative assessments from Project Lead the Way, a project-based STEM curriculum, which is introducing new tech-based question types to measure a raft of noncognitive skills, from collaboration to general problem solving (in addition to subject-specific questions about engineering or coding concepts).

“We reflected and determined we had an opportunity to change the way we assess students to look and feel more like the in-classroom experience,” says Michelle Gough, a senior vice president for Project Lead the Way and its chief legal and assessment officer.

Project Lead the Way, or PLTW as it’s known for short, is a K-12 STEM curriculum that’s big on hands-on learning and having groups of students co-design solutions to science and engineering problems. Challenges might include designing apparel for extreme climates or improving water recycling during a drought.

For much of its two-decade history, the Indianapolis-based nonprofit has offered end-of-course assessments for its three “pathway” courses on engineering, computer science or biomedical sciences. But those had traditionally focused on subject-specific skills, and were entirely multiple choice. The new soft-skill component, given online, will be taken by up to 400,000 K-12 students in more than 10,000 schools at the end of the 2018-19 school year.

Testing for such skills relies heavily on technology and required a significant retooling. Yet questions remain about whether any individual test (especially one that has long relied on multiple choice) can truly measure collaboration and problem solving—skills that typically involve heavy doses of human interaction and teamwork.

A Question on Testing

When building the new tests, PLTW gathered panels of industry experts, educators and psychometricians, or scientists who study how to make tests fair. Their goal? To infuse both real-world scenarios and academic standards into the new exams, as well as topics like leadership that college admissions reps would care about.

To bolster its claim that there needs to be more measurement around soft skills, PLTW pointed EdSurge to a 2014 research brief co-authored by Linda Darling-Hammond, an emeritus professor at Stanford and longtime policy researcher, which calls for adding more collaboration, communication and problem-solving to both curricula and accountability systems.

But the individual computerized assessments PLTW has built aren’t exactly what Darling-Hammond had in mind. “Is it important to measure those skills? The answer is yes, and they can be measured,” says Darling-Hammond, who now heads up the nonprofit Learning Policy Institute, in an interview with EdSurge. “Can you measure them with a single kid sitting in front of a computer? I think there’s more of a question about that.”

In place of individual tests, Darling-Hammond is a proponent of what’s known as performance assessments, which measure skills through hands-on tasks—the kind PLTW features in its curriculum but not its assessments.

Some countries, such as Singapore and parts of Australia, use performance assessments as alternatives or complements to high-stakes testing. Typically they involve giving students a task to complete, such as designing an investigation or solving a problem. (Several U.S. states, like New York and Kentucky, feature them too.) In some cases they’re collaborative, asking groups of students to test and present their solution together, then write up their findings and contributions separately.

“There is real work on real problems, which are going to be much more transferable to the real-world situations that we want kids to be prepared for,” Darling-Hammond says.

Such approaches take training, resources and time, in addition to buy-in from districts, states and educators. But Gough maintains that group work isn’t ideal for individual assessments. Even if answers are recorded individually, she says, it’s too difficult to discern whether students contributed equally or if skill levels within the group are even comparable (which would “muddy the data,” she adds).

Technology-Enhanced

What’s easier to agree on is that multiple-choice questions are a poor way to measure soft skills. Darling-Hammond says the format is rarely used outside the U.S., where it’s unusually popular. “In scoring a multiple-choice test, just about every dollar after the first is profit for the commercial testing company,” she says, “whereas scoring a test where you've got open-ended answers requires that you train teachers to do scoring.”

For the new tests, PLTW didn’t ditch multiple choice entirely, but rather added what it calls “technology-enhanced items,” which began cropping up a few years ago on national and international high-stakes assessments, including PISA and Smarter Balanced. This new breed of tech-laden items lets students answer questions through drag-and-drop, highlighting text or filling in blanks.

The PLTW tests also include situational-judgement items, or hypothetical scenarios in which students have to weigh options and come to decisions after reading a passage or watching a video vignette (e.g. “You are conducting an experiment and you realize your data has been corrupted…”). Developed by psychologists to test skills like leadership, these items often feature multiple-choice-style lists or ask students to rank options in order from, say, the most to the least appropriate response. Depending on their answers, there is typically an opportunity for students to pick up at least partial credit on the PLTW tests, Gough says, “because there often isn’t a completely right or completely wrong answer—there is usually a best answer.”

Still, they may suffer from the same limitations that make typical multiple choice items problematic. “You can always eliminate the stupid answers,” Darling-Hammond says of situational-judgement items. “You could know the right answer without being a good collaborator in any sense yourself.”

AI Steps In

Barring actual student-to-student interaction, a better way to gauge soft skills is to simulate collaboration as closely as possible. In 2015, for the first time, the PISA exam—given to 15-year-olds around the world—sought to measure individual and collaborative problem solving. It also introduced an AI-powered avatar that students interact with to complete a particular task, such as flying a rocket to the moon while controlling only one aspect of the project (and thus having to work closely with the AI).

“The idea was to see to what extent students would collaborate without an identified solution strategy, and to what extent they can overcome problems and difficulties,” says Andreas Schleicher, a director at the Organisation for Economic Co-operation and Development, which oversees PISA. “None of the problems we gave students required a lot of content knowledge or problem-solving expertise. It was all about the willingness and capacity of students to jointly manage problem situations.”

As part of its broad research into test design, PLTW presented the PISA approach to its panel and ended up adopting a similar AI scenario—a “simulated interaction,” Gough calls it—on the end-of-course exam for biomedical science students, who will interact with a virtual patient. It’s an approach not without its problems, Darling-Hammond says. But, she adds, it’s “a step in the right direction, and it's certainly a big step beyond multiple-choice testing.”

Setting Up for Success

Three years after PISA first gave its collaboration-themed test to U.S. students, Schleicher says Americans came off rather average when it came to those skill sets, placing 13th out of more than 50 countries.

“If you rank American students internationally in collaborative problem solving they don’t come off that well,” he says, before adding that it’s important to look at the bigger picture. “Americans did better on collaborative than they did on individual problem solving.”

But those taking PISA tests come from a broad range of schools, whereas those taking the new PLTW exams will have just completed a project-based learning curriculum focused not only on STEM but also on preparing them for real-world collaboration experiences. PLTW students therefore might be expected to score better on soft skill tests and thus look more attractive to their post-secondary prospects.

“We have a lot of research about how students who do PLTW are the students that are completing college, not changing majors,” Gough says. “It isn’t just the development of subject-matter skills, it's the inculcation of these transportable skills—the perseverance, problem solving...that has allowed them to be successful.”
