The Gates Foundation Spent $200M+ Trying to Improve Teacher Performance, and All It Got Was This Report

In 2009, the Bill and Melinda Gates Foundation invested hundreds of millions of dollars to see whether “value-added” models could be used to evaluate, inform and boost teachers’ effectiveness in improving student outcomes. The most valuable thing it learned may have come from a report offering a blunt answer: no.

Conducted by the RAND Corporation and the American Institutes for Research, the report followed three school districts and four charter management organizations through a six-year effort (from 2010 to 2016) to implement Gates-supported reforms to improve teachers’ performance. Tactics ranged from implementing new classroom observation practices and professional-development opportunities to revamping recruiting, hiring, promotion and firing procedures.

The summary towards the end of the 526-page report suggests these efforts struck out in all regards:

Overall, the initiative did not achieve its stated goals for students, particularly LIM [low-income minority] students. By the end of 2014–2015, student outcomes were not dramatically better than outcomes in similar sites that did not participate in the IP initiative. Furthermore, in the sites where these analyses could be conducted, we did not find improvement in the effectiveness of newly hired teachers relative to experienced teachers; we found very few instances of improvement in the effectiveness of the teaching force overall; we found no evidence that LIM students had greater access than non-LIM students to effective teaching; and we found no increase in the retention of effective teachers, although we did find declines in the retention of ineffective teachers in most sites.

Dubbed the “Intensive Partnerships for Effective Teaching,” this Gates-funded effort attempted to take a scientific approach to quantifying and evaluating the art of teaching. Participating schools created new teacher-evaluation systems that factored in student performance on state and local tests, classroom observations conducted by a school administrator and feedback from student and parent surveys.

This information was then combined to create an “effectiveness rating” for each teacher. Those with a low rating would get professional-development support, or be placed on a performance improvement plan. On a more draconian level, these measures would also inform decisions around teacher compensation, and how they would be promoted—or dismissed.
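To make the idea of a combined rating concrete, here is a minimal sketch of one way such a score could be computed: a weighted average of the three kinds of measures described above. The weights, the 0-100 scale and the names (`ASSUMED_WEIGHTS`, `composite_rating`) are assumptions made for illustration. They are not taken from the report, and each site may have combined its measures differently.

```python
# Hypothetical weights for illustration only; not the weighting any site used.
ASSUMED_WEIGHTS = {"test_growth": 0.4, "observation": 0.4, "survey": 0.2}


def composite_rating(scores, weights=ASSUMED_WEIGHTS):
    """Return the weighted average of component scores (each assumed 0-100)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(scores[name] * weight for name, weight in weights.items())
    return weighted_sum / total_weight


# One made-up teacher: weak test growth, strong observation, decent surveys.
example = {"test_growth": 50.0, "observation": 80.0, "survey": 70.0}
print(round(composite_rating(example), 1))
```

Under these assumed weights, a generous observation score can offset weak test growth, which is one reason the accuracy of each component matters.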

But implementing these extensive and systemic changes, the report’s authors noted, proved difficult for a host of reasons. Among them:

New evaluation systems required school principals to observe every teacher and meet with them afterwards. But principals—who have many other duties—were often crunched on time.
Some principals were not thoroughly trained in the rubrics used to evaluate teachers in classroom observations.
Data on student achievement and growth, based on their performance on tests, were inconsistent and sometimes unavailable.

That schools had trouble getting reliable data on students’ test results, which factored into a teacher’s performance score, raises further questions about the feasibility of implementing value-added models (VAM). The idea behind these models is that it is possible to isolate and determine how much a teacher contributes to a student’s growth, as measured by the student’s performance on tests. It’s not a new approach. Earlier this decade, some states incorporated VAM into teacher evaluations as a condition of securing federal education funding from Race to the Top grants.
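The core arithmetic behind a value-added estimate can be sketched in a few lines. This is a deliberately simplified illustration, not the statistical model used in the report or by any state: it assumes each student's expected score is simply the prior-year score plus an average gain, and treats the teacher's value-added as the average gap between actual and expected scores. The function name, field names and numbers are hypothetical.

```python
from statistics import mean


def value_added(students, expected_gain):
    """Average of (actual score - expected score) across a teacher's students.

    Simplifying assumption: expected score = prior score + expected_gain.
    Real value-added models adjust for many more factors.
    """
    residuals = [
        student["actual"] - (student["prior"] + expected_gain)
        for student in students
    ]
    return mean(residuals)


# Made-up classroom of two students, with an assumed average gain of 5 points.
classroom = [
    {"prior": 60, "actual": 68},  # beat expectation by 3
    {"prior": 70, "actual": 75},  # met expectation exactly
]
print(value_added(classroom, expected_gain=5))
```

Even in this toy version, the estimate depends entirely on having both prior and current test scores for every student, so missing or inconsistent data undermines the result directly.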

The results have been mixed. A 2014 study published by the American Educational Research Association found “surprisingly weak associations” between value-added measures of teacher performance and the quality of classroom instruction. (That study looked at data from another initiative also supported by the Gates Foundation.)

In theory, the new evaluation systems could be used to justify removing underperforming teachers. But principals seemed hesitant to put this into practice. As the report authors wrote: “The fact that teacher-evaluation results were used as the basis for tenure and dismissal decisions might have led some principals to avoid giving low observation ratings that would reduce a teacher’s composite score.” As a result, the principal observation ratings were skewed.

Overall, the report estimated that the seven school sites spent a total of $575 million on these teacher-improvement efforts, of which $212 million came from the Gates Foundation.

So was this all just a long, expensive experiment to test a hypothesis that didn’t pan out?

The faulty experiment might leave a lasting impression, according to Bloomberg columnist Cathy O’Neil, who charged that the effort likely undermined teacher morale, and noted that the value-added approach to evaluating teachers has “unfairly ruined careers” in other districts.

For the first two years of this experiment, the report’s authors found that many educators bought in to the program. But that attitude changed when teachers learned that their jobs could be at stake. “Teacher organizations,” they wrote, “began to object and mount public campaigns against the effectiveness measures when high stakes were due to be attached and larger numbers of teachers were threatened.”

One half-hearted silver lining offered in the report’s conclusion is that the Gates initiative “succeeded” in the sense that schools found new ways to measure teachers. But school leaders just couldn’t leverage those measurements to improve student outcomes.