Improving the quality of education delivered through our public schools can not only boost economic growth but also help to narrow income inequality in the United States. And the best way to improve education is to identify and promote the most talented teachers.
One way of measuring teachers’ effectiveness has been to see how much their students’ test scores rise. This kind of “value-added” measure is straightforward and can easily be used to weed out bad teachers and promote better ones.
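To make the idea concrete, here is a minimal sketch of a gain-based score, written in Python. It is an illustration only, not the method used in either study discussed here: actual value-added models typically also adjust for student characteristics. The function name, the record layout and the sample scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Return each teacher's average student gain relative to the overall average.

    `records` is an iterable of (teacher, prior_score, current_score) tuples.
    This layout is an assumption made for the sketch.
    """
    gains = defaultdict(list)
    for teacher, prior, current in records:
        # A student's gain is the change in score from one year to the next.
        gains[teacher].append(current - prior)

    # Average gain across all students, regardless of teacher.
    overall = mean(g for teacher_gains in gains.values() for g in teacher_gains)

    # A positive score means the teacher's students gained more than average.
    return {t: mean(teacher_gains) - overall for t, teacher_gains in gains.items()}


# Hypothetical data: two teachers with two students each.
records = [
    ("Teacher A", 50, 60), ("Teacher A", 70, 78),
    ("Teacher B", 55, 58), ("Teacher B", 65, 70),
]
scores = value_added(records)
```

In this made-up example, Teacher A's students gain more than the overall average and Teacher B's gain less, so A receives a positive score and B a negative one.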
Critics complain, however, that this measurement has two potential flaws: Some teachers’ scores may rise not because they have performed so well in the classroom but merely because they have better students. And some teachers may push up their students’ scores by teaching to the test, rather than giving students the understanding of concepts that pays off in the long run.
Two important pieces of research rebut both of these concerns, suggesting there are significant benefits to be gained from more aggressive use of value-added and other measures to evaluate teachers. The first study, sponsored by the Bill and Melinda Gates Foundation, looked at student selection. In a remarkable feat, the researchers randomly assigned students to about 1,600 different teachers. The random assignment ensured that differences in students’ test-score gains from one teacher to the next reflected the teachers themselves, rather than which students they happened to get.
The Gates team – Tom Kane of Harvard University, Daniel McCaffrey and Trey Miller of the Rand Corp. and Douglas Staiger of Dartmouth College – found, as non-randomized studies had also found, that value-added measures were predictive of student achievement. As they conclude, “our findings suggest that existing measures of teacher effectiveness provide important and useful information on the causal effects that teachers have on their students’ outcomes.”
The Gates researchers also experimented with various supplements to a purely test-based metric. They found that although the value-added measure did the heavy lifting, student surveys and classroom observations of teaching quality were also useful. Interestingly, they found that these observations did not require observers to make random visits to the classroom; allowing a teacher to submit a self-selected set of classroom videos worked just as well, because even the best classes conducted by bad teachers were worse than those from better teachers.
The Gates team also partially addressed the second critique – that “good” teachers are only teaching to the test – by examining results from other measures of educational quality. For example, the researchers administered open-ended word problems to test students’ understanding of math. The teachers who were predicted to produce achievement gains on state tests produced gains two-thirds as large on the supplemental assessments.
An even more compelling rebuttal of the second critique, however, is found in a December 2011 paper by Raj Chetty and John Friedman of Harvard and Jonah Rockoff of Columbia University. These researchers assembled a database of 2.5 million third- through eighth-graders along with 18 million English and math tests from 1989 through 2009. They then linked that database with income-tax returns.
Their paper is fascinating because the researchers assessed how a high value-added teacher can influence students’ later earnings and other outcomes. Someone just teaching to the test, without improving the quality of education, wouldn’t be expected to have any lasting impact on students’ earnings. Yet Chetty and the others found big effects later on in students’ lives from having a higher value-added teacher.
By the time students reached age 28, for example, a one-standard-deviation increase in teacher quality in a single grade had raised their annual earnings by about 1 percent. The estimates also suggest that replacing a teacher in the bottom 5 percent of the value-added distribution with an average teacher would boost aggregate lifetime income for the students in that classroom by $250,000. And that would be true for every class in every year of instruction.
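A quick back-of-the-envelope calculation, sketched in Python, shows how the classroom figure translates per student and over time. Only the $250,000 comes from the estimate cited above. The class size of 25 and the 10-year horizon are assumptions chosen for round numbers, and the sketch ignores discounting.

```python
CLASSROOM_GAIN = 250_000   # dollars per classroom, the figure cited above
ASSUMED_CLASS_SIZE = 25    # hypothetical class size, not from the study
ASSUMED_YEARS = 10         # hypothetical number of years (one class per year)


def per_student_gain(classroom_gain, class_size):
    """Split the classroom-level gain evenly across students."""
    return classroom_gain / class_size


def cumulative_gain(classroom_gain, years):
    """Total gain if the same improvement applies to one class each year."""
    return classroom_gain * years


per_student = per_student_gain(CLASSROOM_GAIN, ASSUMED_CLASS_SIZE)
total = cumulative_gain(CLASSROOM_GAIN, ASSUMED_YEARS)
```

Under these assumptions, the gain works out to $10,000 in lifetime income per student, and to $2.5 million across 10 years of classes.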
Exposure to a higher-rated teacher helped students in other ways, too, the researchers found: It increased their chances of attending college, raised their retirement-savings rates, and reduced their likelihood of becoming teenage parents.
The bottom line from both of these important studies is that real-time measurements of teachers’ effectiveness, based either exclusively or mostly on how much their students’ standardized test scores improve, provide useful information that should not be ignored. And there are huge returns for students and the economy as a whole from shedding the teachers who do poorly on these measures and replacing them with teachers who do better.
As the Gates report demonstrates, it’s possible to improve teacher effectiveness metrics. But that shouldn’t keep us from using the ones we have now. To help raise future productivity, we should set a clear goal for all school districts: to deny tenure to teachers in the bottom 10 percent of the distribution according to value-added measurements. That would still mean granting tenure to lots of teachers who perform worse than the average novice, but it would be a good start.