Teachers: Much More Than You Wanted To Know, by Scott Alexander
Question: does “the good teacher” exist? Answer: it would seem not.
Newspapers report that having a better teacher for even a single grade (for example, a better fourth-grade teacher) can improve a child’s lifetime earning prospects by $80,000.
HAVING A GOOD TEACHER
If the test scores of two kids in the same teacher’s class were on average no more similar than the test scores of two kids in two different teachers’ classes, then teachers can’t matter very much.
A TEST TO GAUGE THE TEACHER’S IMPACT
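The test described above amounts to a variance split: how much of the spread in scores lines up with which class a student is in. The snippet below is a minimal, hypothetical illustration that computes a simple between-class share of variance (eta-squared) from made-up scores. Studies like the ones discussed here typically use multilevel models that also separate school-level from teacher-level variance, and with small classes this raw share is inflated by sampling noise, so treat it as a sketch of the idea only.

```python
from statistics import mean, pvariance

def between_class_share(scores_by_class):
    """Share of total score variance lying *between* classes (eta-squared).

    scores_by_class: dict mapping a class label to a list of student scores.
    0.0 means classmates are no more alike than students in different classes;
    values near 1.0 mean the class a student is in explains almost everything.
    """
    all_scores = [s for scores in scores_by_class.values() for s in scores]
    grand_mean = mean(all_scores)
    total_var = pvariance(all_scores)  # total sum of squares / N
    # Between-class sum of squares / N: each class mean's squared distance
    # from the grand mean, weighted by class size.
    between_var = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in scores_by_class.values()
    ) / len(all_scores)
    return between_var / total_var

# Hypothetical data: identical class averages vs. well-separated classes.
same = {"A": [70, 80, 90], "B": [70, 80, 90]}
apart = {"A": [60, 62, 64], "B": [80, 82, 84]}
print(between_class_share(same))              # 0.0
print(round(between_class_share(apart), 3))   # 0.974
```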
They all agree pretty well that individual factors are most important, followed by school and teacher factors of roughly equal size. Teacher factors explain somewhere between 5% and 20% of the variance. Other studies seem to agree, usually a little to the lower end. For example, Goldhaber, Brewer, and Anderson (1999) find teachers explain 9% of variance; Nye, Konstantopoulos, and Hedges (2004) find they explain 13% of variance for math and 7% for reading. The American Statistical Association summarizes the research as “teachers account for about 1% to 14% of the variability in test scores”, which seems about right.
RESULT OF THREE STUDIES: TEACHERS COUNT FOR LITTLE
To put it more simply: on average, it is individual students’ level of ability and grit that makes the difference. Good schools and teachers may push that a little higher, and bad ones bring it a little lower, but they don’t work miracles.
It’s much easier to say “this is 10% dependent on school-level factors and 10% based on teacher-level factors”.
In terms of observable teacher-level effects, the only one they can find that makes a difference is gender (female teachers are better). Teacher certification, years of experience, degrees, et cetera have no effect. This is consistent with most other research, such as Miller, McKenna, and McKenna (1998). A few studies that we’ll get to later do suggest teacher experience matters; almost nobody wants to claim certifications or degrees do much.
GENDER AND EXPERIENCE
The most robust finding in the research literature is the effect of teacher verbal and cognitive ability on student achievement. Every study that has included a valid measure of teacher verbal or cognitive ability has found that it accounts for more variance in student achievement than any other measured characteristic of teachers (e.g., Greenwald, Hedges, & Lane, 1996; Ferguson & Ladd, 1996; Kain & Singleton, 1996; Ehrenberg & Brewer, 1994).
Teachers account for about 10% of the variance in student test scores; it’s hard to predict which teachers will do better from their characteristics alone; and schools account for a little more, but that might be confounded.
CONFOUNDING LIES IN WAIT
Suppose you want to figure out which teachers in a certain district are the best. You know that the only thing truly important in life is standardized test scores, so you calculate the average test score for each teacher’s class, then crown whoever has the highest average as Teacher Of The Year.
FINDING THE BEST. EASY?
But you’ll probably just give the award to whoever teaches the gifted class.
So okay, back up. Instead of judging teachers by average test score, we can judge them by the average change in test score.
GAINS IN TEST SCORES
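The switch from judging teachers by average score to judging them by average change in score can be shown with a toy example. The class names and scores below are hypothetical, and this naive gain-score average is only the seed of VAM: real value-added models are regressions with extra controls and statistical adjustments, which this sketch does not attempt.

```python
from statistics import mean

# Hypothetical classes: (last year's score, this year's score) for each student.
classes = {
    "gifted": [(90, 92), (88, 89), (94, 95)],
    "regular": [(60, 68), (55, 62), (65, 71)],
}

def average_level(pairs):
    """Average end-of-year score: rewards whoever got the strongest students."""
    return mean(post for _, post in pairs)

def average_gain(pairs):
    """Average change in score: a crude, gain-score version of value added."""
    return mean(post - pre for pre, post in pairs)

best_by_level = max(classes, key=lambda t: average_level(classes[t]))
best_by_gain = max(classes, key=lambda t: average_gain(classes[t]))
print(best_by_level, best_by_gain)  # gifted regular
```

Judged by level, the gifted-class teacher wins; judged by gain, the teacher whose students improved most does.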
This is the basic idea behind VAM (value-added modeling), the latest Exciting Educational Trend and the lynchpin of President Obama’s educational reforms. If you use VAM to find out which teachers are better than others, you can pay the good ones more to encourage them to stick around.
HIGH-STAKES VAM
A teacher whose VAM is two standard deviations above the mean should have students who score on average 0.2 standard deviations above the mean.
What happens if we compound this and give this student the best teachers many years in a row? Sanders and Rivers (also Jordan, Mendro, and Weerasinghe) argue the effects are impressive and cumulative.
AN AVERAGE STUDENT IN THE HANDS OF THE BEST
A RAND education report criticizes these studies as “using ad hoc methods” and argues that they’re vulnerable to double-counting student achievement… “[Sanders and Rivers] provide evidence of the existence and persistence of teacher or classroom effects, but the size of the effects is likely to be somewhat overstated”….
Just by eyeballing and playing around with it, it looks like most of the gain from these “three consecutive great teachers” actually comes from the last great teacher. So the superadditivity might not be quite right, and Sanders and Rivers might just be genuinely finding bigger teacher effects than anybody else. At what rate do these gains from good teachers decay? They decay pretty fast. Jacob, Lefgren and Sims find that only 25% of gains carry on to the next year, and only 15% to the year after that… Kane and Rothstein find much the same. A RAND report suggests 20% persistence after one year and 10% persistence after two…. All of this contradicts Sanders and Rivers pretty badly….
SUSPICIOUS DATA: THE BENEFITS DECAY
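To see why fast decay undercuts the “cumulative” story, here is a back-of-the-envelope sketch. It takes the figure of 0.1 SD of same-year score per 1 SD of teacher VAM from above, plus the persistence figures quoted for Jacob, Lefgren and Sims and for the RAND report, and assumes (a simplification, not something these studies claim) that decayed effects from different years simply add up.

```python
# Share of a teacher's same-year effect still present k years later (k = 0, 1, 2).
JACOB_LEFGREN_SIMS = [1.0, 0.25, 0.15]
RAND = [1.0, 0.20, 0.10]

EFFECT_PER_VAM_SD = 0.1  # +1 SD teacher VAM -> about +0.1 SD same-year test score

def score_gain_after_run(teacher_vams, persistence):
    """Score gain (in student SD) at the end of the last year of a run of
    consecutive teachers whose VAMs (in teacher SD) are given in order.

    Assumes each teacher's effect decays per `persistence` and that the
    decayed effects simply add; that additivity is an illustrative assumption.
    """
    n = len(teacher_vams)
    if n > len(persistence):
        raise ValueError(f"persistence figures only cover {len(persistence)} years")
    total = 0.0
    for year, vam in enumerate(teacher_vams):
        years_since = n - 1 - year  # 0 for the most recent teacher
        total += EFFECT_PER_VAM_SD * vam * persistence[years_since]
    return total

# Three consecutive +2 SD teachers: naive stacking would give 0.6 SD.
print(round(score_gain_after_run([2, 2, 2], JACOB_LEFGREN_SIMS), 3))  # 0.28
print(round(score_gain_after_run([2, 2, 2], RAND), 3))                # 0.26
```

Under these assumptions, 0.2 of the 0.28 SD comes from the most recent teacher, which fits the observation that most of the gain comes from the last great teacher.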
None of these studies can tell us whether the gains go all the way to zero after a long enough time. Chetty does these calculations and finds that they stabilize at 25% of their original value. But this number is higher than the two-year number for most of the other studies, plus Chetty is famous for getting results that are much more spectacular and convenient than anybody else’s. I am really skeptical here.
CHETTY ON DECAY
Remember Louis Benezet’s early 20th-century experiments with not teaching kids any math at all until middle school: after a year or two they were just as good as anyone else, suggesting a dim view of how useful elementary school math teachers must be.
TEACHING MATH-ILLITERATE 13-YEAR-OLDS
In summary, I think there’s pretty strong evidence that a +1 SD increase in teacher VAM can increase same-year test scores by +0.1 SD, but that 50% to 75% of this effect decays in the first two years.
I expected teachers’ groups and education specialists to be pushing all the positive results. After all, what could be better for them than solid statistical proof that good teachers are super valuable? In fact, these groups are the strongest opponents of the above studies… They argue that VAM is biased and likely to unfairly pull down teachers who get assigned less intelligent, lower-grit kids….
TEACHERS’ OPPOSITION TO THE STUDIES
I think they have some good points about how VAM isn’t always a good measure. First, it seems to depend a lot on student characteristics; for example, it’s harder to get a high VAM in a class full of English as a Second Language students.
FIRST FLAW OF VAM
Also, a lot of VAM models control for student race, gender, socioeconomic status, et cetera. I guess this is better than not doing this, but it seems to show a lack of confidence: if controlling for prior achievement was enough, you wouldn’t need to control for these other things. But apparently people do feel the need to control for this stuff, and at that point I bring up my usual objection that you can never control for confounders enough.
SIGNS OF A LACK OF CONFIDENCE
Maybe because of this, there’s a lot of noise in VAM estimates. Goldhaber & Hansen (2013) find that a teacher’s VAM in year t is correlated at about 0.3 with their VAM in year t + 1. A Gates Foundation study also found reliabilities from 0.19 to 0.4, averaging about 0.3. Newton et al get slightly higher numbers, from 0.4 to 0.6; Bessolo a wider range, from 0.2 to 0.6. But these are all in the same ballpark, and Goldhaber and Hansen snarkily note that standardized tests aimed at assessing students usually need correlations of 0.8 to 0.9 to be considered valid (the SAT, for example, is around 0.87).
NOISE IN VAM
Even if VAM is a very noisy estimate, can’t the noise be toned down by averaging it out over many years? I think the answer is yes, and I think the most careful advocates of VAM want to do this, but President Obama wants to improve education now and a lot of teachers don’t have ten years’ worth of VAM estimates.
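One standard way to put a number on “averaging it out” is the Spearman-Brown prophecy formula. It is not used in the text itself and is swapped in here purely as an illustration. It predicts the reliability of an average of k years from the single-year figure, assuming each year is an equally good, independent noisy reading of the same stable teacher quality. Real VAM data may well violate that assumption.

```python
def averaged_reliability(single_year_r, years):
    """Spearman-Brown: reliability of the mean of `years` parallel measurements,
    each with reliability `single_year_r`. Assumes independent yearly errors
    and an unchanging true teacher effect."""
    return years * single_year_r / (1 + (years - 1) * single_year_r)

def years_needed(single_year_r, target_r, max_years=100):
    """Smallest number of averaged years whose predicted reliability reaches
    target_r, or None if it is not reached within max_years."""
    for years in range(1, max_years + 1):
        if averaged_reliability(single_year_r, years) >= target_r:
            return years
    return None

# Year-to-year correlation of about 0.3, as in the studies above.
for k in (1, 3, 5, 10):
    print(k, round(averaged_reliability(0.3, k), 2))
print(years_needed(0.3, 0.8))  # 10
```

Under these assumptions, a single-year correlation of 0.3 needs about ten years of averaging to reach 0.8, the low end of what student tests are said to need.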
Opponents argue that students might not be randomly assigned to teachers, and cite Paufler and Amrein-Beardsley’s survey of principals, in which the principals all admit they don’t assign students to classes randomly.
Rothstein (2009) tries to “predict” students’ fourth-grade test scores using their fifth-grade teacher’s VAM and finds that this totally works. Either schools are defying the laws of time and space, or for some reason the kids who do well in fourth grade are getting the best fifth-grade teachers.
THE BEST WITH THE BEST
It would be nice to be able to draw all of this together by saying that teachers have almost no persistent effects, and the genetic component identified by Plomin and pointed at by Rothstein represents the 15% to 25% “permanent” gain identified by Chetty and others which so contradicts my lack of line dancing memories.
GENETICS AND TEACHERS
In summary, there are many reasons to be skeptical of VAM. But some of these reasons contradict each other, and it’s not clear that we should be infinitely skeptical.
IS SKEPTICISM JUSTIFIED?
Let’s go back to that study that says that a good fourth grade teacher can earn you $89,000. The study itself is Chetty, Friedman, and Rockoff (part 1, part 2).
BACK TO CHETTY
This sounds impressive, but imagine the average kid works 40 years. That means it’s improving yearly earnings by about $1,000.
$1,000 A YEAR
They found that such teachers improved yearly earnings by about $300, but their study population was mostly in their late twenties and not making very much, and they extrapolated that if good teachers could increase the earnings of entry-level workers by $300, eventually they could increase the earnings of workers with a little more experience by $1,000.
THE CHETTY THESIS: MOSTLY EXTRAPOLATION
But really, who cares? The fact that having a good fourth grade teacher can improve your adult earnings any measurable amount is the weird claim here. Once I accept that, I might as well accept $300, $1,000, or $500,000.
FIRST CRITICISM OF CHETTY
Everyone else has found that teacher effects on test scores decay very quickly over time. Chetty has sort of found that up to 25% of them persist, but he doesn’t really seem interested in defending that claim and agrees that probably test scores just fade away. Yet as he himself admits, good teachers’ impact on earnings works as if there were zero fadeout of teacher effects.
CHETTY DOESN’T TAKE DECAY SERIOUSLY
Project STAR (Student Teacher Achievement Ratio) was a big educational experiment in the 80s and 90s to see whether or not smaller class size improved student performance… So Chetty, Friedman, Hilger, Saez, Schanzenbach, and Yagan analyzed the STAR data….
CHETTY EXPLOITS STAR’S RANDOM ASSIGNMENT
When they’re using teacher quality to predict the success of specific students, they use the average of all the test scores of the other students in that student’s class.
WITH RANDOM ASSIGNMENT, VAM ISN’T NEEDED
They find that the average test score of all the other students in your class, compared against the average score of all the students in other randomly assigned classes in your school, predicts your own test score. “A one percentile increase in entry-year class quality is estimated to raise own test scores by 0.68 percentiles, confirming that test scores are highly correlated across students within a classroom”.
PEERS MATTER… SO THE TEACHER DOES TOO
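The “class quality” measure described above is, roughly, a leave-one-out peer average compared against the rest of the school. A hypothetical sketch follows: the school, class labels, and scores are invented, the quoted result is on a percentile scale, and the study’s exact definitions may differ, so treat this only as an illustration of the idea. Each class needs at least two students.

```python
from statistics import mean

def class_quality(school):
    """For each student, the mean score of their classmates (excluding the
    student), minus the mean score of the school's other classes.

    school: dict mapping class label -> list of test scores.
    Returns a dict mapping class label -> list of per-student measures,
    in the same order as the input scores.
    """
    quality = {}
    for label, scores in school.items():
        other_scores = [s for other, ss in school.items() if other != label for s in ss]
        other_mean = mean(other_scores)
        total, n = sum(scores), len(scores)
        # Leave-one-out mean: drop the student's own score from the class total.
        quality[label] = [(total - own) / (n - 1) - other_mean for own in scores]
    return quality

school = {"A": [50, 60, 70], "B": [40, 50, 60]}
print(class_quality(school))
# {'A': [15.0, 10.0, 5.0], 'B': [-5.0, -10.0, -15.0]}
```

Within a class, a stronger student gets a slightly lower measure because their own score is excluded; that is the point of leaving it out, so a student’s own ability does not mechanically inflate their “class quality”.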
This fades to approximately zero by fourth grade, confirming that the test-score-related benefits of having a good teacher are transient and decay quickly.
BUT HERE COMES DECAY AGAIN
Carrell finds that “exposure to a disruptive peer in classes of 25 during elementary school reduces earnings at age 26 by 3 to 4 percent”… If this is a big factor in the differences in performances between classes, then so-called “teacher quality” might be conflated with a measure of how many children in their classes are behavioral problems…
PEERS OR TEACHER?
Again, everybody finds that test score gains do not last nearly that long. So it can’t be that kindergarten teachers provide you with a useful fund of knowledge which you build upon later.
The test score gains from pre-K are notorious for vanishing after a couple of years, but a few really big preschool studies like the Perry Preschool Program found that such programs, while not boosting IQ, may have other effects. (To complicate matters, Perry apparently did boost later-life standardized test scores, just not IQ scores; to complicate matters further, other studies find that children who went to pre-K have worse behavior.)
There’s strong evidence that parents have relatively little non-genetic impact on their children’s life outcomes, but now we’re saying that even a kindergarten teacher they only see for a year does have such an impact?
ANOTHER PARADOX THAT WEAKENS THE KINDERGARTEN RESULTS
Teacher quality probably explains 10% of the variation in same-year test scores. A +1 SD better teacher might cause a +0.1 SD year-on-year improvement in test scores. This decays quickly with time and probably disappears entirely after four or five years, though there may also be small lingering effects. It’s hard to rule out the possibility that other factors, like endogenous sorting of students or students’ genetic potential, contribute to this as an artifact… For some reason, even though teachers’ effects on test scores decay very quickly, studies have shown that they have a significant impact on earnings as much as 20 or 25 years later, so much so that kindergarten teacher quality can predict thousands of dollars of difference in adult income….
SUMMARY: IMPACT, DECAY, AND PARADOXES
I really don’t know whether to believe this and right now I say 50-50 odds that this is a real effect or not– mostly based on low priors rather than on any weakness of the studies themselves.