Monday, July 07, 2014

Bill Gates should hire a statistical advisor

This from Junk Charts:
My coworker pointed me to a Huffington Post article claiming a Bill Gates byline that contains some highly dubious analysis and a horrific chart. We presume Gates was fed this information by some analysts but even so, one wishes he wouldn't promote innumeracy. But then, he has a history: Howard Wainer demolished analysis by his foundation used to channel lots of dollars to the "small schools" movement a few years ago; I wrote about that before.


First, the offensive chart:

Using double axes earns justified heckles, but using two sets of gridlines is a scandal! A scatter plot is the default for this type of data. (See next section for why this particular set of data is not informative anyway.)

I can't understand the choice of scale for the score axis. The orange line, for instance, seems to have a positive slope. In any case, since these scores are "scaled", and the "standard error" is about 1 (this number is surprisingly hard to find, even on Google), it would appear that between 300 and 400 on the score axis, there are 100 units of standard error. By convention, a value three standard errors away from the average is considered a rare event. There is no conceivable way that the average score could jump by that much.
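
The arithmetic behind that complaint can be sketched in a few lines. This is only an illustration: the standard error of about 1 is the rough figure quoted above, not a value taken from the test's documentation, and the function name is made up for this example.

```python
def standard_errors_spanned(low, high, standard_error):
    """Return how many standard errors fit between two points on the score axis."""
    return (high - low) / standard_error

# Assumption: standard error of roughly 1 scale point, as stated in the post.
ASSUMED_STANDARD_ERROR = 1.0
# Conventional cutoff: three standard errors from the average counts as rare.
RARE_EVENT_THRESHOLD = 3

span = standard_errors_spanned(300, 400, ASSUMED_STANDARD_ERROR)
multiples_of_threshold = span / RARE_EVENT_THRESHOLD

print(f"Axis range covers {span:.0f} standard errors")
print(f"That is about {multiples_of_threshold:.0f} times the 'rare event' cutoff")
```

If the assumed standard error were larger, the span would shrink proportionally, so the conclusion depends entirely on that figure.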

The analysis is also flawed. Here's the key paragraph:
Over the last four decades, the per-student cost of running our K-12 schools has more than doubled, while our student achievement has remained flat, and other countries have raced ahead. The same pattern holds for higher education. Spending has climbed, but our percentage of college graduates has dropped compared to other countries... For more than 30 years, spending has risen while performance stayed flat. Now we need to raise performance without spending a lot more.
This argument contains several statistical fallacies:
  • Comparing apples and oranges: a glaring piece of missing information is whether other countries have increased their per-student spending on education, and if so, how fast the growth is compared to that in the U.S. Without this, the analysis makes no sense.
  • Confusing correlation and causation: so spending increased while test scores stagnated. In order to conclude that there is something wrong with the spending, one must first believe that spending has a causal effect on test scores. Observe that this is not a conclusion from the data; it is an assumption going into the analysis, neither supported nor disputed by the data, since the data merely show a (lack of) correlation. This is another instance of "story time": we see data, we see a conclusion, and we are misled into thinking that the data support the conclusion when in fact the data are an irrelevant distraction. (For other instances of "story time", see this link to my book blog.)
  • Fallacy #1 and fallacy #2 combined: even if you believe that spending affects test scores, it is still a stretch to say that spending in U.S. schools affects the gap in test scores between U.S. students and foreign students. In a world where foreign countries are frozen in time, maybe so; but where foreign countries are investing in education, one can't say anything about the test score gap without first knowing what's going on overseas.
  • Assumption invalidating the analysis: In a short breath, the analyst admits the possibility of (a) spending increase together with flat scores and (b) score increase together with flat spending. One model under which both of those possibilities coexist is one in which test scores are independent of spending. If so, why would one even look at a plot of these two quantities?
  • The dilemma of being together (a la Chapter 3 of Numbers Rule Your World): sorry to say, but spending per pupil is likely to have a highly skewed distribution across school districts. Average test scores are also likely to vary widely across school districts. Thus, using a single average for the entire country muddies the water.
  • Needless to say, test scores are a poor measure of the quality of education, especially in light of the frequent discovery of large-scale coordinated cheating by principals and teachers driven by perverse incentives of the high-stakes testing movement.
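
To illustrate the "dilemma of being together" point above, here is a minimal sketch of how a single national average can misrepresent a skewed distribution. The five district figures are made up purely for illustration; they are not real spending data.

```python
from statistics import mean, median

# Hypothetical per-pupil spending (dollars) for five made-up districts.
# One high-spending district skews the distribution to the right.
hypothetical_spending = [8_000, 8_500, 9_000, 9_500, 30_000]

national_mean = mean(hypothetical_spending)
national_median = median(hypothetical_spending)

# Count how many districts spend less than the "national average".
districts_below_mean = sum(1 for s in hypothetical_spending if s < national_mean)

print(f"Mean spending:   {national_mean}")
print(f"Median spending: {national_median}")
print(f"Districts below the mean: {districts_below_mean} of {len(hypothetical_spending)}")
```

In this toy case the average sits well above what four of the five districts actually spend, which is the sense in which a country-wide average muddies the water.
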
In the same article, Gates asserts that quality of teaching is the most decisive factor explaining student achievement. Which study proves that, we are not told. How one can measure such an intangible quantity as "excellent teaching", we are not told. How student achievement is defined, well, you guessed it, we are not told.

It's great that the Gates Foundation supports investment in education. Apparently they need some statistical expertise so that they don't waste more money on unproductive projects based on innumerate analyses.


Anonymous said...

Thanks for the analysis. Not sure if the flaws in the graph and interpretation are ironic, supporting evidence for the author's assertion, or both.

1977 was a while back and I am not sure if it is any more legitimate to compare how much we spent now and then than it is to compare what was expected of schools and educators over 37 years ago. I do wonder why the author stops in 2007 and doesn't report any recent data - wonder if over the last seven years the data would be contrary to his point. Surely we haven't reduced spending on education and seen an increase in achievement - heck, then he wouldn't have anything to write about.

Anonymous said...

This is the very argument cited by Dr. Shelton and other administrators in support of cutting $2 million from the Special Ed budget. They referenced assistance to the FCPS administration by a Gates Foundation grant for consultation from District Management Council. See and their free access article "Boosting the Quality and Efficiency of Special Education" for illogical, barely readable "analysis" of dubious data comparing school districts' spending on special education and student achievement (presented with pseudonymous labels; i.e. for all we know fictional). This was used as support by FCPS to cut para-educators. It's expensive, our scores aren't high enough, so let's make it less expensive and see if that helps. When directly asked at the school board budget vote (as well as in numerous e-mails to Dr. Shelton and the Board prior to the vote) what was being put in place instead, there were no answers. A vague "we can come up with plans before August" dismissal of concerns was offered before passing the budget. It appeared to many parents to be only about the money: recovering some of it from special education, where "it does no good", to sum it up.

Richard Innes said...


I’m not sure the critics are much sharper than the criticized on this one.

The critic’s discussion of the standard errors completely lost me. It makes no sense to talk about “100 units of standard error” between the scores of 300 and 400 on the NAEP Long Term Trend scales (and this has to be NAEP LTT for Age 17, which generates another bad labeling hit for the chart).

I’d like to hear Skip Kifer on this. I’m surprised you didn’t invite him from the outset.

Anyway, right now, it looks like Arthur Levine has struck again.

Don’t get that?

Read these:

In fact, these Levine reports should be mandatory reading at some point in the EKU Ed School curriculum. As a past president of Columbia Teachers College, he deserves consideration.