2 May 2013

The most misleading statistics of all, thanks to Simpson’s Paradox

When the average is the opposite of the data.

By Alex Hern

The New York Times Economix blog has a great illustration of one of my favourite statistical paradoxes ever (yes, I have favourite statistical paradoxes). Floyd Norris had earlier been discussing the disparity in American wages, and noticed something odd in the process.

The median weekly wage, adjusted for inflation, in the US has grown by just 0.9 per cent since 2000. But the median wage has actually fallen for high-school dropouts (by 7.9 per cent), high-school graduates (by 4.7 per cent), people with some college education (by 7.6 per cent) and people with at least one degree (by 1.2 per cent). In other words, every sub-group’s wage fell, even as the overall wage rose.

That’s an example of something called “Simpson’s Paradox”, where a trend that appears in the aggregate data reverses when you break the data down into groups. Norris explains how it happens:

The answer is that the relative size of the groups changed greatly over those 13 years. There are now many more college graduates working than there were then. There are fewer employees with a high school education or less. That changing nature of the work force meant that there are more (higher wage) well-educated people in the overall total now than there had been in 2000.

Adding to the population changes is the fact that the percentage of people with jobs has fallen less for college graduates (78.5 percent in 2000, 72.6 percent now) than it has for either high school graduates or people with some college education. The share of high school dropouts with jobs, however, is virtually the same now as it was in 2000.
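
The effect Norris describes can be reproduced with a few lines of arithmetic. What follows is a minimal sketch using invented figures, not the data behind the Economix post, and it uses a weighted mean rather than the median to keep the sums simple. The two group names, their shares of the workforce and their wages are all assumptions, chosen only to make the reversal visible.

```python
# Illustrative figures only: NOT the actual wage data discussed in the article.
# Each group maps to (share of the workforce, average weekly wage in dollars).
wages_2000 = {
    "no degree": (0.70, 600.0),
    "degree": (0.30, 1200.0),
}
wages_now = {
    "no degree": (0.50, 570.0),   # wage is lower than in 2000
    "degree": (0.50, 1150.0),     # wage is lower than in 2000
}


def overall_average(groups):
    """Average wage across everyone: each group's wage weighted by its share."""
    return sum(share * wage for share, wage in groups.values())


def percent_change(old, new):
    """Change from old to new, as a percentage of old."""
    return (new - old) / old * 100


# Every individual group sees its wage fall...
for name in wages_2000:
    change = percent_change(wages_2000[name][1], wages_now[name][1])
    print(f"{name}: {change:+.1f}%")

# ...yet the overall average rises, because more workers are now in the
# higher-paid group.
overall = percent_change(overall_average(wages_2000), overall_average(wages_now))
print(f"overall: {overall:+.1f}%")
```

With these made-up numbers, both groups' wages fall by around 4 to 5 per cent while the overall average rises by roughly 10 per cent, purely because the better-paid group has grown from 30 to 50 per cent of the total.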

Much the same result is found if you look at wages broken down by race and gender. Between 1980 and 2005, wages for all US workers increased by 3 per cent, but wages for every demographic subgroup increased by far more. That result was because of the steady increase in participation in the labour force by women and people of colour.
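
One way to see where a gap like this comes from is to split the change in the overall average into two parts: the part due to wages changing within each group, and the part due to the groups changing in size. The sketch below does that for two hypothetical groups with invented shares and wages; it is not the 1980 to 2005 data, and the labels "group A" and "group B" are placeholders rather than any real demographic category.

```python
# Hypothetical figures for illustration; not the 1980-2005 wage data.
# Each group maps to (share of the workforce, average wage).
start = {"group A": (0.80, 1000.0), "group B": (0.20, 600.0)}
end = {"group A": (0.60, 1100.0), "group B": (0.40, 690.0)}


def overall(groups):
    """Average wage across everyone: each group's wage weighted by its share."""
    return sum(share * wage for share, wage in groups.values())


# Effect of wage changes alone, holding each group's share at its starting value.
within = sum(start[g][0] * (end[g][1] - start[g][1]) for g in start)

# Effect of the shift in shares alone, valued at end-period wages.
composition = sum((end[g][0] - start[g][0]) * end[g][1] for g in start)

total = overall(end) - overall(start)

for g in start:
    rise = (end[g][1] - start[g][1]) / start[g][1] * 100
    print(f"{g}: {rise:+.1f}%")
print(f"overall: {total / overall(start) * 100:+.1f}%")
print(f"within-group effect: {within:+.1f}")
print(f"composition effect:  {composition:+.1f}")
print(f"total change:        {total:+.1f}")
```

In this toy example both groups get a raise of 10 per cent or more, but the overall average rises by less than 2 per cent: the gain from higher wages within each group is almost entirely offset by the growing share of the lower-paid group.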

Interesting though it is, Simpson’s paradox actually illustrates a major problem standing in the way of statistical literacy: the field is frequently counterintuitive to the point of madness. We try to hold politicians who misuse statistics to account, but it’s far more common to find public figures taking advantage of statistical quirks like this than it is to find them outright lying.

And trying to explain why it’s actually more accurate to say that wages have fallen since 2000, even though the headline figure shows otherwise, is a hiding to nothing. Sometimes, interesting quirks in statistics can be damned annoying.