
18 September 2017

Boris Johnson’s £350m mistake – and 6 other statistical errors we all fall for

Boris Johnson was publicly scolded by a statistics boss for resurrecting a Brexit campaign claim.

By Jason Murugesu

Sir David Norgrove, the head of the UK Statistics Authority, wrote an open letter to Boris Johnson saying he was “surprised and disappointed” by the foreign secretary restating the claim that Brexit will lead to an extra £350m per week being made available for public spending.

Norgrove went on to say that “it is a clear misuse of official statistics”. Johnson is not the first Conservative cabinet minister to be reprimanded by the Stats Authority: six others have been too, as detailed by George Eaton.

Johnson made a classic statistical mistake – if we can call it that – by confusing net and gross contributions to the EU. He failed to mention that the EU made payments back to the UK to support the likes of agriculture and scientific research. 
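The arithmetic of the mistake fits in a few lines. Here is a rough Python sketch; the rebate and receipts figures below are rounded, illustrative assumptions, not official statistics:

```python
# Illustrative, rounded figures only -- assumptions for the sake of
# the arithmetic, not official statistics.
gross_per_week = 350   # the headline gross contribution, GBP millions
rebate         = 75    # the UK rebate, deducted before any money is sent
receipts       = 90    # EU payments back to the UK (farming, research, etc.)

net_per_week = gross_per_week - rebate - receipts
print(f"Net contribution: roughly GBP {net_per_week}m per week")
# The GBP 350m figure counts the gross flow and ignores what comes
# back -- the confusion of net with gross that Norgrove flagged.
```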

For the benefit of present and future cabinet ministers, here are some other statistical mistakes to watch out for:

1. P-hacking

P-hacking is a scientific term for the practice of collecting or analysing data until non-significant results become significant. You can call it gaming the system, unintended human bias, or simply the realities of working in the field.


Take the classic Derren Brown special on horse-racing (spoiler alert). In the special, Brown picks a woman at random, and she bets on a horse in each subsequent race based on Brown’s recommendation.

She wins every time and makes a lot of money. How did Brown know who would win? The theory online is that he bet on every single horse in every single race, and only aired the footage in which he and the woman won. Do an experiment enough times and you’re bound to get the result you’re looking for.
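A sketch of how the trick can work (the race and horse counts here are illustrative, not the actual programme’s) is to cover every possible sequence of winners:

```python
import random
from itertools import product

random.seed(1)

RACES, HORSES = 5, 6            # 6**5 = 7,776 possible winner sequences
winners = tuple(random.randrange(HORSES) for _ in range(RACES))

# Give each possible sequence of picks to a different punter.
punters = list(product(range(HORSES), repeat=RACES))
perfect = [p for p in punters if p == winners]

print(f"{len(perfect)} punter(s) out of {len(punters):,} 'predicted' every race")
# Exactly one punter is guaranteed a flawless streak; air only their
# footage and the losing 7,775 vanish from the story.
```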

2. Conflating a lower chance with no chance

Most polls said Hillary Clinton had a higher chance of winning the US presidential election in 2016, but that didn’t mean Donald Trump had no chance of winning. The 45th president is the epitome of the saying: “if you don’t try, you won’t succeed.”
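A quick simulation makes the point; the 30 per cent figure below is invented for illustration, not any pollster’s actual number:

```python
import random

random.seed(42)

TRIALS, P_UPSET = 100_000, 0.30   # assumed probability of the 'unlikely' outcome
upsets = sum(random.random() < P_UPSET for _ in range(TRIALS))

print(f"The underdog won in {upsets / TRIALS:.0%} of simulated elections")
# Roughly 3 in 10: unlikely is a long way from impossible.
```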

3. Confirmation Bias

We believe stats more readily when they confirm our existing beliefs. Numerous studies have found that even when the evidence for a belief has been refuted, people are unlikely to change their minds.

4. Unrepresentative sample

One reason cited for why pollsters did not give Trump a higher chance of winning the 2016 election is that, without the ubiquity of landline phones (which once let them pick people at random from a phone book), they struggled to find a truly representative sample of the population.

In the run-up to the 2017 UK general election, most polls predicted a Tory majority and a minority predicted a hung parliament. The difference came down to polling methodology. YouGov, a pollster which concentrated on improving its sample, was among the closest to the mark.
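A toy simulation, with invented numbers, shows how a skewed sample misleads: suppose 40 per cent of voters back candidate A, but A’s supporters are only half as likely to answer a pollster’s call.

```python
import random

random.seed(0)

population = ["A" if random.random() < 0.40 else "B" for _ in range(100_000)]

def answers_phone(vote):
    # Assumed response rates: A's supporters answer half as often.
    return random.random() < (0.25 if vote == "A" else 0.50)

sample = [v for v in population if answers_phone(v)]
print(f"True support for A: 40%; naive poll estimate: {sample.count('A') / len(sample):.0%}")
# The raw sample badly under-counts A. Pollsters correct for this by
# weighting respondents -- the step where methodologies diverge.
```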

5. Understanding significance

In most scientific literature, the statistical Holy Grail is a p-value below 0.05. A p-value tells you how likely you would be to see results at least as extreme as yours if chance alone were at work.

For example, if you missed the bus every morning for two weeks, the p-value would tell you how likely such a streak is under chance alone. The smaller the p-value, the harder the streak is to put down to luck, and the more reason to suspect some other factor, such as a new bus schedule or you waking up too late.
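In the bus example, with an assumed everyday miss rate of one morning in ten, the p-value for ten straight misses is simple arithmetic:

```python
# Assumed numbers for illustration: on an ordinary schedule you miss
# the bus about 1 morning in 10.
P_MISS = 0.10
STREAK = 10          # ten workday mornings in two weeks

p_value = P_MISS ** STREAK   # probability of the streak under chance alone
print(f"p-value: {p_value:.0e}")   # about 1e-10
# Far below 0.05, so bad luck is a poor explanation -- time to check
# the timetable, or the alarm clock.
```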

A threshold of 0.05 is arbitrary. Sure, five times in a hundred days you might slip in a puddle, but the real reason you’re late is that your alarm clock isn’t loud enough. So why are we interested in 5 per cent of cases? Why not 3 per cent? Better still, 1 per cent?

This is what many statisticians now argue, especially in an era of big data, where sample sizes can run to hundreds of thousands. Even at the 5 per cent threshold, they say, results which seem significant may simply be false positives: apparent patterns produced by chance alone.
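The worry is easy to demonstrate. When nothing real is going on, a (continuous) p-value is uniformly distributed between 0 and 1, so testing many quantities guarantees that some clear the 0.05 bar by luck; the sketch below simulates the p-values directly on that basis:

```python
import random

random.seed(7)

TESTS = 1_000
# Under a true null hypothesis the p-value is uniform on [0, 1],
# so each test's p-value can be simulated as a random draw.
false_positives = sum(random.random() < 0.05 for _ in range(TESTS))

print(f"'Significant' results from pure noise: {false_positives} / {TESTS}")
# Around 50 in 1,000: screen enough variables in a big dataset and
# 'discoveries' at the 5 per cent level are guaranteed.
```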

6. Meaningless correlations

A press release from Labour’s grassroots movement Momentum boasts that their viral videos on social media reached far beyond their usual base. It says: “These people are more likely to follow Maximum Respect for the British Armed Forces and the Royal British Legion, Ant and Dec and Match of the Day than Jeremy Corbyn, the Labour Party or Momentum.”

This is a standard statistical error. More people in Britain like Ant and Dec and football than like any politician or political party, so the more people who watch a video, the more Ant and Dec fans there will be among them.
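The arithmetic behind the error, with invented follower shares purely for illustration:

```python
# Invented shares for illustration -- not real follower statistics.
ant_and_dec_share = 0.30   # assumed share of Britons who follow Ant and Dec
corbyn_share      = 0.08   # assumed share who follow Jeremy Corbyn

viewers = 1_000_000        # a hypothetical viral video's reach
print(f"Expected Ant and Dec fans among viewers: {int(viewers * ant_and_dec_share):,}")
print(f"Expected Corbyn fans among viewers:      {int(viewers * corbyn_share):,}")
# Any large audience will contain more Ant and Dec fans than Corbyn
# fans, because the base rates differ -- it is not evidence that the
# video reached beyond the usual base.
```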

Johnson argued in his response to Norgrove that he was talking about “control” of the money and not about extra money. The never-ending battle between number-crunching and terminology continues…
