A statistical trick which reveals whether MPs are lying about expenses

Benford's law has many uses. Can it trip up MPs?

Are politicians routinely making up expenses? A simple statistical test suggests not.

Benford's law is a statistical artefact found in numerical data spanning several orders of magnitude. Ben Goldacre explains:

Imagine you have data on, say, the population of every world nation. Now, take only the "leading digit" from each number: the first number in the number, if you like. For the UK population, which was 61,838,154 in 2009, that leading digit would be "six". Andorra's was 85,168, so that's "eight". And so on.

If you take all those leading digits, from all the countries, then overall, you might naively expect to see the same number of ones, fours, nines, and so on. But in fact, for naturally occurring data, you get more ones than twos, more twos than threes, and so on, all the way down to nine. This is Benford's law: the distribution of leading digits follows a logarithmic distribution, so you get a "one" most commonly, appearing as first digit around 30% of the time, and a nine as first digit only 5% of the time.

This pattern should repeat for almost any data which matches the key condition of spanning a large range of sizes. Take the example above, world populations, which goes from 800 in the Vatican City to 1.35 billion in China. But one category of data which rarely obeys the law is that where the numbers are made-up. When people are trying to "randomly" write down numbers, they rarely do it very well, more frequently following the intuition that random data ought to have just as much chance of starting with any given digit.
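Goldacre's "logarithmic distribution" has a precise form: under Benford's law, the probability that the leading digit is d is log10(1 + 1/d). The short Python sketch below computes the nine expected shares. The function name `benford_expected` is our own and purely illustrative; this is a check on the arithmetic, not anything taken from the original analysis.

```python
import math


def benford_expected() -> dict[int, float]:
    """Return the Benford's-law probability of each leading digit 1-9.

    P(d) = log10(1 + 1/d). The nine probabilities sum to exactly 1,
    because the product of (d + 1)/d for d = 1..9 telescopes to 10.
    """
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, p in benford_expected().items():
        print(f"{digit}: {p:.1%}")
```

Running it prints about 30.1% for a leading 1 and 4.6% for a leading 9, which is where the rounded "around 30%" and "only 5%" figures come from.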

The value of MPs' expenses certainly spans several orders of magnitude. Excluding repaid claims, expenses in the latest tranche, released last week, range from 10p (a reconciliation for a travelcard between Euston and Coventry) to £9,900 (for staffing costs in the Woking constituency office).

So does the data follow Benford's law? It largely does.

The largest deviation is for leading 2s, which appear about 3 percentage points less often than expected; most other digits appear slightly more often than predicted.
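The comparison above boils down to three steps: take the first non-zero digit of each claim (so 10p, written as 0.10, counts as a 1), work out what share of claims starts with each digit, and subtract the share Benford's law predicts. Here is a minimal Python sketch, assuming amounts arrive as decimal strings in pounds with repaid claims already removed. The function names are hypothetical, and the four sample values are simply the figures quoted in this piece, far too few for a meaningful test.

```python
import math
from collections import Counter
from decimal import Decimal


def leading_digit(amount: Decimal) -> int:
    """Return the first non-zero digit of a positive amount (0.10 -> 1)."""
    if amount <= 0:
        raise ValueError("Benford analysis needs positive amounts")
    # Fixed-point text ("0.10", "9900") avoids exponent notation such as "1E+3".
    digits = format(amount, "f").replace(".", "").lstrip("0")
    return int(digits[0])


def benford_deviation(amounts: list[Decimal]) -> dict[int, float]:
    """Observed share of each leading digit minus Benford's expected share,
    in percentage points (negative means fewer claims than expected)."""
    if not amounts:
        raise ValueError("need at least one amount")
    counts = Counter(leading_digit(a) for a in amounts)
    n = len(amounts)
    return {
        d: 100 * (counts[d] / n - math.log10(1 + 1 / d))
        for d in range(1, 10)
    }


if __name__ == "__main__":
    # Hypothetical sample: only the amounts quoted in the article.
    claims = [Decimal(s) for s in ["0.10", "9900", "139.26", "20.00"]]
    for digit, pp in benford_deviation(claims).items():
        print(f"{digit}: {pp:+.1f} pp")
```

Using `Decimal` rather than floats keeps values such as 0.10 exact, so the leading digit is read from the amount as written rather than from a binary approximation.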

Scanning through the data, it's easy to see why. A large number of claims are made repeatedly. For instance, 18 different MPs claimed £139.26 for the same twin pack of HP toner cartridges, while nearly every claim for petrol came in between £10 and £19.99, boosting the count of 1s again. Conversely, there simply weren't many must-have services whose price began with a 2 (although a lot of things MPs need do, apparently, cost exactly £20, from venue hire to cleaning bills and car parking).

None of which means there is no fraud in the expenses. It simply means that the values being claimed have been drawn from real life. MPs are not, on the whole, making up numbers on the spot as they fill in expense forms; whether what they are claiming for ought to be paid out of the public purse is something statistics are less likely to help with.

(As an aside, it's actually surprising that the figures match Benford's law quite so well; while MPs may not be choosing the numbers they submit, the people who set the prices clearly are. That's probably the reason for the slight uptick in 9s, for instance: a lot of things which might cost £10 are instead priced at £9.99. Presumably there are either enough counter-examples to balance this out, or enough claims for things like mileage, which have no set price.)

Two data CDs, much like the ones which sparked the original expenses scandal. Photograph: Getty Images

Alex Hern is a technology reporter for the Guardian. He was formerly a staff writer at the New Statesman.


I believe only Yvette Cooper has the breadth of support to beat Jeremy Corbyn

All the recent polling suggests Andy Burnham is losing more votes than anyone else to Jeremy Corbyn, says Diana Johnson MP.

Writing on the New Statesman website today, Tom Blenkinsop MP says he is giving his second preference to Andy Burnham, as he thinks Andy has the best chance of beating Jeremy.

His reasoning is that if Yvette goes out first, all her second preferences will swing behind Andy, whereas if Andy goes out first, his second preferences, because of the broad alliance he has created behind his campaign, will switch entirely or largely to the other male candidate, Jeremy.

Let's take a deep breath and try to think through what the effect of preferential voting will be in the Labour leadership election.

First of all, it is very difficult to know how second preferences will switch. From my telephone canvassing there is some rather interesting voting going on, but I don't accept that Tom’s analysis is correct. I have certainly picked up growing support for Yvette in recent weeks.

In fact, you can argue that the reverse of Tom's analysis is true – Andy has moved further away from the centre, and as a result his pitch to those like Tom who are supporting Liz first is now narrower. Yvette is therefore more likely to pick up those second preferences.

Stats from the Yvette For Labour team show Yvette picking up the majority of second preferences from all candidates – from the Progress wing supporting Liz to the softer left fans of Jeremy – and Andy's supporters too. Their figures show many undecideds opting for Yvette as their first preference, as well as others choosing to switch their first preference to Yvette from one of the other candidates. It's for this reason I still believe only Yvette has the breadth of support to beat Jeremy and then to go on to win in 2020.

It's interesting that Andy has not been willing to make it clear that second preferences should go to Yvette or Liz. Yvette has been very clear that she would encourage second preferences to be for Andy or Liz.

I watched Andy on Sky's Murnaghan show this morning, and he categorically stated that Labour will not get beyond first base with the electorate at a general election if we are not economically credible, and that fundamentally Jeremy's economic plans do not add up. So I am unsure why Andy is so unwilling to be clear on second preferences.

All the recent polling suggests Andy is losing more votes than anyone else to Jeremy. He is trailing in fourth place in London, where a huge proportion of our electorate is based.

So I would urge Tom to reflect more widely on who is best placed to provide the strongest opposition to the Tories, appeal to the widest group of voters and reach out to the communities we need to win back. I believe that this has to be Yvette.

The Newsnight focus group a few days ago showed that Yvette is best placed to win back those former Labour voters we will need in 2020.

Labour will pay a massive price if we ignore this.

Diana Johnson is the Labour MP for Hull North.