Unknown unknowns

In his first Reith lecture, Martin Rees discussed the "scientific citizen". How we proceed in areas such as genetics, brain science and artificial intelligence ought to involve the views of the public, he said. And that means people need to be able to make informed choices.

Rees argued that this includes giving people a sense of how confident we can be in science's claims; the public needs to know about what Donald Rumsfeld would term the "known unknowns". But Rees didn't talk about the unknown unknowns.

At this year's meeting of the American Association for the Advancement of Science, a panel of statisticians demonstrated that many scientific findings are much weaker than even scientists realise. Reruns of published health studies, for example, get the same finding as the original only 5 per cent of the time.

Then there are the results that are clearly wrong. "Statistically significant" studies have shown, for instance, that breathing in pollution reduces heart problems, and that living in a polluted city is as harmful as smoking 40 cigarettes a day. Somehow these results were peer-reviewed and published, yet common sense tells us they are implausible.

Often, though, the studies are too complex for common sense to penetrate - and this is where things go awry. Scientists typically use statistics to check that there is less than a 5 per cent chance their apparently significant result arose by chance. But the small print says this holds only if they run a single, simple test decided upon in advance. Test many things at once, or choose your analysis after looking at the data, and the probability of a mistaken conclusion goes through the roof.
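You can watch the error rate inflate with a back-of-the-envelope simulation (a minimal sketch, not from any of the studies discussed: it assumes independent comparisons at the conventional 0.05 threshold, with illustrative sample sizes):

    # Every dataset below is pure noise, yet the chance of at least one
    # "significant" p-value climbs rapidly with the number of tests run.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha = 0.05

    for n_tests in (1, 10, 60):
        false_alarms = 0
        for _ in range(2000):                     # 2,000 simulated "studies"
            a = rng.normal(size=(n_tests, 30))    # n_tests comparisons,
            b = rng.normal(size=(n_tests, 30))    # 30 noise samples per group
            p = stats.ttest_ind(a, b, axis=1).pvalue
            false_alarms += (p < alpha).any()     # any spurious "finding"?
        print(f"{n_tests:3d} tests: P(false positive) ~ {false_alarms / 2000:.2f}")

    # Theory says 1 - (1 - alpha)**n: roughly 0.05, 0.40 and 0.95 here

With a single test, about 5 per cent of the noise-only studies produce a "finding"; with 60 tests, nearly all of them do.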

A study published in 2008 in the Journal of the American Medical Association showcases a typical pratfall.

The study analysed exposure to 275 chemicals, looking for links to each of 32 possible health problems. It found that exposure to BPA, a chemical used in the manufacture of plastics, was associated with diabetes and with liver and heart problems. But the finding is meaningless. In total, the study involved nearly 9,000 different tests and offered nine million different ways to analyse the data. Any of its findings could plausibly be the result of chance.
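The arithmetic is simple enough to check in a few lines (a sketch using only the numbers quoted above, and assuming for illustration that the tests are independent and run at the usual 0.05 threshold):

    # 275 chemicals x 32 health outcomes, tested at the 0.05 level.
    # Even if no chemical did anything at all, hundreds of "significant"
    # links would be expected by chance alone.
    chemicals, outcomes, alpha = 275, 32, 0.05

    n_tests = chemicals * outcomes               # 8,800 comparisons
    expected_by_chance = n_tests * alpha         # ~440 spurious "findings"
    p_at_least_one = 1 - (1 - alpha) ** n_tests  # indistinguishable from 1

    print(n_tests, expected_by_chance, p_at_least_one)

At that scale, around 440 chance "findings" are to be expected even if every chemical tested is harmless.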

Statistically speaking, the magic number is 60: that's roughly how many ways you can examine a set of data before it becomes 95 per cent likely that at least one apparently significant result will turn up purely at random. With big science - from astronomy to genome sequencing - yielding tens of thousands of data points in a single set, it would be a miracle if scientists found nothing in their experiments these days.
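The 60 falls out of the same small-print formula. If each of n independent tests has a 5 per cent false-positive rate, the chance that at least one fires is 1 - 0.95^n; setting that equal to 95 per cent gives an n of about 59, which rounds to the magic 60 (a sketch, again assuming independent tests):

    import math

    alpha = 0.05
    # Smallest n with 1 - (1 - alpha)**n >= 0.95:
    n = math.log(alpha) / math.log(1 - alpha)
    print(f"{n:.1f}")                             # ~58.4, so 59-60 looks

    for k in (58, 59, 60):
        print(k, round(1 - (1 - alpha) ** k, 3))  # 0.949, 0.952, 0.954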