The Hansard digitisation project

A handful of web developers are making 200 years' worth of parliamentary sessions available online a

There may yet be hope for government projects, if Hansard 1804 - 2004 is anything to go by.

It's hard to read the phrase 'government IT projects' in the news without it being followed by words like 'over-budget', 'delays' or 'expensive consultants'. But look hard enough, and every now and then you see something really interesting being done with comparatively tiny resources. The Hansard digitisation project is one such example.

Hansard is the official transcript of all the sessions that take place in parliament, recording the words of MPs, Lords and civil servants as they are said. Until relatively recently, though, it was recorded by hand or typed onto hard copy. Earlier this year, a process began to remove every page of Hansard from its bindings, then scan and digitise the content, both to preserve it and to make it universally accessible online.

What's really interesting, though, is how the project is being run after the initial digitising takes place. There's a comparatively tiny core team of three people working on the website, which is being run as a public beta: the whole world can watch as they hack away on it in the open. The developers are using free, open source web development tools instead of expensive turn-key proprietary systems. These are the tools and methods used by bootstrapping start-ups, not big government IT departments. To see how a similar project is run the more conventional way, we need only turn our eyes to the United Nations website.

The goal is the same - greater transparency, by making transcripts and documents related to meetings visible to users - but the execution of the idea is totally different. On the UN site, the content is designed almost exclusively for a person using a modern web browser on a full-size computer, manually clicking through the various layers of the website and being bounced from site to site before the transcripts can be downloaded as PDF files. This approach makes it very hard to access and manipulate the content in any interesting or meaningful way outside that exact use case, and it makes designing a website to present this information in novel ways (or even to link directly to it) much more complex.

On the Hansard project, by comparison, the focus is overwhelmingly on the content, and on making it as easy as possible to get to it in as many ways as possible. Everything is designed from the ground up to be indexed by search engines, or linked directly from elsewhere on the net. If I want to link to the eerily polite declaration of war delivered to Germany by Britain in 1939, I can point right at the text. Similarly, if I want to link to the time the late, acid-tongued Tony Banks described the Conservative party conference as "an event where anyone who's capable of walking, talking and farting at the same time gets a standing ovation", I can point directly to it without much fuss.

Making it possible to access very specific content directly is extremely important. This granular kind of access can stop our own history being something confined to hidebound textbooks gathering dust in libraries, and make it into something live and accessible; something simple to share with others and use as a tool to help provide perspective and understanding of how governments work.

When tools like this can be made with such small teams at a relatively low cost, and delivered in a manner that makes the most of the medium of the web, it becomes a lot harder to justify the astronomical costs and overruns of so many government projects run by large consultancies. Well, we can always hope...