21 July 2014, updated 26 Sep 2015 8:01am

Public transport bodies: producing lots of data, not necessarily making the most of it

The trend over the last few years has been for public transport authorities to accept that their data should be made public – while at the same time letting the private sector absorb the cost of making use of it.

By Ian Steadman

There’s an app called Citymapper that is probably the one thing I would consider necessary to my day-to-day life, and the justification for owning a smartphone. The things I do on the move – listening to music, checking the web, etc. – are things I don’t need to do while moving; but knowing where I am in relation to the place I am moving towards is an everyday essential.

Obviously, this is what maps are for, and there are maps all over London, but it’s a big city, and coming to it for the first time can be very confusing. Transport for London has a journey planner on its website, but it often feels very unintuitive. For example, here’s what happens when you ask for help with a journey that a daytripper to the capital might take – from Paddington to Tottenham’s football ground, White Hart Lane:

Before I’d moved to London and learned the language of the tube or bus maps, and was merely a tourist, I found this format incredibly confusing. I was using the journey planner because I didn’t know how to get around the city, yet I was being given a list of places I’d never heard of and told to travel between them without any indication of where they were relative to each other. For journeys with lots of steps, or which involve buses, this format can make simple journeys seem convoluted. (And while the TfL site has had a redesign recently to clean it up, it’s still the same basic idea.)

Citymapper, though, does this:

It’s the same information presented as a map instead of a step-by-step list. It offers text instructions as well, but the key thing is it shows the user where they are and where they’re going, and where the stops along the way are. (And, when I went to New York City earlier this year on holiday, it was better there than the official step-by-step journey planner too.)

Citymapper doesn’t have a special data source – it’s simply using, in a different way, the official data that these transport systems generate. TfL, like many transport authorities, has what’s known as an application programming interface – or API – which tells external apps and sites how to communicate with its servers, and how to retrieve the information they need. The data covers not just trains, but also bus departure times, bike hire docking station statuses, taxi information, fare breakdowns, and so on, which is why a search on Apple’s App Store or Google’s Play Store for “London transport” delivers dozens of results.

It’s left me wondering this: why is it that public bodies like TfL are so good at producing data, but so bad at using it? Why is an organisation like TfL – which has such a strong brand identity, and which puts so much care and attention into its tube maps, for example – happy to sit back and let third parties control how the public experiences its information?

I emailed Azmat Yusuf, Citymapper’s founder, about this, and he got back to me:

“Wouldn’t say transport authorities are bad at using data, they are focused on infrastructure running on time which requires utilising their own data. We are focused on apps and we like to do that well.

“Most transport authorities are opening up data, some more reluctantly than others, since it’s a cost to them. But starting to believe and understand that they are better off letting others like us develop applications. Also there’s an argument that the data belongs in the public domain anyway.”

It’s an interesting idea, that the data “belongs” in the public domain. TfL is a government body, and in a democracy that should mean that we – the public – have a right to its data. Yet we’re living through a strange time, when privatisation and outsourcing make it hard to define what counts as government and what as private company; and where the government is committed to starving TfL of subsidies so that it has to fund itself entirely out of fare revenues, so that it operates less like a public utility and more like a private firm.

Perhaps we should simply be grateful that TfL goes to the lengths it does to provide such comprehensive real-time data feeds across its jurisdiction. The situation could be worse – as I discovered when I spoke to Tom Cairns, founder of Realtime Trains.

His site uses a large range of data sources to piece together a picture of exactly how a train moves around the UK relative to its timetabled route. The problem is that those data feeds are highly reliant on human input to reflect things like cancellations and delays – and if somebody doesn’t keep updating those feeds as fast as new information comes in, the official sources, and by extension Realtime Trains, will return inaccurate information.

Cairns and his two colleagues use the knowledge of how the network operates – from the lengths of trains to their top speed, to how fast signals change on certain track sections to how long it takes a train to pull away from the platform – to model the likely future arrival time of a train relative to how well it’s already done in its journey. “I have people who go around the UK noting down the timing of the wheels after they start and compare that to the signalling equipment, to get a more accurate picture of what is happening,” he told me. “There’s a lot of work that’s gone into improving the accuracy of the information by hand.”

The problems he faces, though, illustrate the difficulties small developers face when trying to work with a large institution with a reluctance to share its secrets.

“When I get to train companies about getting information directly they say ‘why should we give it out to you, a tiny third party on the periphery?’ In their industry, it’s slow moving. They never had competition in this field either, and they didn’t see a problem with what they’re doing. The opening of [data] took many of us by surprise, in part because we knew they didn’t see a problem.”

Cairns also says that good ideas on third party sites, like his map of track obstructions, have a habit of popping up on the official journey planners a few months down the line. “It was added to the National Rail development roadmap,” he explained. “But of course it’s three years later and it still hasn’t appeared.” Institutional willingness countered by institutional lethargy.

TfL is due to update its API in the autumn to make it adhere to more universal data standards, and theoretically make it easier for developers to use. I emailed TfL to ask what the motivation was for letting third-party developers loose on its data – whether, for example, it was because of budget limitations – but digital PR lead Rubin Govinden told me that they don’t comment on what other people have done.

He did say this, though:

“There could be eight different travel apps out there and they could all be offering their own service, but the one thing they have in common is they’re using the most accurate and up-to-date information. They don’t have to wonder ‘is this data pukka?’ – yes it is, yes it’s up-to-date, they’re all going to be saying the exact same thing at the same time in the same way. What we’re giving developers is the ingredients they need to produce the products that our digital customers want.”

When pushed for more specifics, he repeated that he didn’t want to name any individual app or site: “If it’s to work successfully, lots of people have to work together, and we’re doing our bit by making our data freely and openly available in as many formats in a recognisable format that can be used by everybody.”

This echoes Yusuf’s views – TfL’s focused on the bit of its job that involves running the trains on time, and the data bit is the afterthought. It’s perhaps one model that other public bodies – and not just train companies – could look to when figuring out just how much they want to commit to developing their own tools to make use of the data they produce.