Brewster Kahle eats petabytes for breakfast. Back when the founders of Google were still doing short division, the Silicon Valley egghead had already sold his first search engine to AOL for $15m. Then in 1999 Kahle’s next project, Alexa Internet, which maps the behaviour and search patterns of web users, was bought by Amazon for $250m.
Like other successful net entrepreneurs, Kahle has ploughed his spoils into a non-profit endeavour, and the result is the Internet Archive, an attempt to achieve what the ancient Greeks and Egyptians tried at the library of Alexandria: to make a permanent record of all human knowledge. But this time around the library will be on the internet, universally accessible and – crucially – flameproof.
Cataloguing the World Wide Web alone is a daunting task. For a start, nobody knows how big it is. And because the page at any given address changes over time, the challenge multiplies. Nevertheless, Kahle’s Wayback Machine (available at http://www.archive.org/web/web.php) already allows users to browse more than 40 billion web pages archived from 1996 onwards – exposing, among other things, some embarrassing design decisions that certain print magazines made in their early days online (see http://web.archive.org/web/19981111190118/).
But Kahle wants more. “The problem with the internet is not that there’s too much information out there, it’s that there’s not enough good stuff. We’re looking to radically increase the amount of material on the web.”
Kahle’s goal is universal access to all knowledge, and so the Internet Archive aims to make every book ever written available over the web. “The ancient library of Alexandria collected 75 per cent of all the books of all the peoples of the world in 300BC. Our opportunity is to do that again, but then to one-up the ancients by making it available universally. It is technologically within our grasp and it could be one of the greatest achievements of humankind.”
In building this library, Kahle has had to sue his own country. The case, which is awaiting a hearing date, argues that copyright regulations enacted in anticipation of the digital age are contrary to the First Amendment. He explains: “I think it will come as a surprise to most people that the library system of physical books we grew up with has been made risky, if not illegal, in the digital world. This makes no sense. We needed some clarification as to what we as a library were allowed to do with, for example, out-of-print and orphan works, things that are traditionally found on the shelves of libraries. In the US, the way you ask a question like that is you file a lawsuit.”
By chance, I catch him on the morning he has announced the Open Content Alliance, a partnership between the Internet Archive, the search engine Yahoo! and the University of California to digitise historical works of fiction. The announcement comes nine months after Google launched a similar, large-scale digitisation project. The two projects differ in the way they propose to offer the digital books once they have scanned them. Whereas Kahle’s team will make full-text, searchable copies available to all users of the web, Google will allow access only from the libraries in which the original books are stored, and as extracts through its commercial service Google Print.
When I ask Kahle how he feels about Google’s project, he takes a deep breath. “We applaud the enthusiasm of Google to make steps in this area and digitise materials. But we would rather see an open system applied for the open content arena, in the same tradition as the open networks that brought us the internet. That fits much better with civic institutions like libraries.”
Google’s project has earned it a class-action lawsuit from the Authors Guild, but because the Open Content Alliance is starting with out-of-copyright (or “public domain”) material, it should avoid the same fate.
However, like any librarian, Brewster Kahle wants his collection to be complete. The move to copyright works is inevitable, but by that time, Kahle feels, commercial interests will have moved out of the courtroom and into the boardroom. “The internet evolved in the early 1990s with public works, and then by 1994/95 there was enough of a user base that we could start to get commercial entities involved. If we get the technology right based on the public domain, I think the same thing will happen with books.”
Put simply, build the library and the books will come. If Kahle is right, the Internet Archive will reinvent the way we use the web – fulfilling its destiny as the ultimate, searchable record of all human knowledge.