Predicting the text in redacted documents is close to reality

Releasing delicate information with big black bars all over it has kept secrets safe for years - but not for much longer, maybe.

For those with secrets they want to keep, redacting documents is a pretty important thing to get right. First, it’s necessary to understand how to redact documents at all - look to Southwark Council, which in February uploaded its controversial agreement with developer Lend Lease for the regeneration of the Heygate Estate in a form that let people copy and paste the text underneath the black bars.

But it’s also necessary to know which parts of a document to redact, so that the context from the stuff left open doesn’t give the game away. There is always, however, information left behind. The choices made in how to block text - be it with other bits of paper, or black marker pen, or even by typing out new words and then covering those up - can reveal something about the person doing the redacting. Different agencies had different redaction standards at different times, which gives a further clue as to what technique is needed to recover the text. And in any given typeface, only a limited number of contextually plausible words will fit the space under a bar.
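To make the typeface point concrete, here is a minimal sketch in Python of how the width of a bar can narrow down what sits beneath it. Everything in it is an assumption for illustration: the character widths are invented rather than taken from any real font, the candidate list is hypothetical, and none of it is drawn from the Declassification Engine itself.

```python
# Invented character classes for a made-up proportional typeface.
# These are NOT real font metrics; they only illustrate the idea.
NARROW = set("iljtf.,' ")
WIDE = set("mwMW")


def char_width(ch):
    """Return a hypothetical advance width for one character."""
    if ch in NARROW:
        return 3
    if ch in WIDE:
        return 9
    if ch.isupper():
        return 7
    return 5


def text_width(text):
    """Total width of a string under the invented metrics above."""
    return sum(char_width(ch) for ch in text)


def candidates_that_fit(candidates, bar_width, tolerance=0):
    """Keep only the candidates whose width matches the redaction bar."""
    return [
        word for word in candidates
        if abs(text_width(word) - bar_width) <= tolerance
    ]


# Hypothetical guesses suggested by the surrounding, unredacted context.
countries = ["Iraq", "Iran", "Libya", "Egypt", "Syria", "Yemen"]

print(candidates_that_fit(countries, bar_width=31))  # ['Yemen']
print(candidates_that_fit(countries, bar_width=22))  # ['Iraq', 'Iran']
```

The second result is the instructive one: width alone leaves two candidates standing, which is why the surrounding context matters as much as the measurement.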

In the New Yorker, William Brennan reports on the Declassification Engine, an intriguing attempt by a group of academics to use these clues to crack redacted text. A snippet:

Together with a group of historians, computer scientists, and statisticians, [Columbia history professor Matthew] Connelly is developing an ambitious project called the Declassification Engine, which, among other things, employs machine-learning and natural language processing to study the semantic patterns in declassified text. The project’s goals range from compiling the largest digitized archive of declassified documents in the world to plotting the declassified geographical metadata of over a million State Department cables on an interactive global map, which the researchers hope will afford them new insight into the workings of government secrecy. Though the Declassification Engine is in its early stages, Connelly told me that the project has “gotten to the point where we can see it might be possible to predict content of redacted text. But we haven’t yet made a decision as to whether we want to do that or not.”

One of the things that jumps out here is the parallel between the "mosaic theory" - where "pieces of banal, declassified information, when pieced together, might provide a knowledgeable reader with enough emergent detail to uncover the information that remains classified" - and the argument made by critics of the NSA that mass collection of metadata, rather than the actual content of communications, is in many ways just as bad.

Redacted Iraq War info at a 2004 US Senate press conference (Photo: Getty)

Ian Steadman is a staff science and technology writer at the New Statesman. He is on Twitter as @iansteadman.


Why I’m sick of fake theorists lamenting the “millennial problem”

Wise Thinkers lament smartphones, social media, and self-entitlement – ignoring how badly off this generation is thanks to its predecessors.

There is a certain sort of Wise Thinker who loves nothing more than to offer advice on the “problem” of “millennials”. Oh, Wise Thinker, where has this mysterious generation of lazy, entitled narcissists come from, and how am I supposed to deal with the ones who keep showing up in my office?

The answer, we’re told, is a massive failure in parenting that started in the 1980s – suddenly children were told they were special, that they could do anything they wanted to. Worse, they were shown they didn’t have to work for it – they were given participation medals just for showing up, and any time they did badly at school, they didn’t need to improve; their parents just complained to get them better marks!

No evidence that any of this is substantially true (or caused the claimed effects) need be offered: that can be left as an exercise to the reader’s own preconceptions.

(They’ve given out participation medals at the modern Olympics since they started in 1896, by the way. No one ever seems to mention that.)

A particularly refined example of this sort of thing has been doing the rounds of social media recently – a video clip in which motivational speaker and TED talkist Simon Sinek rehearses the familiar lines but then makes a rather bolder claim: millennials are losing the capacity for joy (and some of them are even killing themselves), and it’s all because of mobile phones.

Their use of mobile phones and social media is addictive, Sinek says, in exactly the same way as drugs and alcohol. He refers to the brain chemical dopamine, which immediately turns his every utterance into rigorous neuroscience – regardless of the quantity and quality of the evidence available to support it.

That every millennial is suffering from this terrible addiction is taken as read, in much the same way that everyone who’s ever had a glass of wine is a raging alcoholic. Non-millennials, we all know, completely eschew the mobile phone and have never been seen on Facebook.

But this is only part of the broader millennial addiction to instant gratification – same-day delivery, movies-on-demand, even getting a date is now as simple as swiping right, as anyone who’s never actually tried online dating will surely agree!

It seems all millennials can have everything they want, whenever they want it, so they will never learn the hard lessons that the Wise Thinkers learned in the old times: how to be patient, how to have self-restraint, how to work hard for something.

This can surely be the first time in history in which the old have considered the young to be impatient and lazy.

Worst-case scenario? Sinek points to a rise in depression and suicide, and lets us draw arbitrary lines as we please. His best-case scenario: the millennial will never learn how to find joy, unless, apparently, their benevolent employer helps them with such innovative solutions as banning phones in meetings. Sure.

There is of course nothing wrong with some scepticism towards new technology and the effect it can have on the fragile human mind. If only we had heeded the scientist Conrad Gessner’s dire warning of a powerful new invention that would overwhelm, confuse and ultimately harm us with its unstoppable flood of information. That invention? The book. Gessner lived in the sixteenth century, when the printing press, invented the century before, was filling Europe with books. History doesn’t record whether or not he wore stupid glasses.

But maybe Sinek is right – maybe only by abandoning the embrace of Siri will you know true love, millennials. Some of you are actually in your mid-thirties these days, and have probably already started tutting at those younger than you who never learned “real” patience by sending texts on a Nokia 3310.

It must be a lot of fun, theorising about the possible origins of the “millennial problem”, and coming up with brilliant outside-the-box solutions to it. Weird, though, that all these Wise Thinkers never seem to talk about how many millennials started their careers in the midst (or the aftermath) of an uncertain job market caused by the 2008 financial crisis. Or how many of them had to start their careers with unpaid internships. Or, more fundamentally, that they’re the first generation for decades to earn lower wages than their predecessors.

Perhaps, for some strange reason, managers so supposedly desperate to understand millennial employees are not quite as interested in paying motivational speakers to tell them about things like that.