Predicting the text in redacted documents is close to reality

Releasing delicate information with big black bars all over it has kept secrets safe for years - but not for much longer, maybe.

For those with secrets they want to keep, redacting documents is a pretty important thing to get right. First, it’s necessary to understand how to redact documents at all - look to Southwark Council, which in February uploaded its controversial agreement with developer Lend Lease for the regeneration of the Heygate Estate in a form that let people copy and paste the text underneath the black bars.

But it’s also necessary to know which parts of a document to redact so that the context from the stuff left open doesn’t give the game away. There is always, however, information left behind. The choices made in how to block text - be it with other bits of paper, or black marker pen, or even by typing out new words and then covering those up - can reveal something about the person doing the redacting. Different agencies had different redaction standards at different times, which gives a further clue as to what technique is needed. Each typeface fits into the space under a bar in a limited number of contextually-relevant ways, as well.
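The typeface point can be made concrete with a minimal sketch. It is not the Declassification Engine's method, which the source does not describe at this level of detail; it simply shows how the width of a black bar can narrow a list of candidate words. The character widths, the helper names `rendered_width` and `candidates_that_fit`, and the word list are all assumptions made up for illustration.

```python
# Illustrative only: these widths are invented, not measured from a real typeface.
NARROW = set("ijl")   # assumed to be 3 units wide
WIDE = set("mw")      # assumed to be 9 units wide
DEFAULT_WIDTH = 6     # assumed width of every other character


def rendered_width(word):
    """Approximate printed width of a word, in arbitrary units."""
    total = 0
    for ch in word.lower():
        if ch in NARROW:
            total += 3
        elif ch in WIDE:
            total += 9
        else:
            total += DEFAULT_WIDTH
    return total


def candidates_that_fit(bar_width, vocabulary, tolerance=1):
    """Return the words whose approximate width is within `tolerance` of the bar."""
    return [w for w in vocabulary if abs(rendered_width(w) - bar_width) <= tolerance]
```

With these invented widths, a bar 21 units wide would admit both "Iran" and "Iraq" from a list of country names and rule out the rest. Width alone trims the list; it is the surrounding context that would have to pick between the survivors.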

In the New Yorker, William Brennan reports on The Declassification Engine, an intriguing attempt by a group of academics to use these clues to try and crack any redacted text. A snippet:

Together with a group of historians, computer scientists, and statisticians, [Columbia history professor Matthew] Connelly is developing an ambitious project called the Declassification Engine, which, among other things, employs machine-learning and natural language processing to study the semantic patterns in declassified text. The project’s goals range from compiling the largest digitized archive of declassified documents in the world to plotting the declassified geographical metadata of over a million State Department cables on an interactive global map, which the researchers hope will afford them new insight into the workings of government secrecy. Though the Declassification Engine is in its early stages, Connelly told me that the project has “gotten to the point where we can see it might be possible to predict content of redacted text. But we haven’t yet made a decision as to whether we want to do that or not.”

One of the things that jumps out here is the parallel between the "mosaic theory" - where "pieces of banal, declassified information, when pieced together, might provide a knowledgeable reader with enough emergent detail to uncover the information that remains classified" - and the argument made by critics of the NSA that mass collection of metadata, rather than the actual content of communications, is in many ways just as bad.

Redacted Iraq War info at a 2004 US Senate press conference (Photo: Getty)

Ian Steadman is a staff science and technology writer at the New Statesman. He is on Twitter as @iansteadman.


Not just a one-quack mind: ducks are capable of abstract thought

Newborn ducklings can differentiate between objects that are the same and objects that are different, causing scientists to rethink the place of abstract thinking.

There’s a particular loftiness to abstract thought. British philosopher and leading Enlightenment thinker John Locke asserted that “brutes abstract not” – by which he meant that anything outside the supreme-all-mighty-greater-than-everything category of Homo sapiens was most probably unequipped to deal with the headiness and complexities of abstract thinking.

A scale of intelligence that runs from “bird-brained” to “Einstein” tends to place the ability to think in abstract ways at the Einstein end of the spectrum. However, in light of some recent research coming out of the University of Oxford, it seems that the cognitive abilities of our feathery counterparts have been underestimated.

In a study published in Science, led by Alex Kacelnik – a professor of behavioural ecology – a group of ducklings demonstrated the ability to think abstractly within hours of hatching, successfully distinguishing the concepts of “same” and “different”.

Young ducklings generally become accustomed to their mother’s features via a process called imprinting – a learning mechanism that helps them identify the individual traits of their mothers. Kacelnik said: “Adult female ducks look very similar to each other, so recognising one’s mother is very difficult. Ducklings see their mothers from different angles, distances, light conditions, etc, so their brains use every possible source of information to avoid errors, and abstracting some properties helps in this job.”

It’s this hypothesised abstracting of some properties that led Kacelnik to believe that there must be more going on with the ducklings beyond their imprinting of sensory inputs such as shapes, colours or sounds.

The ability to differentiate the same from the different has previously been used as a means to reveal the brain’s capacity to deal with abstract properties, and has been shown in other animals, such as parrots, pigeons, bees and monkeys. For the most part, these animals were trained, given guidance on how to determine sameness and differences between objects.

What makes Kacelnik’s ducklings special, as the research showed, is that they were given no training at all in learning the relations between objects which are the same and objects which are different.

“Other animals can be trained to respond to abstract relations such as same or different, but not after a single exposure and without reinforcement,” said Kacelnik.

Along with his fellow researcher Antone Martinho III, Kacelnik hatched and domesticated mallard ducklings and then threw them straight into an experiment. The ducklings were presented with pairs of objects – either identical or different in shape or colour – to see whether they could find links and relations between the pairs.

The initial pairs they were presented with served as the imprinting ones; it would be the characteristics of these pairs which the ducklings would first learn. The initial pairs involved red cones and red cylinders, which the ducklings were left to observe and assimilate for 25 minutes. They were then exposed to a range of different pairs of objects: a red pyramid and a red pyramid, say, or a red cylinder and a red cube.

What Kacelnik and his research partner found was that the ducklings weren’t imprinting the individual features of the objects but the relations between them; that is why, of the 76 ducklings tested, 68 per cent tended to move towards the new pairs whose relation matched that of the very first pairs they were exposed to.

Put simply, if they initially imprinted an identical pair of objects, they were more likely to favour a second pair of identical objects, but if they initially imprinted a pair of objects that were different, they would favour a second pair of differing objects similar to the first.

The results from the experiment seem to highlight a misunderstanding of how advanced this type of conceptual thought process really is. As science journalist Ed Yong suggests, there could be “different levels of abstract concepts, from simple ones that young birds can quickly learn after limited experience, to complex ones that adult birds can cope with”.

Though the research doesn’t in any way assume or point towards intelligence in ducklings to rival that of humans, it seems that the growing scientific literature on the topic continues to refute the notion that human beings are somehow superior. Kacelnik told me: “The last few decades of comparative cognition research have destroyed many claims about human uniqueness and this trend is likely to continue.”