Predicting the text in redacted documents is close to reality

Releasing delicate information with big black bars all over it has kept secrets safe for years - but not for much longer, maybe.

For those with secrets they want to keep, redacting documents is a pretty important thing to get right. First, it’s necessary to understand how to redact documents: look to Southwark Council, which in February uploaded its controversial agreement with developer Lend Lease for the regeneration of the Heygate Estate in a form that let people copy and paste the text underneath the black bars.

But it’s also necessary to know which parts of a document to redact so that the context from the stuff left open doesn’t give the game away. There is always, however, information left behind. The choices made in how to block text - be it with other bits of paper, or black marker pen, or even by typing out new words and then covering those up - can reveal something about the person doing the redacting. Different agencies had different redaction standards at different times, which gives a further clue as to what technique is needed. Each typeface fits into the space under a bar in a limited number of contextually-relevant ways, as well.
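The typeface clue can be made concrete with a small sketch. The Python below is an illustration only, not the Declassification Engine's method: it assumes a made-up table of character widths (`CHAR_WIDTHS`), a hypothetical helper called `candidates_for_bar`, and an invented word list, and it simply keeps the words whose rendered width would fit under a bar of a given width.

```python
# Hypothetical advance widths (arbitrary units) for a proportional typeface.
# Real values would have to come from the metrics of the actual font used.
CHAR_WIDTHS = {
    "i": 3, "l": 3, "t": 4, "r": 4, "s": 5,
    "a": 6, "e": 6, "n": 6, "o": 6, "m": 9, "w": 9,
}
DEFAULT_WIDTH = 6  # assumed width for any character missing from the table


def text_width(word):
    """Approximate rendered width of a word by summing per-character widths."""
    return sum(CHAR_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in word.lower())


def candidates_for_bar(bar_width, vocabulary, tolerance=1):
    """Return the words whose width is within `tolerance` of the bar's width."""
    return [w for w in vocabulary if abs(text_width(w) - bar_width) <= tolerance]


# Invented example vocabulary: which country names fit under a 19-unit bar?
words = ["Iran", "Iraq", "Syria", "Oman"]
matches = candidates_for_bar(19, words)
print(matches)
```

In this toy example a bar 19 units wide rules out "Syria" and "Oman" but leaves both "Iran" and "Iraq", which is why the surrounding context still matters when narrowing down the candidates.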

In the New Yorker, William Brennan reports on The Declassification Engine, an intriguing attempt by a group of academics to use these clues to try to crack any redacted text. A snippet:

Together with a group of historians, computer scientists, and statisticians, [Columbia history professor Matthew] Connelly is developing an ambitious project called the Declassification Engine, which, among other things, employs machine-learning and natural language processing to study the semantic patterns in declassified text. The project’s goals range from compiling the largest digitized archive of declassified documents in the world to plotting the declassified geographical metadata of over a million State Department cables on an interactive global map, which the researchers hope will afford them new insight into the workings of government secrecy. Though the Declassification Engine is in its early stages, Connelly told me that the project has “gotten to the point where we can see it might be possible to predict content of redacted text. But we haven’t yet made a decision as to whether we want to do that or not.”

One thing that jumps out here is the parallel between the "mosaic theory" - where "pieces of banal, declassified information, when pieced together, might provide a knowledgeable reader with enough emergent detail to uncover the information that remains classified" - and the argument made by critics of the NSA, who point out that mass collection of metadata, rather than the actual content of communications, is in many ways just as bad.

Redacted Iraq War info at a 2004 US Senate press conference (Photo: Getty)

Ian Steadman is a staff science and technology writer at the New Statesman. He is on Twitter as @iansteadman.

Collage by New Statesman

Clickbaiting terror: what it’s like to write viral news after a tragedy

Does the viral news cycle callously capitalise on terrorism, or is it allowing a different audience to access important news and facts?

On a normal day, Alex* will write anywhere between five and ten articles. As a content creator for a large viral news site, they [Alex is speaking under the condition of strict anonymity, meaning their gender will remain unidentified] will churn out multiple 500-word stories on adorable animals, optical illusions, and sex. “People always want to read about sexuality, numbers of sexual partners, porn habits and orgasms,” says Alex. “What is important is making the content easily-digestible and engaging.”

Alex is so proficient at knowing which articles will perform well that they frequently “seek stories that fit a certain template”. Though the word “clickbait” conjures up images of cute cat capers, Alex says political stories that “pander to prejudices” generate a large number of page views for the site. Many viral writers know how to tap into such stories so their takes are shared widely – which explains the remarkably similar headlines atop many internet articles. “This will restore your faith in humanity,” could be one; “This one weird trick will change your life…” another. The most clichéd example of this is now so widely mocked that it has fallen out of favour:

You’ll never believe what happened next.

When the world stops because of a tragedy, viral newsrooms don’t. After a terrorist attack such as this week’s Manchester Arena bombing, internet media sites do away with their usual stories. One day, their homepages will be filled with traditional clickbait (“Mum Sickened After Discovery Inside Her Daughter’s Easter Egg”, “This Man’s Blackhead Removal Technique Is A Complete And Utter Gamechanger”) and the next, their clickbait has taken on a remarkably more tragic tone (“New Footage Shows Moment Explosion Took Place Inside Manchester Arena”, “Nicki Minaj, Rihanna, Bruno Mars and More React to the Manchester Bombing”).

“When a terrorist event occurs, there’s an initial vacuum for viral news,” explains Alex. Instead of getting reporters on the scene or ringing press officers like a traditional newsroom, Alex says viral news is “conversation-driven” – meaning much of it regurgitates what is said on social media. This can lead to false stories spreading. On Tuesday, multiple viral outlets reported – based on Facebook posts and tweets – that over 50 unaccompanied children had been led to a nearby Holiday Inn. When BuzzFeed attempted to verify this, a spokesperson for the hotel chain denied the claim.

Yet BuzzFeed is the perfect proof that viral news and serious news can coexist under the same roof. Originally famed for its clickable content, the website is now home to a serious and prominent team of investigative journalists. The site, however, has different journalists on different beats, so that one person writes about politics and another about lifestyle or food.

Other organisations have a different approach. Sam* works at another large viral site (not BuzzFeed) where they are responsible for writing across topics; they explain how this works:

“One minute you're doing something about a tweet a footballer did, the next it's the trailer for a new movie, and then bam, there's a general election being called and you have to jump on it,” they say.

Yet Sam is confident that they cover tragedy correctly. Though they feel viral news used to disingenuously “profiteer” off terrorism with loosely related image posts, they say their current outlet works hard to cover tragic news. “It’s not a race to generate traffic,” they say. “We won't post content that we think would generate traffic while people are grieving and in a state of shock, and we're not going to clickbait the headlines to try and manipulate it into that for obvious reasons.”

Sam goes as far as to say that their viral site in fact has higher editorial standards than “some of the big papers”. Those who might find themselves disturbed to see today’s explosions alongside yesterday’s cats will do well to remember that “traditional” journalists do not always have a great reputation for covering tragedy.

At 12pm on Tuesday, Daniel Hett tweeted that over 50 journalists had contacted him since he had posted on Twitter that his brother, Martyn, was missing after the Manchester attack. Hett claimed two journalists had found his personal mobile phone number, and he uploaded an image of a note a Telegraph reporter had posted through his letterbox. “This cunt found my house. I still don't know if my brother is alive,” read the accompanying caption. Tragically, it turned out that Martyn was among the bomber's victims.

Long-established newspapers and magazines can clearly behave just as poorly as any newly formed media company. But although they might not always follow the rules, traditional newspapers do have them. Many writers for viral news sites have no formal ethical or journalistic training, with little guidance provided by their companies, which can cause problems when tragic news breaks.

It remains to be seen whether self-policing will be enough. Though false news has been spread, many of this week’s terror-focused viral news stories do shed light on missing people or raise awareness of how people can donate blood. Many viral news sites also have gigantic Facebook followings that far outstrip those of daily newspapers – meaning they can reach more people. In this way, Sam feels their work is important. Alex, however, is less optimistic.

“My personal view is that viral news does very little to inform people at times like this and that trending reporters probably end up feeling very small about their jobs,” says Alex. “You feel limited by the scope of your flippant style and by what the public is interested in.

“You can end up feeding the most divisive impulses of an angry public if you aren’t careful about what conversations you’re prompting. People switch onto the news around events like this and traffic rises, but ironically it’s probably when trending reporters go most into their shells and into well-worn story formats. It’s not really our time or place, and to try and make it so feels childish.”

Amelia Tait is a technology and digital culture writer at the New Statesman.
