16 May 2018, updated 09 Sep 2021 4:08pm

The “Yanny or Laurel” debate reveals more about our brains than our ears

The video clip circulating the web is the audio equivalent of an optical illusion. 

By Suzy J Styles

Yanny… or laurel? This is the question circulating the web, after a video clip emerged of a strange word being vocalised. Some listeners heard the first word, others the second. So should you be distrusting your own hearing? 

The human voice is really a fancy instrument. We create sounds in the throat by pushing air from the lungs past the vocal cords. These vibrations then echo up and out of the mouth, through a bent pipe (the vocal tract). Whenever we change the shape of this pipe, it changes the way the vibrations echo. 

You can see the anatomy at work in a video showing a real-time MRI of someone speaking German. The shape changes can be measured in energy bands called formants (which are noted in the style F1, F2, F3). So speech contains information about the position and motion of all three parts of the pipe. The evidence is right there in the sound waves – if you know where to look.

Image credit: Suzy Styles

Here’s a spectrogram showing the energy in the Yanny/Laurel clip first shared by Reddit user RolandCamry. Let’s look at the vowels: We expect to see just three dark bands in the lower part (F1, F2, F3), but there is a jumble of overlapping lines.

In fact, this mess holds the clue to why we are hearing different things. In red are three long wavy lines, with a big gap between the first and the second. Their location and shape are just what we would expect if the speech were mostly vowels, and the particular vowels were “ee” “ah” “ee” as in Yanny.

In blue, the lines are short, and there are three dark bars just in the middle. This is what we would expect if there was an “oh” vowel in the middle, surrounded by quieter consonants like “l” and “r”, as in Laurel. So the audio contains both signals, but the information has to be combined in different ways.

Image credit: Suzy Styles

In short, the video clip is the audio equivalent of one of those line drawings that is both a face and a vase, but can’t be both at the same time. The signal is a bit vague in places, which helps this audio work its magic: the human brain is extremely good at filling in the missing information, so it reconstructs the parts of the speech it can’t hear accurately.

We can also see that the Yanny pattern includes more energy at high frequencies, and the Laurel pattern is stronger at low frequencies. This is probably why, when people switch devices, they sometimes hear the other name, since each device will “perform” the frequencies differently. Importantly, each time the audio is heard, the brain has to decide which is the most reliable part of the signal – which “cues” to follow. This could be why some people simply can’t hear the other name: their brain prefers one set of cues over the other. When people swap, their brain has switched cues. So relax #TeamLaurel, it doesn’t mean you are losing your hearing for high notes just yet – it just means your brain has decided that the consonants are more reliable. And #TeamYanny, your brain favours the vowels.
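
For readers who like to tinker, the device effect can be sketched in a few lines of Python. This is only an illustration, not the analysis behind the spectrograms above: the two sine tones, the 1,000 Hz cutoff and the function names are all assumptions made for the example, and a pair of pure tones is a very crude stand-in for real speech. The sketch measures how much of a signal's energy sits above the cutoff, then repeats the measurement for a hypothetical "duller" device that plays the high tone at half strength.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)


def tone_mix(low_hz, high_hz, high_gain, n=400):
    """Mix a low and a high sine tone; high_gain mimics how strongly
    a hypothetical device reproduces the high frequencies."""
    return [
        math.sin(2 * math.pi * low_hz * t / SAMPLE_RATE)
        + high_gain * math.sin(2 * math.pi * high_hz * t / SAMPLE_RATE)
        for t in range(n)
    ]


def high_energy_share(samples, cutoff_hz):
    """Fraction of spectral energy at or above cutoff_hz, using a naive
    discrete Fourier transform (slow, but fine for a short clip)."""
    n = len(samples)
    low = high = 0.0
    for k in range(1, n // 2):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * t / n)
            for t, s in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        if k * SAMPLE_RATE / n >= cutoff_hz:
            high += energy
        else:
            low += energy
    return high / (low + high)


# Same "clip", two imaginary devices.
bright_device = tone_mix(300, 3000, high_gain=1.0)
dull_device = tone_mix(300, 3000, high_gain=0.5)

print(round(high_energy_share(bright_device, 1000), 3))  # 0.5
print(round(high_energy_share(dull_device, 1000), 3))    # 0.2
```

Halving the strength of the high tone cuts its share of the energy from a half to a fifth, because energy goes with the square of the amplitude. If a listener's brain leans on whichever cues are strongest, a shift of that size between a phone speaker and a pair of headphones could plausibly be enough to tip the balance.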

It might be possible for us to hear different things, but it isn’t possible for a human speaker to produce this pattern of noises – our speech-pipes simply can’t be in two places at once. We might be tempted to think that this audio clip was designed to trick us (I initially did). However, it seems that it might just be the crummy speech synthesis on this online dictionary. This in turn is most likely the product of a speech algorithm, which combines sounds in a way that makes sense to a computer, but can’t actually be done by a human throat. So there you have it – speech is a lot more complicated than you think. It’s no surprise that human babies take as long as they do to learn it. Just remember that, next time you are agonising over whether you’re the only one on Team Yanny. 

Suzy J Styles is an assistant professor at Nanyang Technological University, Singapore