Now you see it, now you don't: what optical illusions tell us about our brains

Illusions can offer insights into how the visual system processes images.

Maurits Escher: where do the staircases lead?

The human brain is a network of roughly 86 billion neurons – nerve cells – linked by something like 100 trillion connections. Not to mention glial cells, which scientists used to think were inactive scaffolding, but increasingly view as an essential part of how the brain works. Our brains give us movement, language, senses, memories, consciousness and personality. We know a lot more about the brain than we used to, but it still seems far too complicated for human understanding.

Fortunately, the brain contains many small networks of neurons, each carrying out a specific function: vision, hearing, movement. It makes sense to tackle these simple modules first. Moreover, we have good mathematical models of nerve cell behaviour. In 1952, Alan Hodgkin and Andrew Huxley wrote down the “Hodgkin-Huxley equations” for the transmission of a nerve impulse, which won them a share of the 1963 Nobel Prize in Physiology or Medicine. We also have effective techniques for understanding small networks’ components and how they are linked.

Many of these simple networks occur in the visual system. We used to think that the eye was like a camera, taking a “snapshot” of the outside world that was stored in the brain like a photo stuck in an album. The eye does use a lens to focus an image on to the retina at the back of the eye, which functions a bit like a roll of film – or, in today’s digital cameras, a charge-coupled device, storing an image pixel by pixel. But we now know that once the retina sends information to the brain’s visual cortex, the similarity to a camera ends.

Although we get a strong impression that what we are seeing is “out there” in front of us, what determines that perception resides inside our own heads. The brain decomposes images into simple pieces, works out what they are, “labels” them with that information, and reassembles them. When we see three sheep and two pigs in a field, we “know” which bits are sheep, which are pigs, and how many of each there are. If you try to program a computer to do that, you quickly realise how tricky the process is. Only very recently have computers been able to distinguish between faces, let alone sheep and pigs.

Probing the brain’s detailed activity is difficult. Rapid progress is being made, but it still takes a huge effort to get reliable information. But when science cannot observe something directly, it infers it, working indirectly. An effective way to infer how something functions is to see what it does when it goes wrong. It may be hard to understand a bridge while it stays up, but you can learn a lot about strength of materials when it collapses.

The visual system can “go wrong” in several interesting ways. Hallucinogenic drugs can change how neurons behave, producing dramatic images such as spinning spirals, which originate not in the eye, but in the brain. Some images even cause the brain to misinterpret what it’s seeing without outside help. We call them optical illusions.

One of the earliest was discovered in Renaissance Italy in the 16th century. Giambattista della Porta was the middle of three surviving sons of a wealthy merchant nobleman who became secretary to the Holy Roman emperor Charles V. The father was an intellectual, and Giambattista grew up in a house in Naples that hosted innumerable mathematicians, scientists, poets and musicians. He became an outstanding polymath, with publications on secret codes (including writing on the inside of eggshells), physiology, botany, agriculture, engineering, and much else. He wrote more than 20 plays.

Della Porta was particularly interested in the science of light. He made important improvements to the camera obscura, a device that projects an image of the outside world into a darkened room; he claimed to have invented the telescope before Galileo, and may well have done. His De refractione optices of 1593 contained the first report of a curious optical effect. He arranged two books so that one was visible to one eye only and the other to the other eye. Instead of seeing a combination of the two images, he perceived them alternately. He discovered that he could select either image at will by consciously switching his attention. This phenomenon is known today as binocular rivalry.

Two other distinct but related effects are impossible figures and visual illusions. In rivalry, each image appears unambiguous, but the eyes are shown conflicting images. In the other two phenomena, both eyes see the same image, but in one case it doesn’t make sense, and in the other it makes sense but is ambiguous.

Impossible figures at first sight seem to be entirely normal, but depict things that cannot exist – such as Roger Shepard’s 1990 drawing of an elephant in which everything above the knees makes sense, and everything below the knees makes sense, but the two regions do not fit together correctly. The Dutch artist Maurits Escher made frequent use of this kind of visual quirk.

In 1832, the Swiss crystallographer Louis Necker invented his “Necker cube” illusion, a skeletal cube that seems to switch its orientation repeatedly. An 1892 issue of the humorous German magazine Fliegende Blätter contains a picture with the caption “Which animals are most like each other?” and the answer “Rabbit and duck”. In a 1915 issue of the American magazine Puck, the cartoonist Ely William Hill published “My wife and my mother-in-law”, based on an 1888 German postcard. The image can be seen either as a young lady looking back over her shoulder, or as an elderly woman facing forwards. Several of Salvador Dalí’s paintings include illusions, notably Slave Market With the Apparition of the Invisible Bust of Voltaire, where a number of figures and everyday objects, carefully arranged, combine to give the impression of the French writer’s face.

Illusions offer insights into how the visual system processes images. The first few stages are fairly well understood. The top layer in the visual cortex detects edges of objects and the direction in which they are pointing. This information is passed to lower layers, which detect places where the direction suddenly changes, such as corners. Eventually some region in the cortex detects that you are looking at a human face and that it belongs to Aunt Matilda. Other parts of the brain are alerted, and you belatedly remember that tomorrow is her birthday and hurry off to buy a present.

These things don’t happen by magic. They have a very definite rationale, and that’s where the mathematics comes in. The top layer of the visual cortex contains innumerable tiny stacks of nerve cells. Each stack is like a pile of pancakes, and each pancake is a network of neurons that is sensitive to edges that point in one specific direction: one o’clock, two o’clock and so on.

For simplicity, call this network a cell; it does no harm to think of it as a single neuron. Roughly speaking, the cell at the top of the stack senses edges at the one o’clock position, the next one down corresponds to the two o’clock angle, and so on. If one cell receives a suitable input signal, it “fires”, telling all the other cells in its stack: “I’ve seen a boundary in the five o’clock direction.” However, another cell in the same stack might disagree, claiming the direction is at seven o’clock. How to resolve this conflict?

Neurons are linked by two kinds of connection, excitatory and inhibitory. If a neuron activates an excitatory connection, those at the other end of it are more likely to fire themselves. An inhibitory connection makes them less likely to fire. The cortex uses inhibitory connections to reach a definite decision. When a cell fires, it sends inhibitory signals to all of the other cells in its stack. These signals compete for attention. If the five o’clock signal is stronger than the seven o’clock one, for instance, the seven o’clock one gets shut down. The cells in effect “vote” on which direction they are detecting and the winner takes all.
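The “vote” can be sketched in a few lines of Python. This is a minimal illustration rather than a model taken from the research: the function name, the firing-rate update rule, the inhibition strength and the signal values are all assumptions chosen for clarity. Each cell is pushed up by its own input signal and pushed down by the activity of every other cell in its stack.

```python
def winner_take_all(inputs, inhibition=2.0, dt=0.1, steps=500):
    """Let the cells of one stack compete through mutual inhibition.

    `inputs` maps each cell's preferred direction (a clock hour) to the
    strength of its incoming signal. Returns every cell's final firing rate.
    All parameter values are illustrative assumptions.
    """
    rates = {hour: 0.0 for hour in inputs}
    for _ in range(steps):
        total = sum(rates.values())
        new_rates = {}
        for hour, signal in inputs.items():
            # Inhibition arriving from every *other* cell in the stack.
            suppression = inhibition * (total - rates[hour])
            # A firing rate cannot be driven below zero.
            drive = max(signal - suppression, 0.0)
            # Move the rate a small step towards its current drive.
            new_rates[hour] = rates[hour] + dt * (drive - rates[hour])
        rates = new_rates
    return rates


# Twelve cells, one per clock direction; only two receive any signal.
signals = {hour: 0.0 for hour in range(1, 13)}
signals[5] = 1.0  # stronger evidence for a five o'clock edge
signals[7] = 0.8  # weaker evidence for a seven o'clock edge

final = winner_take_all(signals)
winner = max(final, key=final.get)
print("winning direction:", winner)
```

With these made-up numbers the five o’clock cell ends up firing at full strength while the seven o’clock cell is silenced, even though its input was only slightly weaker.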

Many neuroscientists think that something very similar is going on in visual illusions and rivalry. Think of the duck and rabbit with two possible interpretations. Hugh R Wilson, a neuroscientist at the Centre for Vision Research at York University, Toronto, proposed the simplest model, one stack with just two cells. Rodica Curtu, a mathematician at the University of Iowa, John Rinzel, a biomathematician then at the National Institutes of Health, and several other scientists have analysed this model in more detail. The basic idea is that one cell fires if the picture looks like a duck, the other if it resembles a rabbit. Because of the inhibitory connections, the winner should take all. Except that, in this illusion, it doesn’t quite work, because the two choices are equally plausible. That’s what makes it an illusion. So both cells want to fire. But they can’t, because of those inhibitory connections. Yet neither can they both remain quiescent, because the incoming signals encourage them to fire.

One possibility is that random signals coming from elsewhere in the brain might introduce a bias of perception, so that one cell still wins. However, the mathematical model predicts that, even without such bias, the signals in both cells should oscillate from active to inactive and back again, each becoming active when the other is not. It’s as if the network is dithering: the two cells take turns to fire and the network perceives the image as a duck, then as a rabbit, and keeps switching from one to the other. Which is what happens in reality.
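The dithering can be reproduced with a toy simulation. The sketch below is an assumption-laden illustration, not the published model: it gives both cells exactly the same input, lets each inhibit the other, and adds a slow “fatigue” variable to each cell, an ingredient common in rivalry models but not described in the text above. The parameter values and the small head start given to the first cell (needed only to break the perfect tie in a computer simulation) are likewise hypothetical.

```python
def simulate_rivalry(steps=10000, dt=0.1, drive=1.0,
                     inhibition=2.0, fatigue=3.0, tau_fatigue=50.0):
    """Return, for each time step, which of the two cells is more active."""
    rate = [0.1, 0.0]    # firing rates; the head start only breaks the tie
    tired = [0.0, 0.0]   # slow fatigue variables (an assumption of this sketch)
    dominant = []
    for _ in range(steps):
        new_rate, new_tired = [], []
        for me, other in ((0, 1), (1, 0)):
            # Same drive for both cells, minus rival inhibition and own fatigue.
            net_input = drive - inhibition * rate[other] - fatigue * tired[me]
            target = max(net_input, 0.0)
            new_rate.append(rate[me] + dt * (target - rate[me]))
            # Fatigue slowly tracks the cell's own activity.
            new_tired.append(tired[me] + dt * (rate[me] - tired[me]) / tau_fatigue)
        rate, tired = new_rate, new_tired
        dominant.append(0 if rate[0] > rate[1] else 1)
    return dominant


def count_switches(dominant):
    """Count how often the more active cell changes."""
    return sum(1 for before, after in zip(dominant, dominant[1:]) if before != after)


history = simulate_rivalry()
print("switches between 'duck' and 'rabbit':", count_switches(history))
```

Neither cell is favoured by its input, yet the more active cell keeps changing: the winner tires, the loser recovers, and they swap roles.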

Generalising from this observation, Wilson proposed a similar type of network that can model decision-making in the brain – which political party to support, for instance. But now the network consists of several stacks. Maybe one stack represents immigration policy, another unemployment, a third financial regulation, and so on. Each stack consists of cells that “recognise” a distinct policy feature. So the financial regulation stack has cells that recognise state regulation by law, self-regulation by the industry, or free-market economics.

The overall political stance of any given political party is a choice of one cell from each stack – one policy decision on each issue. Each prospective voter has his or her preferences, and these might not match those of any particular party. If these choices are used as inputs to the network, it will identify the party that most closely fits what the voter prefers. That decision can then be passed to other areas of the brain. Some voters may find themselves in a state akin to a visual illusion, vacillating between Labour and Liberal Democrat, or Conservative and Ukip.
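As a rough illustration of the matching step only, the Python sketch below replaces the network’s dynamics with a simple count of the stacks on which a voter and a party agree. The party names, issue labels and policy options are placeholders invented for the example, and the counting rule is an assumption, not Wilson’s mechanism.

```python
def closest_parties(voter, parties):
    """Score each party by the number of stacks on which it matches the voter.

    Returns the best-matching party names (more than one if tied) and all scores.
    """
    scores = {
        name: sum(1 for issue, stance in platform.items() if voter.get(issue) == stance)
        for name, platform in parties.items()
    }
    best = max(scores.values())
    return sorted(name for name, score in scores.items() if score == best), scores


# Hypothetical parties: one chosen cell (policy) per stack (issue).
parties = {
    "Party A": {"immigration": "policy 1", "unemployment": "policy 1",
                "financial regulation": "state regulation"},
    "Party B": {"immigration": "policy 2", "unemployment": "policy 2",
                "financial regulation": "state regulation"},
    "Party C": {"immigration": "policy 2", "unemployment": "policy 1",
                "financial regulation": "free market"},
}

decided_voter = {"immigration": "policy 2", "unemployment": "policy 1",
                 "financial regulation": "free market"}
torn_voter = {"immigration": "policy 1", "unemployment": "policy 2",
              "financial regulation": "state regulation"}

decided_choice, _ = closest_parties(decided_voter, parties)
torn_choice, _ = closest_parties(torn_voter, parties)
print(decided_choice)  # a single best match
print(torn_choice)     # a tie: the voter who vacillates
```

The second voter fits two parties equally well, which is the voting analogue of an ambiguous picture.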

This idea is speculative and it is not intended to be a literal description of how we decide whom to vote for. It is a schematic outline of something more complex, involving many regions of the brain. However, it provides a simple and flexible model for decision-making by a neural network, and in particular it shows that simple networks can do the job quite well. Martin Golubitsky of the Mathematical Biosciences Institute at Ohio State University and Casey O Diekman of the University of Michigan wondered whether Wilson’s networks could be used to model more complex examples of rivalry and illusions. Crucially, the resulting models allow specific predictions about experiments that have not yet been performed, making the whole idea scientifically testable.

The first success of this approach helped to explain an experiment that had already been carried out, with puzzling results. When the brain reassembles the separate bits of an image, it is said to “bind” these pieces. Rivalry provides evidence that binding occurs, by making it go wrong. In a rivalry experiment carried out in 2006 by S W Hong and S K Shevell, the subject’s left eye is shown a horizontal grid of grey and pink lines while the right eye sees a vertical grid of grey and green lines. Many subjects perceive an alternation between the images, just as della Porta did with his books. But some see two different images alternating: pink and green vertical lines, and pink and green horizontal lines – images shown to neither eye. This effect is called colour misbinding; it tells us that the reassembly process has matched colour to grid direction incorrectly. It is as if della Porta had ended up seeing another book altogether.

Golubitsky and Diekman studied the simplest Wilson network corresponding to this experiment. It has two stacks: one for colour, one for grid direction. Each stack has two cells. In the “colour” stack one cell detects pink and the other green; in the “orientation” stack one cell detects vertical and the other horizontal. As usual, there are inhibitory connections within each stack to ensure a winner-takes-all decision.

Following Wilson’s general scheme, they also added excitatory connections between cells in distinct stacks, representing the combinations of colour and direction that occur in the two “learned” images – those actually presented to the two eyes. Then they used recent mathematical techniques to list the patterns that arise in such a network. They found two types of oscillatory pattern. One corresponds to alternation between the two learned images. The other corresponds precisely to alternation between the two images seen in colour misbinding.
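The wiring just described is small enough to write out explicitly. The Python sketch below only builds the connection list for the two-stack network and enumerates the colour and orientation combinations that were not shown to either eye; it does not reproduce the mathematical analysis of the oscillations, and the data layout and function names are assumptions made for illustration.

```python
from itertools import product

# Two stacks of two cells each, as in the experiment described above.
STACKS = {
    "colour": ["pink", "green"],
    "orientation": ["horizontal", "vertical"],
}

# The two "learned" images actually presented to the eyes.
LEARNED_IMAGES = [
    {"colour": "pink", "orientation": "horizontal"},  # left eye
    {"colour": "green", "orientation": "vertical"},   # right eye
]


def build_connections(stacks, learned_images):
    """Return a dict mapping (from_cell, to_cell) to the connection type."""
    connections = {}
    # Inhibition between every pair of cells within the same stack.
    for cells in stacks.values():
        for a in cells:
            for b in cells:
                if a != b:
                    connections[(a, b)] = "inhibitory"
    # Excitation between cells in different stacks that share a learned image.
    for image in learned_images:
        chosen = list(image.values())
        for a in chosen:
            for b in chosen:
                if a != b:
                    connections[(a, b)] = "excitatory"
    return connections


def unlearned_combinations(stacks, learned_images):
    """List one-cell-per-stack combinations that neither eye was shown."""
    names = list(stacks)
    learned = {tuple(image[name] for name in names) for image in learned_images}
    return [combo for combo in product(*(stacks[name] for name in names))
            if combo not in learned]


connections = build_connections(STACKS, LEARNED_IMAGES)
for pair, kind in sorted(connections.items()):
    print(pair, kind)
print(unlearned_combinations(STACKS, LEARNED_IMAGES))
```

Nothing in the wiring mentions the unlearned combinations, which is why their appearance in the network’s behaviour counts as a genuine side effect.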

Colour misbinding is therefore a natural feature of the dynamics of Wilson networks. Although the network is “set up” to detect the two learned images, its structure produces an unexpected side effect: two images that were not learned. The rivalry experiment reveals hints of the brain’s hidden wiring. The same techniques apply to many other experiments, including some that have not yet been performed. They lead to very specific predictions, including more circumstances in which subjects will observe patterns that were not presented to either eye.

Similar models also apply to illusions. However, the excitatory connections cannot be determined by the images shown to the two eyes, because both eyes see the same image. One suggestion is that the connections may be determined by what your visual system already “knows” about real objects.

Take the celebrated moving illusion called “the spinning dancer”. Some observers see the solid silhouette of a dancer spinning anticlockwise, others clockwise. Sometimes, the direction of spin seems to switch suddenly.

We know that the top half of a spinning dancer can spin either clockwise or anticlockwise. Ditto for the bottom half. In principle, if the top half spun one way and the bottom half the other, you would see exactly the same silhouette as when both halves move together. Yet when people are shown “the spinning dancer”, no one sees the halves moving independently. If the top half spins clockwise, so does the bottom half.

Why do our brains do this? We can model this prior knowledge using a series of stacks that correspond to different parts of the dancer’s body. Each stack has a “clockwise” cell and an “anticlockwise” cell. The brain’s prior knowledge establishes one set of excitatory connections between all cells that sense clockwise motion, and another between all “anticlockwise” cells. We can also add inhibitory connections between the “clockwise” and the “anticlockwise” cells. These connections collectively tell the network that all parts of the object being perceived must spin in the same direction at any instant. Our brains don’t allow for a “half and half” interpretation.
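One way to see why this wiring rules out a “half and half” dancer is to score every possible interpretation by how well the connections support it. The Python sketch below is a static stand-in for the network’s dynamics, under stated assumptions: the three body parts are a hypothetical choice of stacks, and the scoring rule (plus one for each pair of parts linked by excitation, minus one for each pair linked by inhibition) is invented for illustration.

```python
from itertools import product

PARTS = ["head", "torso", "legs"]  # hypothetical stacks, one per body part
DIRECTIONS = ["clockwise", "anticlockwise"]


def support(state):
    """Score a state: +1 per agreeing pair of parts, -1 per disagreeing pair."""
    parts = list(state)
    score = 0
    for i, first in enumerate(parts):
        for second in parts[i + 1:]:
            # Same direction: the active cells excite each other.
            # Opposite directions: the active cells inhibit each other.
            score += 1 if state[first] == state[second] else -1
    return score


def best_supported_states(parts):
    """Return the interpretations that the connections support most strongly."""
    states = [dict(zip(parts, dirs))
              for dirs in product(DIRECTIONS, repeat=len(parts))]
    top = max(support(state) for state in states)
    return [state for state in states if support(state) == top]


for state in best_supported_states(PARTS):
    print(state)
```

Of the eight possible interpretations, only the two in which every part turns the same way come out on top.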

When we analyse this network mathematically, it turns out that the cells switch repeatedly between a state in which all clockwise cells are firing but the anticlockwise ones are quiescent, and a state in which all anticlockwise cells are firing but the clockwise ones are quiescent. The upshot is that we perceive the whole figure of the dancer switching directions. Similar networks provide sensible models for many other illusions, including some in which there are three different inputs.

These models provide a common framework for both rivalry and illusion, and they unify many experiments, explain otherwise puzzling results and make new predictions that can be tested. They also tell us that in principle the brain can carry out some apparently complex tasks using simple networks. (What it does in practice is probably different in detail, but could well follow the same general lines.)

This could help make sense of a real brain, as new experiments improve our ability to observe its “wiring diagram”. It might not be as ambitious as trying to model the whole thing on a computer, but modesty can be a virtue. Since simple networks behave in strange and unexpected ways, what incomprehensible quirks might a complicated network have?

Perhaps Dalí, and Escher, and the spinning dancer can help us find out. 

Ian Stewart is Emeritus Professor of Mathematics and Digital Media Fellow at the University of Warwick