Summary: Researchers have uncovered how primate brains transform flat, two-dimensional visual inputs into rich, three-dimensional mental representations. This process, known as inverse graphics, reverses the principles of computer graphics: the inferotemporal cortex moves from a two-dimensional view, through an intermediate step, to a three-dimensional model that captures depth, shape, and orientation.
Using a neural network called the Body Inference Network, the scientists were able to map this process and show that it closely mirrors activity in the primate brain regions responsible for recognizing body shapes. The findings shed light on depth perception and could lead to advances in AI and in treatments for vision impairments.
Important facts:
- The primate inferotemporal cortex creates three-dimensional mental models from 2D images through a process of “inverse graphics.”
- The researchers used a neural network to simulate this process and matched its stages to recorded brain activity in macaques.
- This work could shed light on the design of artificial vision and contribute to the understanding of visual perception disorders.
Source: Yale
Yale University researchers have discovered a process in the primate brain that provides new insights into how the visual system works and could lead to advances in human neuroscience and artificial intelligence.
Using a new computer model, researchers have discovered an algorithm that shows how the primate brain creates an internal three-dimensional (3D) representation of an object when viewing a two-dimensional (2D) image of it.
“This provides us with evidence that the purpose of vision is to promote a three-dimensional understanding of an object,” said the study’s senior author, Ilker Yildirim, an assistant professor of psychology in Yale’s Faculty of Arts and Sciences.
“When you open your eyes, you see 3D scenes; the brain’s visual system is capable of building that 3D understanding from flat, 2D inputs.”

The researchers use the term “inverse graphics” to describe this remarkable process in the brain’s visual system. A computer graphics pipeline renders 2D images from 3D models; the brain runs that pipeline in reverse, starting from flat, two-dimensional visual input and reconstructing a full three-dimensional understanding of objects.
This transformation highlights the brain’s ability to infer depth, shape, and spatial orientation from limited visual cues, revealing a sophisticated internal mechanism for building mental models of the world. It’s a striking example of how biological systems can outperform even the most advanced artificial vision technologies.
This transformation moves through a “2.5D” intermediate stage, which encodes partial depth and surface cues, allowing the brain to build a more flexible, view-tolerant 3D representation. These findings were published in the Proceedings of the National Academy of Sciences, offering new insights into how perception bridges the gap between vision and cognition.
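As a way to picture such a staged pipeline, here is a minimal, hypothetical sketch in PyTorch. The StagedEncoder name, the layer sizes, and the single-channel depth map are illustrative assumptions, not the architecture used in the study.

```python
# Hypothetical sketch of a staged pipeline:
# 2D image -> intermediate "2.5D" depth/surface map -> compact 3D code.
# Sizes and names are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class StagedEncoder(nn.Module):
    def __init__(self, n_3d_params: int = 32):
        super().__init__()
        # Stage 1: 2D image features
        self.features_2d = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        # Stage 2: a "2.5D" map encoding per-pixel depth/surface cues
        self.map_25d = nn.Conv2d(16, 1, 3, padding=1)
        # Stage 3: a compact, view-tolerant 3D representation
        self.to_3d = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 64, n_3d_params))

    def forward(self, image):
        feats = self.features_2d(image)   # 2D stage
        depth = self.map_25d(feats)       # 2.5D intermediate stage
        latent_3d = self.to_3d(depth)     # 3D object code
        return depth, latent_3d

depth, code = StagedEncoder()(torch.randn(1, 3, 64, 64))
print(depth.shape, code.shape)  # (1, 1, 64, 64) and (1, 32)
```

The point of the sketch is only the ordering of stages: image features first, an intermediate per-pixel depth map second, and a compact, view-tolerant 3D code last.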
The human brain essentially converts the two-dimensional images it receives, whether on paper, on a screen, or on the retina, into three-dimensional mental models. Computer graphics does the opposite: it converts three-dimensional scenes into two-dimensional images.
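To make that directionality concrete, the toy example below (not from the study) shows the forward graphics step, a pinhole projection from 3D to 2D, and why inverting it is hard: distinct 3D points can land on the same 2D location, so depth must be inferred rather than read off.

```python
# Forward graphics in miniature: project 3D points onto a 2D image plane.
# Inverse graphics must undo this many-to-one mapping.
import numpy as np

def project(points_3d: np.ndarray, focal: float = 1.0) -> np.ndarray:
    """Pinhole-camera projection: (x, y, z) -> (f*x/z, f*y/z)."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([focal * x / z, focal * y / z], axis=1)

# Two different 3D points that land on the same 2D pixel:
pts = np.array([[1.0, 1.0, 2.0],   # near point
                [2.0, 2.0, 4.0]])  # far point, twice the distance
print(project(pts))  # both project to (0.5, 0.5): depth is lost in 2D
```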
“This is a significant advance in our understanding of vision,” said Yildirim. “The brain performs this complex task automatically, yet it demands immense computational power.”
He added that replicating this level of performance in artificial systems remains a major challenge. “It’s still difficult to build computer vision models that can handle everyday scenes with the same efficiency and flexibility as the human brain.”
The researchers say the discovery could advance research in human neuroscience and vision disorders and inform the development of artificial vision systems that approach primate visual capabilities.
In their study, researchers identified a key function within the primate brain’s temporal lobe—specifically the inferotemporal cortex, a region known for its role in visual recognition. This area is responsible for transforming flat, two-dimensional images into rich, three-dimensional mental models of objects.
This discovery sheds light on how the brain achieves complex visual understanding, enabling primates to perceive depth, shape, and orientation from limited visual input. It marks a major step forward in decoding the neural mechanisms behind 3D object recognition.
They achieved this using the Body Inference Network (BIN), a neural network-based model. Computer graphics normally generates 2D images of an object from features such as shape, pose, and orientation; the researchers instead trained BIN on images of human and monkey bodies labeled with 3D data, so that it learned to run the graphics process in reverse and extract 3D features directly from 2D images.
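Read that way, the training setup amounts to ordinary supervised regression from images to 3D parameters. The sketch below illustrates that idea in generic PyTorch; the network, dimensions, and loss function are stand-ins and are not taken from the paper.

```python
# Generic sketch: a network is shown 2D renderings labeled with the 3D
# parameters (shape, pose, orientation) that generated them, and learns
# to invert the renderer. All sizes here are stand-ins, not BIN's.
import torch
import torch.nn as nn

def train_step(net, optimizer, images, params_3d):
    """One supervised step: predict 3D parameters from 2D images."""
    optimizer.zero_grad()
    pred = net(images)
    loss = nn.functional.mse_loss(pred, params_3d)  # match 3D labels
    loss.backward()
    optimizer.step()
    return loss.item()

net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256),
                    nn.ReLU(), nn.Linear(256, 10))  # 10 = e.g. pose dims
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
images = torch.randn(8, 3, 64, 64)  # stand-in rendered images
params = torch.randn(8, 10)         # stand-in 3D labels from graphics
print(train_step(net, opt, images, params))
```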
By comparing data from the Body Inference Network (BIN) with neural recordings from macaques viewing images of macaque bodies, researchers uncovered a striking parallel. The stages of BIN’s computational processing closely mirrored patterns of brain activity observed in the primates.
Specifically, two regions in the macaque brain—the middle superior temporal sulcus body patch (MSB) and the anterior superior temporal sulcus body patch (ASB)—exhibited neural activity that closely matched the stages of processing in the Body Inference Network (BIN). This alignment suggests a strong correspondence between artificial inference models and biological visual pathways.
These brain regions are critically involved in recognizing and interpreting body shapes, supporting the idea that BIN captures essential aspects of how primates perceive and mentally reconstruct visual forms. The findings offer compelling evidence that computational models can mirror the brain’s natural strategies for understanding complex visual stimuli.
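Stage-by-stage comparisons between a model and neural populations are commonly quantified with representational similarity analysis (RSA): if a model layer and a brain region produce similar patterns of dissimilarities across the same stimuli, they are taken to represent those stimuli alike. Assuming that style of analysis (the paper's exact method may differ), a generic version looks like this:

```python
# Generic RSA sketch: compare a model layer to a neural population by
# correlating their representational dissimilarity matrices (RDMs).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses: np.ndarray) -> np.ndarray:
    """RDM from a (stimuli x units) response matrix, as a condensed vector."""
    return pdist(responses, metric="correlation")

# Stand-in data: responses of a model layer and a neural population
# (e.g., a body patch) to the same set of body images.
rng = np.random.default_rng(0)
layer_acts = rng.normal(size=(50, 128))   # 50 stimuli x 128 model units
neural_acts = rng.normal(size=(50, 80))   # 50 stimuli x 80 neurons

# Similar RDMs => the layer and brain region represent stimuli alike.
rho, _ = spearmanr(rdm(layer_acts), rdm(neural_acts))
print(f"model-brain RDM correlation: {rho:.3f}")
```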
“Our model describes visual processing in the brain more accurately than other AI models,” Yildirim said.
“We are particularly interested in the neuroscientific and cognitive aspects of this topic, but we also hope that it can inspire new artificial vision systems and facilitate potential medical interventions in the future.”
Other authors of the study include first author Hakan Yilmaz and Alap Shah, both doctoral students at Yale University’s Graduate School of Arts and Sciences, as well as researchers at Princeton University and KU Leuven in Belgium.
Abstract
Multi-area processing in body regions of primate inferotemporal cortex implements inverse graphics.
Stimulus-driven processing in multiple areas of the inferotemporal cortex (IT) is thought to be essential for converting sensory input into useful representations of the world.
What formats do these neural representations take, and how are they computed across the stages of these cortical networks?
A growing literature in computational neuroscience focuses on the computational goal of obtaining high-level image statistics that support useful distinctions, for example, between object identities or categories.
Here, inspired by classical theories of vision, we propose a different approach. We show that estimating 3D objects can be a computational goal in its own right, implemented by algorithms that invert graphics-like generative models of how 3D scenes are composed and rendered into images.
Using body perception as a case study, we show that inverse graphics, the reconstruction of 3D object representations from 2D images, can emerge in inference networks trained simply to associate visual inputs with their corresponding three-dimensional structures.
The networks were never explicitly programmed with this staged capability, which suggests that the brain’s visual system may rely on similar emergent principles and that inverse graphics is not just a theoretical construct but a biologically plausible mechanism for visual understanding.
Remarkably, this correspondence with the inverse of a graphics-based generative model also holds for the body-processing network in the inferotemporal cortex of macaque monkeys.
Finally, the inference networks capture the stage-wise progression of computations across this network and outperform today’s dominant vision models, both supervised and unsupervised, none of which admit an inverse-graphics interpretation.
This work establishes inverse graphics as a computationally explicit, multi-area neural algorithm and demonstrates the potential for replicating primate visual abilities in machines.

