|![[ETH/ETH - Systems Neuroscience/Images - ETH Systems Neuroscience/image117.png]] |![[ETH/ETH - Systems Neuroscience/Images - ETH Systems Neuroscience/image118.png]] |
|---|---|
There are many cues that give a 2D image a sense of depth. For example, objects in front occlude objects behind them, and linear perspective conveys distance, as in Leonardo da Vinci's famous "Ultima Cena" ("The Last Supper").
### Cues to Depth Perception
- **Monocular Depth Cues**:
- **Size**
- **Lighting** and **Shadows**: the image on the right shows six dots whose shading is simply rotated by 180 degrees; we perceive the left set as concave and the right set as convex because we are biased to assume that the light source comes from above.
- **Interposition**
- **Clarity** and **Elevation**
- **Perspective**
- **Binocular Depth Cues**:
- **Convergence**: if we are fixating point A and want to fixate point B, we must perform a convergent movement of the eyes. Such a movement allows the brain to infer that point B is closer to us than point A. In principle, the convergence angle could be used to estimate the absolute distance of an object; however, our brain has not evolved to do that.
- **Stereopsis** (**Binocular Disparity**): the two images on the two retinas are not exactly the same, and this difference allows the brain to estimate the depth of objects in the scene. Referring to the image, a fixated point is represented at the fovea of each eye. A farther point, however, is represented to the left of the fovea in the right eye and to the right of the fovea in the left eye. These are non-corresponding points on the two retinas, which the brain exploits to estimate depth. Unlike convergence, which could in principle yield the absolute distance of an object, binocular disparity yields the relative distance of an object from the fixation point.
|![[ETH/ETH - Systems Neuroscience/Images - ETH Systems Neuroscience/image119.png]] |![[ETH/ETH - Computational Vision/Images - ETH Computational Vision/image120.png]] |![[ETH/ETH - Computational Vision/Images - ETH Computational Vision/image121.png]] |![[ETH/ETH - Computational Vision/Images - ETH Computational Vision/image122.png]] |
|---|---|---|---|
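The geometry behind convergence and disparity can be made concrete with a small-angle approximation: a point at distance d straight ahead subtends a vergence angle of roughly I/d for an interocular distance I, so its disparity relative to a fixation point at distance f is approximately I·(1/f − 1/d). A minimal sketch (all parameter values are assumptions, not from the notes):

```python
import math

def angular_disparity(d, f=0.5, interocular=0.065):
    """Approximate angular disparity (radians) of a point at depth d metres,
    relative to a fixation point at depth f metres, for eyes separated by
    `interocular` metres. Small-angle approximation: vergence angle ~ I/d.
    Positive values correspond to points farther than fixation, matching
    the sign convention used for the V1 recordings later in these notes."""
    return interocular * (1.0 / f - 1.0 / d)

for d in [0.3, 0.5, 1.0, 2.0]:
    arcmin = math.degrees(angular_disparity(d)) * 60
    print(f"depth {d:3.1f} m -> disparity {arcmin:+8.1f} arcmin")
```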
### The Horopter
The horopter refers to the set of points in space that are perceived as being at the same depth or distance as the fixation point when both eyes are focused on that point. The horopter is an important concept in binocular vision and stereopsis, as it helps explain how the brain uses the disparity in the images captured by the two eyes to perceive depth and create a 3D representation of the environment. When both eyes are focused on a single point in space, the visual system aligns the corresponding retinal points of the two eyes. Points on the horopter fall on these corresponding retinal points in both eyes, meaning they have zero disparity and are perceived as being at the same depth as the fixation point. The horopter is not a fixed geometric shape but changes depending on the position of the fixation point and the angle of convergence of the eyes.
![[ETH/ETH - Computational Vision/Images - ETH Computational Vision/image123.png|500]]
**The Horopter and Panum's Fusion Area**
![[ETH/ETH - Computational Vision/Images - ETH Computational Vision/image124.png|500]]
When both eyes are focused on a point in space, the visual system aligns the corresponding retinal points of the two eyes. Points on the horopter fall on these corresponding retinal points and are perceived as being at the same depth as the fixation point. However, the visual system can tolerate small disparities and still fuse the images from the two eyes into a single perception of depth. This tolerance is described by Panum's fusion area, which is the region around the horopter where the disparities between the images are small enough for the brain to combine the images and perceive them as a single 3D image.
Outside of Panum's fusion area, the disparities between the images from the two eyes become too large for the visual system to fuse them effectively, leading to a phenomenon called diplopia, or double vision. This occurs because the visual system cannot reconcile the differences between the two images and perceive them as a single, coherent 3D image. Panum's fusion area is important in understanding how the brain processes depth information from the two eyes and forms a coherent, single perception of the environment. It highlights the visual system's ability to tolerate and fuse small disparities, which is crucial for achieving binocular depth perception and stereopsis.
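As a toy numerical illustration of the horopter and Panum's area (the fusion threshold below is a made-up constant; in reality Panum's area is not fixed and grows with retinal eccentricity), a point can be classified by the magnitude of its disparity relative to fixation:

```python
import math

INTEROCULAR = 0.065                    # metres (assumed)
PANUM_LIMIT = math.radians(10 / 60)    # ~10 arcmin: hypothetical foveal fusion limit

def percept(d, f=0.5):
    """Classify a point at depth d (m) for fixation depth f (m):
    zero disparity = on the horopter; small disparity = fused in depth;
    large disparity = diplopia (double vision)."""
    delta = INTEROCULAR * (1.0 / f - 1.0 / d)   # small-angle disparity, radians
    if abs(delta) < 1e-9:
        return "on the horopter (zero disparity)"
    if abs(delta) <= PANUM_LIMIT:
        return "fused and seen in depth (inside Panum's area)"
    return "diplopia (outside Panum's area)"

for d in [0.5, 0.505, 0.6, 2.0]:
    print(f"{d:5.3f} m: {percept(d)}")
```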
### The Wheatstone Stereoscope
![[ETH/ETH - Computational Vision/Images - ETH Computational Vision/image125.png|500]]
It is an optical instrument that demonstrates depth perception in binocular vision. It uses two separate images, each representing the view of a scene from the perspective of one eye, to create the illusion of a single, 3D image when viewed with both eyes.
- Two images are created by capturing or drawing the same scene from two slightly different viewpoints, mimicking the horizontal separation of the left and right eyes. These images, known as stereopairs or stereograms, contain slightly different information, just as the images captured by our eyes do.
- The stereoscope consists of a frame with two mirrors placed at a 90-degree angle to each other, with one mirror in front of each eye. The stereopair images are positioned on either side of the mirrors, facing them.
- When the viewer looks into the mirrors, each eye sees the reflection of one of the images from the stereopair. The mirrors ensure that the images are aligned correctly, with the left eye viewing the left image and the right eye viewing the right image.
- The brain processes the images from each eye separately and then combines them into a single, coherent image. Due to the differences between the images, the brain perceives depth in the scene, creating the illusion of a 3D image.
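A stereopair can also be synthesized rather than photographed. The sketch below builds a random-dot pair in the spirit of Julesz's classic demonstration (sizes and shifts are arbitrary choices): the two images are identical noise except for a central square that is horizontally shifted in the right-eye image, so the square floats in depth when the pair is fused, even though neither image alone contains any visible shape.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_stereopair(size=101, patch=31, shift=3):
    """Random-dot stereopair: two noise images that differ only in a central
    square patch, shifted horizontally by `shift` pixels in the right image."""
    left = rng.integers(0, 2, (size, size)).astype(float)
    right = left.copy()
    lo, hi = (size - patch) // 2, (size + patch) // 2
    # Displace the central patch in the right-eye image...
    right[lo:hi, lo + shift:hi + shift] = left[lo:hi, lo:hi]
    # ...and refill the strip uncovered by the shift with fresh random dots,
    # so the square is invisible to either eye alone.
    right[lo:hi, lo:lo + shift] = rng.integers(0, 2, (patch, shift))
    return left, right

left, right = make_stereopair()
```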
**The Correspondence Problem in Stereopsis**
![[ETH/ETH - Computational Vision/Images - ETH Computational Vision/image126.png|500]]
Referring to the figure above, it is difficult to establish the relative positions of the objects from their projections on the two retinas. To decide that a given point on the left retina matches a specific point on the right retina, we must assume a particular arrangement of the points in depth: the diagrams at the bottom of the image show all arrangements of the four points that are consistent with their retinal projections. To resolve this ambiguity, the brain relies on heuristics. The correspondence problem arises because the visual system must search for matching points or features in two images that contain many similar or repetitive elements. This is particularly challenging in complex or textured scenes, where many elements look alike and finding the correct correspondence becomes harder.
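A simple computational heuristic for the correspondence problem is local block matching, widely used in machine stereo: for each patch of the left image, search along the same row of the right image and keep the horizontal offset with the smallest sum of squared differences. This is only an illustrative sketch (window size and search range are arbitrary), not a claim about the brain's actual heuristics; applied to the random-dot pair above, it recovers the shifted central square.

```python
import numpy as np

def block_match(left, right, window=5, max_disp=8):
    """Brute-force block matching: returns an integer disparity map.
    For each pixel, a (window x window) patch of the left image is compared
    with horizontally shifted patches of the right image; the shift with the
    smallest sum of squared differences (SSD) wins."""
    h, w = left.shape
    r = window // 2
    disp = np.zeros((h, w), dtype=int)
    for y in range(r, h - r):
        for x in range(r, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1]
            best_ssd, best_d = np.inf, 0
            for d in range(-max_disp, max_disp + 1):
                if x + d - r < 0 or x + d + r >= w:
                    continue  # shifted window would fall outside the image
                cand = right[y - r:y + r + 1, x + d - r:x + d + r + 1]
                ssd = float(np.sum((ref - cand) ** 2))
                if ssd < best_ssd:
                    best_ssd, best_d = ssd, d
            disp[y, x] = best_d
    return disp
```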
### Neurons Tuned for Binocular Disparity (monkey V1)
The graphs below show the responses of individual V1 cells, measured in monkeys while both eyes were stimulated with varying amounts of binocular disparity. Disparity is labelled positive for objects farther than the fixation point and negative for objects closer than the fixation point. A number of different cell profiles emerge from how the cells respond to different disparity stimuli. One type responds strongly to positive disparity (a "far" cell, encoding objects beyond the fixation point), while another behaves oppositely, increasing its firing in the presence of negative disparity (a "near" cell). A third type is sharply tuned to a specific amount of binocular disparity and responds strongly at that value (tuned excitatory), and a fourth is likewise tuned to a specific disparity but shows an inhibitory dip rather than an excitatory peak (tuned inhibitory).
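These four response profiles (far, near, tuned excitatory, tuned inhibitory) are commonly modelled as Gabor functions of disparity: a Gaussian envelope multiplied by a cosine whose phase determines the cell type. The sketch below uses entirely hypothetical parameters:

```python
import numpy as np

def disparity_tuning(d, phase, baseline=10.0, amp=20.0, sigma=0.3, freq=1.0):
    """Gabor-style tuning curve: firing rate (spikes/s) vs. disparity d (deg).
    All parameters are hypothetical; the carrier phase sets the cell type."""
    rate = baseline + amp * np.exp(-d**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * d + phase)
    return np.maximum(rate, 0.0)  # firing rates cannot go below zero

d = np.linspace(-1.0, 1.0, 201)
curves = {
    "tuned excitatory": disparity_tuning(d, phase=0.0),    # peak at zero disparity
    "tuned inhibitory": disparity_tuning(d, phase=np.pi),  # dip at zero disparity
    "far":  disparity_tuning(d, phase=-np.pi / 2),         # prefers positive disparity
    "near": disparity_tuning(d, phase=np.pi / 2),          # prefers negative disparity
}
```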
How are these cells in the cortex tuned for non-zero disparity?
Two models have been proposed:
- **Position Shift Model**: according to this model, the left-eye and right-eye receptive fields of a binocular neuron have the same internal structure but are shifted in position, such that a stimulus with the matching binocular disparity maximally excites the neuron through both eyes.
- **Phase Shift Model**: according to this model, the two receptive fields match in position but differ in their internal organization (the phase of their on- and off-subregions), such that the neuron is maximally excited when the stimulus carries a specific binocular disparity.
|![[ETH/ETH - Computational Vision/Images - ETH Computational Vision/image127.png]] |![[ETH/ETH - Computational Vision/Images - ETH Computational Vision/image128.png]] |
|---|---|
**Binocular Receptive Fields of Disparity Tuned Neurons**
![[ETH/ETH - Computational Vision/Images - ETH Computational Vision/image129.png]]
In the image above, the receptive fields of disparity-tuned neurons have been mapped in cat cortex. Both phase-shift and position-shift arrangements are observed, as well as mixtures of the two. The first column of images shows a phase shift of the receptive fields between the left and the right eye, while the second column shows both a phase shift and a position shift between the two eyes (white regions = on-regions, black regions = off-regions).
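The difference between the two models (and their mixture, as observed in the cat recordings above) is easy to visualize with 1D Gabor receptive fields: a position shift displaces the whole profile, while a phase shift keeps the Gaussian envelope in place and shifts only the sinusoidal carrier, i.e. the arrangement of on- and off-subregions. A minimal sketch with hypothetical parameters:

```python
import numpy as np

def gabor_rf(x, center=0.0, sigma=0.3, freq=2.0, phase=0.0):
    """1D Gabor receptive field: Gaussian envelope times a cosine carrier.
    Positive lobes are on-regions, negative lobes are off-regions."""
    return np.exp(-(x - center) ** 2 / (2 * sigma ** 2)) \
         * np.cos(2 * np.pi * freq * (x - center) + phase)

x = np.linspace(-1.0, 1.0, 401)  # retinal position (deg), hypothetical scale

left_eye       = gabor_rf(x)                                # reference profile
position_shift = gabor_rf(x, center=0.1)                    # same shape, displaced 0.1 deg
phase_shift    = gabor_rf(x, phase=np.pi / 2)               # same position, carrier shifted 90 deg
hybrid         = gabor_rf(x, center=0.1, phase=np.pi / 2)   # a mixture of both shifts
```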