How GPU MIP Chains Simulate Peripheral Vision

Peripheral vision is better modeled as spatial pooling than uniform blur. Resolution falls off steeply with eccentricity, and GPUs already encode this hierarchy: the MIP chain. Each level halves resolution, and textureLod() samples any level directly. Scrutinizer maps eccentricity to MIP level — sharp at center, pooled at edges — one texture lookup per pixel. The MIP grid is a foundational constraint of running on texture hardware; Scrutinizer layers biologically-grounded sector geometry on top, so the MIP chain handles how much resolution to lose while isotropic cortical sectors control where and in what shape pooling occurs.

The MIP chain

A MIP chain is a stack of progressively downsampled copies of the same image. Level 0 is full resolution; each successive level halves the width and the height (so level 4 is 1/16 the linear scale, 1/256 the pixels).

Each level is a stronger low-pass version of the image: fine detail disappears first, coarse structure remains.
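In code, a MIP chain is just repeated 2×2 box averaging. A minimal Python sketch (illustrative, not Scrutinizer's implementation — real GPUs build this in hardware):

```python
# Build a MIP chain by repeated 2x2 box averaging.
# Level k has 1/2^k the linear resolution of level 0.

def downsample(img):
    """Average each 2x2 block of a square, even-sized grayscale image."""
    n = len(img)
    return [[(img[2*y][2*x] + img[2*y][2*x+1] +
              img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
             for x in range(n // 2)]
            for y in range(n // 2)]

def mip_chain(level0, levels):
    chain = [level0]
    for _ in range(levels):
        chain.append(downsample(chain[-1]))
    return chain

# A 16x16 test image: the chain's levels are 16, 8, 4, 2, 1 pixels wide.
img = [[float((x + y) % 7) for x in range(16)] for y in range(16)]
chain = mip_chain(img, 4)
print([len(level) for level in chain])  # [16, 8, 4, 2, 1]
```

Because each level averages the one before it, the single pixel at the top of the chain is exactly the mean of the whole image — coarse structure survives, fine detail is gone.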

MIP levels 0–4

From MIPs to DoG bands

A Difference-of-Gaussians (DoG) band isolates a range of spatial frequencies by subtracting two blur scales. The MIP chain already gives us discrete blur scales, so we approximate DoG directly from adjacent levels:

band_k ≈ level_k − level_(k+1)

The bands are computed from the same MIP levels shown above — subtraction of adjacent levels isolates each frequency range.
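The adjacent-level subtraction can be illustrated numerically. A 1D Python sketch (nearest-neighbour upsampling is an assumption here, chosen for simplicity; the shader works on 2D MIP levels):

```python
# band_k = level_k - level_(k+1): adjacent MIP levels are subtracted after
# upsampling the coarser one back to the finer grid.

def down(sig):
    return [(sig[2*i] + sig[2*i+1]) / 2.0 for i in range(len(sig) // 2)]

def up(sig):
    out = []
    for v in sig:          # nearest-neighbour upsample back to the finer grid
        out += [v, v]
    return out

signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
levels = [signal]
while len(levels[-1]) > 1:
    levels.append(down(levels[-1]))

# band_k isolates detail present at level k but gone at level k+1
bands = [[a - b for a, b in zip(levels[k], up(levels[k + 1]))]
         for k in range(len(levels) - 1)]

# Telescoping check: coarsest level + all (upsampled) bands == original
recon = levels[-1]
for band in reversed(bands):
    recon = [u + b for u, b in zip(up(recon), band)]
print(recon)  # matches `signal`
```

The telescoping sum is the point: the bands partition the image's frequency content, so attenuating individual bands (rather than truncating the whole chain at one level) gives graded control.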

DoG bands — spatial frequency decomposition

Scrutinizer drops high-frequency bands with eccentricity: near the fovea most bands are retained; farther out, only coarse bands remain. In the shader, one textureLod() call per pixel:

float mipLevel = log2(1.0 + r_deg / a);
vec4 color = textureLod(page, uv, mipLevel);

The logarithmic mapping comes from cortical magnification — Blauch, Alvarez & Konkle (2026) formalized it as M(r) = 1/(r + a). Each MIP level halves resolution, so the MIP level at eccentricity r is the log2 of the resolution ratio (r + a) / a. The constant a = 2.78° controls foveation strength.
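The mapping is easy to sanity-check outside the shader. A direct Python transcription of the two GLSL lines above (same formula, same constant), which also makes one identity visible: 2^mipLevel = 1 + r/a, so the pooling cell size grows linearly with eccentricity:

```python
import math

# The shader's eccentricity-to-MIP mapping, transcribed for checking
# against the eccentricity table.
A = 2.78  # foveation constant, in degrees

def mip_level(r_deg):
    return math.log2(1.0 + r_deg / A)

for r in (0.0, 2.78, 10.0, 15.0):
    print(f"{r:5.2f} deg -> MIP level {mip_level(r):.2f}")
```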

For a typical laptop at ~50 cm viewing distance (~45 ppd), each MIP level doubles the area averaged into a single output pixel:

Eccentricity    MIP level   Cell size      What survives
0° (fovea)      0.0         1×1 px         Everything — full resolution
2.78° (= a)     1.0         2×2 px         Text readable, serifs gone
~5.1°           1.5         2.8×2.8 px     Headings, icons as shapes
10°             2.2         4.6×4.6 px     Color blocks, layout panels
15°             2.7         6.5×6.5 px     Only large-scale structure

“Cell size” is 2^mipLevel — the number of source pixels averaged into each output pixel at that eccentricity. At 10°, a 4.6×4.6 px cell means a 14 px icon spans only ~3 pooled samples across. You know something is there, but you can’t tell what it is. That’s peripheral vision.


Where MIPs diverge from biology

The MIP chain gets one thing right: resolution decreases with eccentricity. But it gets three things wrong.

1. Frequency rolloff is a step function, not a gradient. A single textureLod() call is a rectangular low-pass filter: everything above the cutoff is killed, everything below is fully preserved. Biological vision has a graded rolloff across spatial frequencies. Scrutinizer’s DoG decomposition partially corrects this by splitting the MIP chain into 12 half-octave bands, each attenuated independently via M-scaling cutoffs with smoothstep transition zones. The DoG path preserves 1.4× more CSF-weighted fidelity than pure MIP sampling, with 1.9× smoother spectral transitions.

2. Pooling cells are rectangles, not cortical sectors. textureLod() averages axis-aligned rectangular blocks. Cortical pooling regions tile radially around fixation. The MIP grid doesn’t know where the fovea is.

The comparison below shows both grids at the same eccentricity range. Left: rectangular MIP tiles grow in power-of-2 steps, axis-aligned regardless of fixation. Right: isotropic cortical sectors tile radially around fixation, with spoke count adapted per ring so every cell is roughly square.

3. A naive polar fix creates a new problem. Replacing the rectangular grid with a polar grid (concentric rings, fixed spoke count) orients cells around fixation — but outer cells stretch into arcs with 4:1 aspect ratios. The demo below shows all three approaches. Green cells are isotropic (aspect ratio near 1:1); red cells are stretched:


The corrections

Scrutinizer keeps textureLod() as the blur engine — that’s the performance trick that makes 60 fps possible. The corrections layer on top.

Frequency selectivity: The DoG band decomposition splits MIP levels into 12 independently addressable frequency bands. Each band gets its own eccentricity cutoff (M-scaling) and its own chromatic attenuation (RG and BY channels decay at different rates via castleCSF). This is what uniform MIP blur cannot do: preserve low-frequency layout structure while removing high-frequency detail, and desaturate red-green faster than blue-yellow.
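What per-band attenuation looks like can be sketched in a few lines. The cutoff formula and constants below are illustrative placeholders, not Scrutinizer's actual M-scaling or castleCSF values:

```python
def smoothstep(e0, e1, x):
    """GLSL-style smoothstep: 0 below e0, 1 above e1, smooth in between."""
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def band_weight(band, r_deg, a=2.78, zone=2.0):
    """Weight of half-octave band `band` at eccentricity r_deg.
    cutoff(band) and the transition `zone` are illustrative stand-ins
    for the real M-scaling cutoffs."""
    cutoff = a * (2.0 ** (band / 2.0))  # coarser bands survive farther out
    return 1.0 - smoothstep(cutoff - zone, cutoff + zone, r_deg)

# High-frequency band 0 fades out near the fovea; coarse band 8 survives
# across the whole tested range.
for band in (0, 4, 8):
    print(band, [round(band_weight(band, r), 2) for r in (0.0, 5.0, 20.0)])
```

Each band keeps its own scalar weight per pixel, which is exactly what a single textureLod() cutoff cannot express: a band can be half-attenuated inside its transition zone rather than fully kept or fully killed.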

Pooling geometry: FOVI’s cortical magnification function (Blauch, Alvarez & Konkle 2026) provides isotropic sector geometry. Ring boundaries sit at uniform cortical distance (w = log(r + a), a = 2.78°), and the spoke count per ring is chosen for isotropy (n = floor(2πr / Δr)). Each sector’s extent then parameterizes the downstream pooling stages.
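The two formulas above can be checked numerically. A Python sketch with an illustrative ring count and field of view (not FOVI's reference implementation): ring edges uniform in w = log(r + a), spokes per ring from n = floor(2πr / Δr), then verify each sector is roughly square:

```python
import math

# Sector geometry sketch: ring edges uniform in cortical distance
# w = log(r + a); spoke count per ring chosen for isotropy.
A = 2.78         # degrees, from the cortical magnification function
R_MAX = 30.0     # illustrative field of view in degrees
N_RINGS = 12     # illustrative

w0, w1 = math.log(A), math.log(R_MAX + A)
edges = [math.exp(w0 + (w1 - w0) * i / N_RINGS) - A for i in range(N_RINGS + 1)]

def ring_aspect(r_in, r_out):
    """Radial extent / tangential extent of one sector in this ring."""
    dr = r_out - r_in
    r_mid = 0.5 * (r_in + r_out)
    n = max(1, math.floor(2.0 * math.pi * r_mid / dr))  # spokes for isotropy
    return dr / (2.0 * math.pi * r_mid / n)

aspects = [ring_aspect(edges[i], edges[i + 1]) for i in range(1, N_RINGS)]
print([round(a, 2) for a in aspects])  # each close to 1: roughly square sectors
```

Compare this with a fixed spoke count: holding n constant across rings makes dr/(2πr/n) collapse toward zero at large r, which is exactly the 4:1 arc stretching the naive polar grid suffers from.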

The MIP chain handles how much resolution to lose. The isotropic sectors handle where to pool and in what shape. The rectangular MIP grid is a foundational constraint of running on texture hardware — but by layering sector-parameterized effects on top, Scrutinizer produces output closer to cortical pooling than either MIP blur or a naive polar grid alone.

Level selection: The MIP chain is a low-pass filter with power-of-2 resolution steps. The GPU selects which step via the Jacobian — the matrix of screen-space derivatives of the texture coordinate (dFdx(uv), dFdy(uv)). When V1 displacement warps the sampling UV, the undistorted Jacobian becomes wrong: the hardware picks a MIP level based on a coordinate field that no longer matches the actual sampling pattern. The result is over-blurring radially (where the warp compresses UV) and under-blurring tangentially (where it stretches).

The fix: pass dFdx(v1.distortedUV) instead of dFdx(uv). Same filter, same power-of-2 steps — but now the level selection tracks the warp. The low-pass cutoff frequency matches the actual sampling density at each pixel rather than the undistorted density. The MIP chain’s tile shape is a hardware constraint we accept; the level selection is a software choice we can make correctly. This is the difference between working with the acceleration and working despite it.
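The effect is easy to reproduce numerically. A Python sketch in which a made-up radial warp stands in for the V1 displacement and the texture resolution is an assumed 2048 texels (both illustrative); the LOD rule mirrors how the hardware derives a level from UV derivatives:

```python
import math

# MIP level selection from screen-space UV derivatives (the Jacobian),
# and why it must be taken on the *warped* coordinates.

tex_size = 2048.0
pixel = 1.0 / 1000.0          # screen-space step between adjacent pixels

def lod_from_gradient(duv_dx):
    """GPU-style LOD: log2 of the sampling footprint in texels."""
    return math.log2(max(duv_dx * tex_size, 1e-9))

def warp(u):
    """Toy radial warp: compresses UV increasingly away from u = 0."""
    return math.log1p(4.0 * u) / math.log1p(4.0)

u = 0.8
# Finite-difference stand-ins for dFdx(uv) and dFdx(distortedUV):
duv_plain = pixel
duv_warped = warp(u + pixel) - warp(u)

print(round(lod_from_gradient(duv_plain), 2))   # LOD from undistorted UV
print(round(lod_from_gradient(duv_warped), 2))  # lower: warp compressed the step
```

Where the warp compresses UV, the true footprint is smaller than the undistorted Jacobian claims, so the hardware would pick too high a level — the radial over-blurring described above. Deriving the gradient from the warped coordinates selects the lower, correct level.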

Geometry validated with a 19-test suite: ring boundaries match FOVI’s Python reference to 3 decimal places, aspect ratios within 0.8–1.2 across all eccentricities.


Links: GitHub · v2.6 release post · Kitten image: Pixabay CC0