2026-03 Tech Brief
How GPU MIP Chains Simulate Peripheral Vision
Peripheral vision is better modeled as spatial pooling than uniform blur.
Resolution falls off steeply with eccentricity, and GPUs already encode this
hierarchy: the MIP chain. Each level halves resolution, and
textureLod() samples any level directly. Scrutinizer maps
eccentricity to MIP level — sharp at center, pooled at edges —
one texture lookup per pixel. The MIP grid is a foundational constraint of
running on texture hardware; Scrutinizer layers biologically-grounded sector
geometry on top, so the MIP chain handles how much resolution to
lose while isotropic cortical sectors control where and in what
shape pooling occurs.
The MIP chain
A MIP chain is a stack of progressively downsampled copies of the same image. Level 0 is full resolution; each next level halves width and height (so level 4 is 1/16 linear scale, 1/256 pixels).
Each level is a stronger low-pass version of the image: fine detail disappears first, coarse structure remains.
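The halving cascade can be sketched in a few lines of Python. This is a minimal stand-in for what the GPU does in hardware, using the simplest possible kernel (a 2×2 box average); it is illustrative, not Scrutinizer's code.

```python
# Minimal sketch: build a MIP chain by repeated 2x2 box downsampling of a
# grayscale image stored as a list of rows.

def downsample(img):
    """Halve width and height by averaging each 2x2 block."""
    h, w = len(img), len(img[0])
    return [
        [(img[2*y][2*x] + img[2*y][2*x+1] +
          img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
         for x in range(w // 2)]
        for y in range(h // 2)
    ]

def mip_chain(img):
    """Level 0 is the full-resolution image; each level halves both axes."""
    levels = [img]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(downsample(levels[-1]))
    return levels

# A 16x16 image yields levels of size 16, 8, 4, 2, 1; the 1x1 top level
# holds the global mean, since box averaging is linear.
chain = mip_chain([[float(x * y) for x in range(16)] for y in range(16)])
print([len(lvl) for lvl in chain])
```

Each level is exactly the previous one seen through a stronger low-pass filter, which is what the next section exploits.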
From MIPs to DoG bands
A Difference-of-Gaussians (DoG) band isolates a range of spatial frequencies by subtracting two blur scales. The MIP chain already gives us discrete blur scales, so we approximate DoG directly from adjacent levels:
band_k ≈ level_k − level_{k+1}

- level_k = content up to scale k (low-pass)
- level_{k+1} = an even lower-pass version
- subtracting them leaves the octave of detail between those scales (band-pass)
The bands are computed from the same MIP levels shown above — subtraction of adjacent levels isolates each frequency range.
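A sketch of the band extraction, assuming nearest-neighbour upsampling to bring the coarser level back to the finer level's size (real implementations typically use bilinear; the helper names here are invented for illustration):

```python
# Sketch: a DoG-style band from adjacent MIP levels. Subtracting the
# (upsampled) coarser level leaves one octave of detail, and adding it
# back reconstructs the finer level exactly.

def downsample(img):
    h, w = len(img), len(img[0])
    return [[(img[2*y][2*x] + img[2*y][2*x+1]
              + img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
             for x in range(w // 2)] for y in range(h // 2)]

def upsample2x(img):
    """Nearest-neighbour 2x upsample so adjacent levels can be subtracted."""
    out = []
    for row in img:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

level0 = [[float((x * 7 + y * 3) % 11) for x in range(8)] for y in range(8)]
level1 = downsample(level0)

# band_0 ≈ level_0 − level_1: the octave of detail between the two scales.
band0 = [[a - b for a, b in zip(r0, r1)]
         for r0, r1 in zip(level0, upsample2x(level1))]

# The decomposition is lossless: low-pass + band-pass = original.
recon = [[b + u for b, u in zip(rb, ru)]
         for rb, ru in zip(band0, upsample2x(level1))]
assert recon == level0
```

Because each level1 cell is the mean of its 2×2 block, the band values within any block sum to zero, which is the band-pass property in miniature.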
Scrutinizer drops high-frequency bands with eccentricity: near the fovea
most bands are retained; farther out, only coarse bands remain. In the shader,
one textureLod() call per pixel:
```glsl
float mipLevel = log2(1.0 + r_deg / a);
vec4 color = textureLod(page, uv, mipLevel);
```
The logarithmic mapping comes from cortical magnification —
Blauch, Alvarez & Konkle (2026)
formalized it as M(r) = 1/(r + a). Each MIP level halves resolution,
so the MIP level at eccentricity r is the log2 of the
resolution ratio (r + a) / a. The constant a = 2.78°
controls foveation strength.
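The mapping is plain arithmetic, so it can be checked outside the GPU. A Python transcription, with constants from the text, reproduces the MIP levels and cell sizes in the table that follows:

```python
import math

A = 2.78  # deg; foveation constant from M(r) = 1/(r + a)

def mip_level(r_deg, a=A):
    """Shader mapping: log2 of the resolution ratio (r + a) / a."""
    return math.log2(1.0 + r_deg / a)

def cell_size(r_deg, a=A):
    """Source pixels averaged per output pixel, per axis: 2**mipLevel.
    Note the identity 2**log2((r + a) / a) = (r + a) / a."""
    return 2.0 ** mip_level(r_deg, a)

for r in (0.0, 2.78, 5.0, 10.0, 15.0):
    print(f"{r:5.2f} deg -> level {mip_level(r):.1f}, "
          f"cell {cell_size(r):.1f} px")
```

At r = a the level is exactly 1.0, which is why a doubles as the "first halving" eccentricity.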
For a typical laptop at ~50 cm viewing distance (~45 ppd), each MIP level doubles the area averaged into a single output pixel:
| Eccentricity | MIP level | Cell size | What survives |
|---|---|---|---|
| 0° (fovea) | 0.0 | 1×1 px | Everything — full resolution |
| 2.78° (a) | 1.0 | 2×2 px | Text readable, serifs gone |
| 5° | 1.5 | 2.8×2.8 px | Headings, icons as shapes |
| 10° | 2.2 | 4.6×4.6 px | Color blocks, layout panels |
| 15° | 2.7 | 6.5×6.5 px | Only large-scale structure |
“Cell size” is 2^mipLevel — the number of source pixels per axis averaged into each output pixel at that eccentricity. At 10°, a 4.6×4.6 px cell means a 14 px icon spans only ~3 pooled samples per axis. You know something is there, but you can’t tell what it is. That’s peripheral vision.
Where MIPs diverge from biology
The MIP chain gets one thing right: resolution decreases with eccentricity. But it gets three things wrong.
1. Frequency rolloff is a step function, not a gradient.
A single textureLod() call is a rectangular low-pass filter:
everything above the cutoff is killed, everything below is fully preserved.
Biological vision has a graded rolloff across spatial frequencies. Scrutinizer’s
DoG decomposition partially corrects this by splitting the MIP chain into 12
half-octave bands, each attenuated independently via M-scaling cutoffs with
smoothstep transition zones. The DoG path preserves 1.4× more
CSF-weighted fidelity than pure MIP sampling, with 1.9× smoother spectral
transitions.
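A sketch of the graded roll-off: the 12-band structure and the smoothstep transition are from the text, but the cutoff schedule and constants below are assumed for illustration, not Scrutinizer's actual M-scaling values.

```python
# Illustrative sketch: per-band smoothstep attenuation instead of the
# all-or-nothing cut a single textureLod() level implies. Constants are
# assumed, not Scrutinizer's.

def smoothstep(e0, e1, x):
    """GLSL-style smoothstep: 0 below e0, 1 above e1, cubic in between."""
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

A = 2.78  # deg

def band_gain(k, r_deg, a=A):
    """Gain in [0, 1] for band k (k = 0 is the finest half-octave) at
    eccentricity r. Coarser bands survive farther into the periphery, and
    each cutoff has a smoothstep transition zone rather than a hard edge."""
    cutoff = a * 2.0 ** (k / 2.0)  # assumed M-scaled cutoff per band
    return 1.0 - smoothstep(cutoff, 2.0 * cutoff, r_deg)

# At 10 deg the finest band is gone, mid bands are partially attenuated,
# and the coarsest bands pass untouched.
gains = [band_gain(k, 10.0) for k in range(12)]
```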
2. Pooling cells are rectangles, not cortical sectors.
textureLod() averages axis-aligned rectangular blocks. Cortical
pooling regions tile radially around fixation. The MIP grid doesn’t know
where the fovea is.
The comparison below shows both grids at the same eccentricity range. Left: rectangular MIP tiles grow in power-of-2 steps, axis-aligned regardless of fixation. Right: isotropic cortical sectors tile radially around fixation, with spoke count adapted per ring so every cell is roughly square.
3. A naive polar fix creates a new problem. Replacing the rectangular grid with a polar grid (concentric rings, fixed spoke count) orients cells around fixation — but outer cells stretch into arcs with 4:1 aspect ratios. The demo below shows all three approaches. Green cells are isotropic (aspect ratio near 1:1); red cells are stretched:
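The stretch is easy to quantify. A sketch under an assumed ring width and fixed spoke count (the adaptive rule n = floor(2πr / Δr) is from the text):

```python
import math

# Sketch of the aspect-ratio problem: fixed-spoke polar cells stretch into
# arcs at large radius, while per-ring spoke counts keep cells near-square.
# DR and FIXED_SPOKES are assumed for illustration.

def cell_aspect(r, dr, n_spokes):
    """Arc length over radial extent of one polar cell at radius r."""
    return (2.0 * math.pi * r / n_spokes) / dr

def adaptive_spokes(r, dr):
    """Per-ring spoke count chosen for isotropy: n = floor(2*pi*r / dr)."""
    return max(1, math.floor(2.0 * math.pi * r / dr))

DR = 1.0           # assumed ring width, degrees
FIXED_SPOKES = 16  # assumed fixed spoke count for the naive polar grid

# Naive grid: square-ish near the centre, ~4:1 arcs at r = 10.
# Adaptive grid: aspect stays near 1:1 at every radius.
for r in (2.5, 10.0):
    naive = cell_aspect(r, DR, FIXED_SPOKES)
    adaptive = cell_aspect(r, DR, adaptive_spokes(r, DR))
    print(f"r = {r:4.1f}: naive {naive:.2f}:1, adaptive {adaptive:.2f}:1")
```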
The corrections
Scrutinizer keeps textureLod() as the blur engine —
that’s the performance trick that makes 60 fps possible. The
corrections layer on top.
Frequency selectivity: The DoG band decomposition splits MIP levels into 12 independently addressable frequency bands. Each band gets its own eccentricity cutoff (M-scaling) and its own chromatic attenuation (RG and BY channels decay at different rates via castleCSF). This is what uniform MIP blur cannot do: preserve low-frequency layout structure while removing high-frequency detail, and desaturate red-green faster than blue-yellow.
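A toy illustration of the differential chromatic decay. The half-gain constants below are assumptions chosen only to show the shape of the behaviour, not castleCSF outputs:

```python
# Illustrative only: red-green (RG) chromatic contrast decays faster with
# eccentricity than blue-yellow (BY), so RG desaturates first. The half-gain
# eccentricities are assumed constants, not castleCSF values.

def chroma_gain(r_deg, e_half):
    """Gain that halves every e_half degrees of eccentricity."""
    return 0.5 ** (r_deg / e_half)

RG_HALF, BY_HALF = 5.0, 9.0  # assumed half-gain eccentricities, degrees

r = 10.0
rg, by = chroma_gain(r, RG_HALF), chroma_gain(r, BY_HALF)
# rg < by: red-green has lost more saturation than blue-yellow at 10 deg.
```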
Pooling geometry:
FOVI’s cortical magnification
function (Blauch, Alvarez & Konkle 2026) provides isotropic sector
geometry. Ring boundaries at uniform cortical distance
(w = log(r + a), a = 2.78°), spoke count per ring
chosen for isotropy (n = floor(2πr / Δr)). Each sector’s
extent parameterizes two downstream stages:
- Crowding distortion — noise frequency scales with sector width, so distortion wavelength matches cortical pooling size
- Texture synthesis — the WebGPU compute path pools luminance statistics over biologically-scaled regions instead of a fixed grid
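The sector geometry above can be sketched directly from the two formulas. The ring count and maximum eccentricity below are assumed for illustration:

```python
import math

# Sketch of the sector geometry: ring edges uniform in cortical distance
# w = log(r + a), spokes per ring via n = floor(2*pi*r / dr). Formulas from
# the text; ring count and r_max are assumed here.

A = 2.78  # deg

def ring_boundaries(r_max, n_rings, a=A):
    """Edges uniform in w = log(r + a), mapped back via r = exp(w) - a."""
    w0, w1 = math.log(a), math.log(r_max + a)
    return [math.exp(w0 + (w1 - w0) * i / n_rings) - a
            for i in range(n_rings + 1)]

def spokes_per_ring(r_in, r_out):
    """n = floor(2*pi*r / dr), with r the ring's mid radius, dr its width."""
    r_mid, dr = 0.5 * (r_in + r_out), r_out - r_in
    return max(1, math.floor(2.0 * math.pi * r_mid / dr))

edges = ring_boundaries(15.0, 6)
spokes = [spokes_per_ring(r0, r1) for r0, r1 in zip(edges, edges[1:])]
```

Because log spacing makes ring width grow in proportion to radius, the arc/width ratio stays roughly constant, which is why every cell can be kept near-square.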
The MIP chain handles how much resolution to lose. The isotropic sectors handle where to pool and in what shape. The rectangular MIP grid is a foundational constraint of running on texture hardware — but by layering sector-parameterized effects on top, Scrutinizer produces output closer to cortical pooling than either MIP blur or a naive polar grid alone.
Level selection: The MIP chain is a low-pass filter with
power-of-2 resolution steps. The GPU selects which step via the Jacobian —
the matrix of screen-space derivatives of the texture coordinate
(dFdx(uv), dFdy(uv)). When V1 displacement warps the
sampling UV, the undistorted Jacobian becomes wrong: the hardware picks a MIP level
based on a coordinate field that no longer matches the actual sampling pattern. The
result is over-blurring radially (where the warp compresses UV) and under-blurring
tangentially (where it stretches).
The fix: pass dFdx(v1.distortedUV) instead of dFdx(uv).
Same filter, same power-of-2 steps — but now the level selection tracks the
warp. The low-pass cutoff frequency matches the actual sampling density at each
pixel rather than the undistorted density. The MIP chain’s tile shape is a
hardware constraint we accept; the level selection is a software choice we can
make correctly. This is the difference between working with the
acceleration and working despite it.
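The mismatch can be reproduced off-GPU with finite differences standing in for dFdx/dFdy. The warp, texture size, and viewport below are assumed stand-ins, not Scrutinizer's V1 displacement field:

```python
import math

# Sketch: the GPU picks the MIP level from screen-space derivatives of the
# sampled coordinate. If the shader samples a warped UV but the derivatives
# come from the raw UV, the chosen level no longer matches the real
# sampling density.

TEX_SIZE = 1024.0    # texels per UV unit (assumed square texture)
PIXEL = 1.0 / 512.0  # one output pixel in UV units (assumed 512 px viewport)

def warp(u, v):
    """Stand-in compressive warp: each output pixel covers fewer texels."""
    return 0.5 * u, 0.5 * v

def mip_from_derivatives(du_dx, dv_dx, du_dy, dv_dy):
    """GPU-style selection: log2 of the larger gradient, in texels/pixel."""
    rho = max(math.hypot(du_dx, dv_dx), math.hypot(du_dy, dv_dy)) * TEX_SIZE
    return max(0.0, math.log2(rho))

def level_at(u, v, use_warped_jacobian):
    f = warp if use_warped_jacobian else (lambda a, b: (a, b))
    u0, v0 = f(u, v)
    ux, vx = f(u + PIXEL, v)  # finite difference along screen x
    uy, vy = f(u, v + PIXEL)  # finite difference along screen y
    return mip_from_derivatives(ux - u0, vx - v0, uy - u0, vy - v0)

# Raw-UV Jacobian picks level 1 (over-blur); the warped-UV Jacobian,
# the analogue of passing dFdx(v1.distortedUV), correctly picks level 0.
```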
Geometry validated with a 19-test suite: ring boundaries match FOVI’s Python reference to 3 decimal places, aspect ratios within 0.8–1.2 across all eccentricities.
Links: GitHub · v2.6 release post · Kitten image: Pixabay CC0