Measuring the Pipeline

v2.1.0 release notes

v2.0 made the pipeline visible. v2.1 asks whether it’s correct. We rebuilt five classic human vision experiments as automated reference pages, ran the shader against each one, and compared pixel measurements to the published data. The whole battery — stimulus pages, capture scripts, analysis, golden captures — ran in a single day. Three shader bugs fell out that prior testing had not caught.

Also in this release: 8 half-octave DoG bands replace the old 4-band decomposition. Smoother blur gradient, twice as many frequency transitions, same API.


Five waves of validation

Each wave targets a different stage of the rendering pipeline and emulates a different class of psychophysical experiment. The test pattern is consistent: render a known stimulus through the full shader, measure output pixels, compare against published human data.
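The per-ring measurement step can be sketched in a few lines. This is a minimal illustration, not the repository's actual capture code; the function and parameter names are hypothetical.

```python
import math

def ring_mean(pixels, cx, cy, ecc_px, width_px=4):
    """Mean pixel value in an annulus at eccentricity `ecc_px` around
    the fixation point (cx, cy). `pixels` is a 2D list of grayscale values.
    (Illustrative sketch; the real scripts measure per-channel output.)"""
    total, n = 0.0, 0
    h, w = len(pixels), len(pixels[0])
    for y in range(h):
        for x in range(w):
            r = math.hypot(x - cx, y - cy)
            if abs(r - ecc_px) <= width_px / 2:
                total += pixels[y][x]
                n += 1
    return total / n if n else 0.0
```

Repeating this at a series of eccentricity rings produces the measurement curve that gets compared against the published human data.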

[Figure: the four experimental stimuli] The four stimulus classes. Each recreates a published psychophysical experiment as an HTML reference page. Top-left: chromatic singletons (Hansen 2009). Top-right: sine-wave gratings at 0.25–4 cpd (Rovamo & Virsu 1979). Bottom-left: flanked letter identification (Bouma 1970). Bottom-right: color/luminance pop-out with face (Itti & Koch 2001).
[Figure: the same stimuli through the pipeline] Pipeline output. The same stimuli rendered through the full shader at center fixation. High-frequency content (fine gratings, letter serifs) degrades first. Chromatic content desaturates with eccentricity. Salient elements (face, color singleton) receive partial protection. Pixel measurements at each eccentricity ring are compared against the published human data.
| Wave | Domain | Published basis | Key finding |
|---|---|---|---|
| 1 | Chromatic decay | Mullen 2002, Hansen 2009, Bowers 2025 | Green tracks the RG curve, not BY — hue-based models get this wrong |
| 2 | Spatial frequency | Rovamo & Virsu 1979 | DoG produces step functions at MIP boundaries vs smooth CSF |
| 3 | Crowding geometry | Bouma 1970, Toet & Levi 1992 | R:T bug found and fixed; density gate validated at 3.3:1 ratio |
| 4 | Saliency protection | Itti & Koch 2001, Hershler 2005 | Face saliency 4.79× control; protection ratio 0.283 |
| 5 | Mixed-density search | Halverson & Hornof 2011 | Density gate predicts same sparse/dense degradation pattern as EPIC model |

The validation infrastructure — 5 capture scripts, 5 analysis scripts, 15 reference pages, 25+ golden captures — now ships with the repository. Every claim in the arXiv paper can be reproduced by running the scripts.

The published data we validated against spans 45 years of vision science, digitized into machine-readable JSON: tests/validation/published-data/
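Comparing a shader measurement against one of those digitized datasets reduces to loading the JSON and computing a deviation at shared eccentricities. The sketch below assumes a simple schema (eccentricity in degrees mapped to a normalized value); the actual file layout in the repository may differ.

```python
import json
from pathlib import Path

# Hypothetical schema: each JSON file maps eccentricity (deg) -> normalized value.
DATA_DIR = Path("tests/validation/published-data")

def load_reference(name: str) -> dict[float, float]:
    """Load one digitized published dataset (schema assumed, see lead-in)."""
    raw = json.loads((DATA_DIR / f"{name}.json").read_text())
    return {float(ecc): float(val) for ecc, val in raw.items()}

def max_abs_error(measured: dict[float, float],
                  reference: dict[float, float]) -> float:
    """Worst-case deviation between shader measurements and published data,
    evaluated only at eccentricities present in both series."""
    shared = measured.keys() & reference.keys()
    return max(abs(measured[e] - reference[e]) for e in shared)
```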

What the validation found wrong

Three bugs surfaced through measurement that visual inspection missed:

Two limits we found


8 half-octave DoG bands

The previous version decomposed the image into 4 frequency bands — fine detail, medium detail, coarse structure, layout. The blur transition from sharp to pooled had 4 steps, which meant the parafoveal region (where text goes from readable to unreadable) had only 1–2 active bands at any point. Coarse.

v2.1 doubles the resolution to 8 bands at half-octave spacing. The blur gradient is now twice as smooth, with the biggest improvement in the critical parafoveal zone. The old 4 bands are still in there (bands 1, 3, 5, 7 below) — the new bands interleave between them.

| Band | Freq (cpd) | Content | New? |
|---|---|---|---|
| 0 | 5.66 | Serifs, fine detail | New |
| 1 | 4.0 | Thin strokes | = old band 0 |
| 2 | 2.83 | Letter bodies | New |
| 3 | 2.0 | Small icons | = old band 1 |
| 4 | 1.41 | Words, UI labels | New |
| 5 | 1.0 | Word groups | = old band 2 |
| 6 | 0.71 | Buttons, panels | New |
| 7 | 0.5 | Layout blocks | = old band 3 |

Performance cost is negligible — under 1ms at 1080p.
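Half-octave spacing means each band's center frequency is 1/√2 of the previous one. The band centers in the table above follow directly from that rule (a small sketch; the function name is illustrative):

```python
import math

def band_center_cpd(k: int, top_cpd: float = 4 * math.sqrt(2)) -> float:
    """Center frequency of DoG band k at half-octave spacing.
    Band 0 sits at ~5.66 cpd; each subsequent band is 1/sqrt(2) lower,
    so even-numbered k gives the new bands and odd k the original four."""
    return top_cpd / 2 ** (k / 2)

centers = [round(band_center_cpd(k), 2) for k in range(8)]
# -> [5.66, 4.0, 2.83, 2.0, 1.41, 1.0, 0.71, 0.5]
```

Note that the old 4-octave-spaced bands (4.0, 2.0, 1.0, 0.5 cpd) fall out at k = 1, 3, 5, 7, matching the "= old band" rows in the table.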


Wave 5: Halverson mixed-density

The fifth validation wave tests the density gate against behavioral data from a real UI search task. Halverson & Hornof (2011) built an EPIC cognitive architecture model and found that peripheral encoding accuracy depends on local text density: 90% for sparse text (nearest neighbor ≥ 0.15°), 50% for dense text (< 0.15°).
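As a two-level rule, the Halverson & Hornof encoding-accuracy finding is trivial to state in code. This is only a sketch of the published behavioral parameters, not Scrutinizer's shader-side gate:

```python
DENSITY_THRESHOLD_DEG = 0.15  # nearest-neighbor spacing cutoff (Halverson & Hornof 2011)
SPARSE_ACCURACY = 0.90        # peripheral encoding accuracy, sparse text
DENSE_ACCURACY = 0.50         # peripheral encoding accuracy, dense text

def encoding_accuracy(nearest_neighbor_deg: float) -> float:
    """Two-level density gate from the EPIC model's parameters."""
    if nearest_neighbor_deg >= DENSITY_THRESHOLD_DEG:
        return SPARSE_ACCURACY
    return DENSE_ACCURACY
```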

The stimulus page reproduces their mixed-density search task: 6 word groups in a 2×3 grid, sparse and dense conditions at parametric spacing. The analysis script measures per-group SSIM, edge density ratio, and composite availability — testing whether Scrutinizer’s density gate predicts the same pattern that Halverson validated against 24 participants’ eye-tracking data.
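One of the per-group metrics, the edge density ratio, can be sketched as the fraction of pixels with a strong local gradient in the degraded output versus the original stimulus. This is an assumed formulation for illustration; the analysis script's exact definition may differ.

```python
def edge_density(gray, threshold=0.1):
    """Fraction of pixels whose gradient magnitude exceeds `threshold`.
    `gray` is a 2D list of grayscale values in [0, 1]."""
    h, w = len(gray), len(gray[0])
    edges = 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = gray[y][x + 1] - gray[y][x]
            gy = gray[y + 1][x] - gray[y][x]
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges += 1
    return edges / ((h - 1) * (w - 1))

def edge_density_ratio(degraded, original, threshold=0.1):
    """Ratio near 1.0 means edges survived the pipeline; near 0.0 means
    peripheral degradation erased them."""
    return edge_density(degraded, threshold) / max(edge_density(original, threshold), 1e-9)
```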


Also in this release


What’s next

The validation pipeline is infrastructure. Now that every change can be measured against published data, the path forward is clearer. Next: orientation-dependent blur (horizontal and vertical edges persist further than diagonal ones in real peripheral vision) and smoother color integration across the visual field.

The density gate holds up under measurement but is still an approximation of the real crowding mechanism. Whether GPU rendering can get closer is an open question.

References cited: Bouma 1970 · Bowers, Gegenfurtner & Goettker 2025 · Halverson & Hornof 2011 · Hansen et al. 2009 · Hershler & Hochstein 2005 · Itti & Koch 2001 · Mullen & Kingdom 2002 · Rovamo & Virsu 1979 · Toet & Levi 1992