A deterministic instrument for reading multi-object / multi-mass composition.
Russell Parrish · Parallax Metrology · ORCID 0009-0008-9781-7995
Stage 3 (Art) of the Parallax research ladder. Last updated 2026-06-25.
STATUS: core measure falsified on clean controls — diagnostic screen in progress
One line. AI struggles to render multi-object/mass work because it is challenging to read it.
This instrument is the reader. The target it was built for — Picasso's Family of
Saltimbanques — "never lets you rest": a field of competing centers that never
resolves. The task is to measure that property (Arnheim's eccentric pole) and
separate it from a settled distribution (Mondrian) and a single anchor (a portrait).
The substrate (the governing idea). Represent every region by the
INDEPENDENT EVIDENCE required to justify its existence. The purpose is not to recognise structure —
it is to justify structure. Three principles, and a hard gate:
EVIDENCE — what independently supports a structure?
PERSISTENCE — what survives changing assumptions or perturbations?
DEPENDENCY — what other structures rely upon it?
Acceptance test (in force): any new quantity must answer a structural question that Evidence,
Persistence, or Dependency cannot already answer — else it is not added. The canonical representation
is the belief vector (kept un-collapsed); support maps are visualisation, not the representation.
Language is binding: measurements, not interpretation ("EDI measures evidence-composition
diversity" — NOT "instability/restlessness", which are untested hypotheses). Full audit: AUDIT.md.
1. What the instrument is
A fixed, training-free measurement frame applied to any image; interpretation is
per-domain (the program's frame/memory/deviation discipline). Built by forking the
proven parallax-portrait engine (the reconciled consolidation of the VTL
color v0.4.8 spine + pathology radial/Ω + Canon3 + the-crit DGI + multitracker +
gauntlet) and adding the layers below.
Layer
Module
Reads
Status
Full kernel (the G's): L1/L3/L4 VCLI-G g1–g4/L6/L8 GLCM/L9 blobs/spectral/edge
Heartbeat: persistence under resolution (the squint)
persistence.py
does plurality survive coarsening → centric/eccentric
falsified (= chiaroscuro)
Convergence (do masses point at a shared fulcrum)
convergence.py
vector fulcrum, dispersion
exploratory (facing-proxy weak)
Scanpath (saliency + IOR — "lets you rest")
scanpath.py
fixation dispersion, competitive foci
texture-sensitive
Generalist + autonomous reading
read_full.py/read.py
one call → verdict + coords
runs
Visualisation (2-row artifact)
viz.py
see what it sees
works
2. The trail (backwards) — what we did and learned
2026-06-25 · build
Forked the portrait engine; re-added the canonical L6 resistance graph (the
edge-corridor / void machinery the portrait engine dropped). Wrote the heartbeat
(persistence = fraction of squint-scales at which the field still has ≥2 mass
poles; centricity = 1−persistence = Arnheim axis).
synthetic ground truth HELD: centric blob (persistence 0.33) vs eccentric 5-mass ring
(0.75). But the test killed three things before any real run: gradient pole-finding
fragments a textured mass into ~12 poles (→ switched the mass field to Notan:
low-pass of luminance, texture-robust); convergence and restlessness
failed to discriminate (texture-noise) → removed from the verdict gate.
verification & hardening
Fixed: grayscale/RGBA crash; silent uint8 mis-scaling (wrong color, no crash);
added low_resolution flag; integrated the full 50-coord kernel (the G's) so it
is a true generalist; fused a chroma channel into the mass field (Boafo: reads
equiluminant colour masses luminance is blind to — verified 0→7 poles on a flat-luminance
chromatic test).
calibration run #1 (CONTAMINATED)
Pulled the trio + Saltimbanques from Wikimedia. Reported Saltimbanques persistence
0.833 (highest) and "the sharp Mondrian test passed." RETRACTED.
The Saltimbanques image was a gallery photograph (a viewer's back, frame, wall,
adjacent works) — not the painting. Caught by looking at the file. Integrity rule since:
read every image before trusting a number.
clean re-run + image upgrade
Replaced with clean flat reproductions (curated, color-accurate, cropped — Caravaggio
re-pulled at 46 MP true-color; Mondrian cropped to canvas). Clean Saltimbanques persistence
fell to 0.667 — ties Caravaggio, +0.08 over Mondrian. The contamination had inflated
it. The single mass-persistence axis no longer isolates Saltimbanques.
the Rembrandt control — FALSIFICATION
Added the simplest possible centric control: a single-face self-portrait on a plain dark
ground. Prediction: hard CENTRIC, persistence ≈ 0. RESULT: persistence
0.75, ECCENTRIC — higher than Saltimbanques. A single face beats a six-figure scene.
Diagnosis (confirmed by the viz): the mass field |blur(L)−mean| counts
both light and dark deviations, so a chiaroscuro figure = many tonal masses (light
face + dark hat + dark coat). Persistence measures chiaroscuro / tonal busyness, not
composition. A second measure fell with it: bridge_cost is highest on
Mondrian (a hard grid maximises inter-island corridor resistance) → it tracks hard
partitioning, not Ma.
flow-field probe — the directional device negative
Russell's reframe: the brilliance of Saltimbanques is the direction of the figures and their
gaze — vectors that fight and never seat (vs Caravaggio/Matisse, which have a resting star).
Built a structure-tensor flow field + least-squares "resolving star" (flow.py).
Result: it reads brushwork/form GRAIN (vertical-dominant), not gaze/gesture.resolves is flat 0.66–0.70 (near-parallel grain lines → degenerate star fit);
fight_index is art-backwards (Caravaggio highest). Confirmed: the directional device
is a SEMANTIC vector — a figure looking left makes no leftward intensity signal; gaze/gesture
live in what the forms depict, not in edge orientation. Reading them genuinely needs face/pose/gaze
detection = crosses the no-semantic contract. Silver lining: the flow field accidentally drew
the figures as a row of vertical strokes at intervals — that is the rhythm skeleton, and unlike
gaze it IS reachable (spacing/periodicity of masses). "How it holds" via direction = semantic,
out of reach; via rhythm = structural, in reach. Artifacts: outputs/flow_*.png.
Flow field on Rembrandt — the structure tensor reads near-parallel vertical grain (brushwork / form), not gaze; the "resolving star" gets flung above the head because parallel lines have no real intersection. Grain, not the directional device.
rhythm / echo probe — controlled dashes informative negative
Russell's reframe: rhythm is echo following contrast — big shapes first, then dashes tracing
their contrast to find the smaller forms echoing the larger (arms echoing the torso). Built a
CONTROLLED dasher (rhythm.py): coarse structure tensor (big shapes only) + a contrast
gate. The control works mechanically but lands on the wrong things — on
Rembrandt, Caravaggio AND Picasso the dashes sat on the frame / vignette / background gradient,
not the figures, because raw coarse contrast is strongest at the periphery, not the subject. Of the
numbers, only rhyme (orientation concentration) discriminated (Rembrandt 0.72 vertical vs
Caravaggio 0.37 scattered); echo saturated, beat confounded by mass width.
Artifacts: outputs/rhythm_*.png.
Controlled dashes on Caravaggio — even gated to "major contrast," they land on the window / wall / frame, not the dark figure group (which is low-contrast, dark-on-dark). Contrast-gating ≠ the perceptual big shapes.
THE UNIFYING CONCLUSION — big shapes first
Every probe to date — gradient poles, tonal mass, persistence, the flow field, contrast-dashes —
latches onto texture / grain / frame / vignette instead of the figures, because we keep skipping
step 1. Russell's own process is the correct algorithm: (1) find the big shapes (figure-ground);
(2) dash their contours for echo/rhythm; (3) follow the contrast of those shapes. We kept attempting
2 and 3 without 1. "Big shapes first" is a figure-ground segmentation, not a contrast threshold, and
it is the prerequisite for everything — rhythm, echo, vectors, centricity all sit on top of it.
This is the channel-union figure-ground segmenter (§4b): big shapes = edge-bounded, void-separated,
colour-subdivided regions. confirmed next build: grouping.py
grouping.py v1 (colour/tone-led) fails on chiaroscuro
Built the big-shape segmenter as felzenszwalb on CIELab (colour+tone similarity). Flat/colour
works segment well (Picasso 3 figure-shapes/26%; Matisse 7 zones/35%). Chiaroscuro
works fail: Rembrandt 96% "ground" + a 3.7% shard (only the lit face); Caravaggio 98% ground.
A dark figure on a dark ground IS the same tone — region-similarity cannot separate them. The exact
wall again. Artifacts: outputs/grouping_*.png.
grouping v1 (colour/tone-led) on Rembrandt — only the lit face survives as a figure (3.7%); the hat, body and ground all merged into one 96% "ground". The dark figure dissolves into the dark ground.
the fix (was already in hand from §4b)
The channel diagnostic already showed the low-Canny silhouette holds the chiaroscuro figure whole
(outline captured even where interior is dark). So the segmenter must be EDGE-LED, not colour-led:
(1) silhouette = low-Canny + confirmed edges → morphological close + fill → figure region; (2) ground =
border-connected remainder; (3) colour subdivides the figure regions (Picasso clump → sub-figures).
Edge defines, colour subdivides — v1 inverted the order. next: grouping.py v2 (edge-led)
grouping.py v2 (edge-led) — STEP 1 STANDS breakthrough
Rebuilt edge-led: low-Canny + chroma edges → dilate → morphological close → fill holes → open.
The chiaroscuro wall is broken. Rembrandt: the whole bust as one figure (33%, was 3.7% in v1);
Caravaggio: the figure assembly (27%, was 2%). Picasso: cluster + the seated woman as two groups
across the ground-void — the Saltimbanques structure, on the actual figures. For the
centric/eccentric question the group-level read already works: Rembrandt 1 group (centric),
Picasso 2 groups + void (eccentric) — the centricity signal (n figure-groups + void-mediation) finally
computed on figures, not texture/grain/frame. Limits: over-merges WITHIN a group (Picasso clump = 1
blob; Matisse zones fused) → needs the colour-subdivision step (v1's strength); Rembrandt's dark hat
partly lost (low-contrast silhouette, tunable). Artifacts: outputs/grouping2_*.png.
v2 (edge-led) — the whole bust returns as one figure-shape (33%, was 3.7%). The chiaroscuro wall is broken.v2 — the left cluster and the seated woman as two groups across the ground-void: the Saltimbanques structure, on the actual figures.
Next: v3 = edge-led groups (v2) + colour subdivide (v1) — the channel union — then derive the
centricity coordinate and validate on the controls.
centricity coordinate v3 — FIRST VALIDATED STRUCTURAL COORDINATE milestone compute_centricity on edge-led groups + colour subdivision, components reported (n_groups,
anchor_dominance, dispersion, void) — not pre-fused. Passes the synthetic gate: centric(1) →
1 group / dispersion 0.0; scattered(5) → 6 / 0.45 (and subdivision does NOT over-split the monochrome
blob). Orders the paintings sensibly by dispersion: Rembrandt 0.19 (centric) · Caravaggio 0.24 ·
Picasso 0.34 (eccentric) · Mondrian 0.37 · Matisse 0.49 (distributed); anchor_dominance inversely
(Rembrandt 0.75 → Matisse 0.14). The first coordinate to pass ground truth AND order the paintings
sensibly — after persistence/bridge/mass_conc/anchor/convergence/flow/rhythm all confounded.
Residual confound (honest): subdivision is colour-based, so it couples with colourfulness
(disegno↔colorito) — Matisse reads distributed partly because it is most colourful; Caravaggio
under-splits (tonally similar figures → reads too compact). dispersion (spatial spread) is more robust
than n_groups (count). Fix: subdivide by colour OR internal edge (split tonally-similar-but-edge-
separated figures), + a confound control (monochrome-distributed vs colourful-centric). Code:
grouping.py compute_centricity(subdivide=True).
centricity hardened → CONFOUND-CONTROLLED first validated coordinate
Resolved the colourfulness confound by moving the validated coordinate to the GROUP level (edge-led
figure-groups, no colour subdivision) and adding a clean void_between_groups (ground fraction
in the corridor between the two largest groups). Confound-control battery PASSES all four: a
colourful centric figure stays centric (colour can't fake eccentricity); a monochrome eccentric
scene reads as 5 groups (no colour needed); low- and high-contrast centric both stay centric
(contrast-invariant); centric vs eccentric separate. eccentric_mono ≈ eccentric_colour (5 / disp 0.47) —
colour changes nothing. The project's first coordinate to pass ground truth AND isolate its factor from
colour and contrast. On the paintings it lands exactly on the Saltimbanques mechanism: Picasso is the ONLY work with
void-separated groups — void_between_groups = 0.21 for Saltimbanques, 0.0 for Rembrandt /
Caravaggio / Mondrian / Matisse. The LOAD-BEARING VOID between the seated woman and the cluster — what
makes her separable, what Rilke circled — is now a validated, confound-free number. Scope (honest): reads
the strongest eccentricity (groups across a void); adjacent-distributed fields (Matisse) read as one group —
a true structural fact (they fill the field; Saltimbanques opens a void), not a confound. Code:
compute_centricity (group-level default) + _void_between_groups.
rhythm #2 (figure mass-peaks) fails the gate — the boundary
Re-aimed rhythm onto the figures: detect sub-elements as mass-peaks INSIDE the figure mask, measure
spacing-regularity + syncopation. Fails the synthetic gate: a regular row of 5
blobs reads as 9 elements with regularity 0.089 — below random's 0.171; coarse smoothing did not fix
it. The painting numbers (Picasso syncopation 3.08, which looks like the seated-woman break) are
therefore not trustworthy — the gate caught it. Visual diagnosis: the |blur(L) − mean|
mass field peaks at figure rims, not centres (5 disks → 9 rim points), and on Saltimbanques the 17
"elements" are scattered mass-deviation hotspots, not the 6 figures.
Synthetic regular row — should find 5; finds 9. The mass field is bright at each disk's rim and dark at its centre, so peaks land on rims (top+bottom), and the figure mask fuses the row. Over-fragmentation, visible.Saltimbanques — 17 detected "elements" land on lit patches and edges, not on the six figures. No clean per-figure elements to give a rhythm.
The boundary, named: GROUP-level structure (void-separated groups, centricity, the load-bearing void)
is reachable and validated. INDIVIDUAL-figure structure (rhythm, directional gaze) needs occlusion-aware
per-figure separation or semantics — outside the deterministic, training-free contract. Same wall the
directional gaze hit. Operating envelope for multi-object reading: the void/group device — YES;
figure-level rhythm and gaze — NO (with contract-legal tools). This answers the original skepticism
honestly: one validated mechanism, two named limits.
THE PIVOT — ConsensusMask v1: consensus existence breakthrough
Russell's redirection dissolved the wall instead of climbing it: every failure in this whole trail was
a single construction trusted alone (gradient poles, tonal mass, the flow field, one edge-closed
mask) — each fell for a different artifact (frame, vignette, texture, rims). The third way between
"recover entities" and "never infer entities": recover stable attractors — regions that earn
existence by SURVIVING MULTIPLE INDEPENDENT DETERMINISTIC CONSTRUCTIONS. A region is real if the image
keeps reasserting it under different readings. Deterministic objectness, no semantics. ConsensusMask v1 (consensus.py): edge-closed · tonal-Notan · chroma-region, each at 5
blur scales. SUPPORT(pixel) = fraction of the constructions asserting it; consensus regions = connected
components of high support; CONFIDENCE = how many constructions agree. It works on the first build:
figures emerge as high-confidence regions and the ground/vignette/texture that fooled every single mask
fall to low confidence — without naming an object.
Saltimbanques — the support field lights the cluster + the seated woman and suppresses the sky/ground; consensus regions land on the figures (cluster 0.52, woman separate). The thing the image repeatedly insists on IS the figures.Rembrandt (chiaroscuro) — the lit face earns the highest confidence (0.55–0.72): edge gets the silhouette, tone gets the face, chroma gets the flesh, and where they agree is the focal point. The wall that broke every single mask, handled gracefully by agreement.
Caravaggio: lit figures 0.64–0.70, dark wall 0.33. The question changed from "can I segment?" to "what
does the image repeatedly insist is there?" — and v1 answers it. Confidence is GRADED, not binary —
the strength: it measures degree of insistence. Honest v1 scope: channels × 5 scales only (perturbation-
survival not yet in — the next strengthening); threshold/merging first-pass; touching figures still merge
into one consensus region (the cluster). This is the SUBSTRATE; the region graph + composition readout
sit on top.
ConsensusMask v2 — perturbation-survival added
Step 6, "the most important addition": re-run the constructions after geometry-preserving degradations
(grayscale · gamma↑/↓ · JPEG q40 · contrast-normalise); a region that survives perturbation earns more
existence, one that needs a single condition (a colour region that vanishes in grayscale, texture that
fragments under JPEG) loses support. Effect: modest but correct. Saltimbanques cleaned from 3
consensus regions to 2 — the cluster + the seated woman (a fragile third dropped); confidences
uniformly nudged down (degradation). The thin-edge frame still persists at low confidence (0.33), so
artifact rejection comes from CONFIDENCE-FILTERING (keep regions above ~0.4), not perturbation
alone — most of the signal was already in channels×scales. Code: consensus_mask(perturb=True).
Saltimbanques, perturbation-strengthened — the support field lights the cluster + the seated woman, ground suppressed; consensus settles on the two robust structures.
composition readout v1 inverts the key cases — corrected direction
Built the region graph + readout (composition.py) on confidence-filtered consensus regions.
It inverted the two cases that matter: Saltimbanques → SINGLE-CENTRE (the seated
woman scored 0.38, below the 0.42 threshold, so she was dropped — collapsing to one region); Rembrandt →
MULTI-CENTRE (its sub-parts cleared the bar and the dark gaps between them fired the void). Root cause:
(1) an uncalibrated confidence threshold drops real regions and keeps fragments; (2) the void-corridor fires
between ANY two regions, including intra-figure gaps (the bridge_cost confound reborn);
(3) I conflated raw consensus regions with figure-groups; (4) I skipped the synthetic gate.
A fixed threshold cannot separate "a real separate figure" (the woman, 0.38) from "a fragment of one figure"
(Rembrandt's part) — they have similar confidence; the difference is connectivity (the woman is a
void-separated component; Rembrandt's face+body is one connected figure), which a threshold discards.
Correction (not yet built): the consensus SUBSTRATE is right (support field lights figures); the
composition LAYER must sit on edge-led FIGURE-GROUP connectivity (validated: Rembrandt 1, Saltimbanques 2)
weighted by consensus confidence — not raw thresholded consensus regions — and pass a synthetic gate
first. Artifacts: outputs/composition_*.png (the broken v1, kept as provenance).
AblationImpact / indispensability — composition is CAUSAL keystone, validated
Russell's keystone: a region's importance is not its confidence, it is its NECESSITY. Remove it, recompute
the composition, measure what breaks (ablation.py, on the validated edge-led figure-groups —
connectivity, not a threshold). Outputs COMPONENTS, no verdict (the third repeat of the don't-classify-early
lesson). Synthetic gate PASSES and shows necessity ≠ redundancy: single mass → indispensability 0
(nothing depends on it); cluster+lone → 0.63 (each region load-bearing, removing either kills the void);
scattered-5 → 0.14 (remove one, four remain — redundant). Saltimbanques — the fix and the finding:
region
area
void_dependency
indispensability
cluster
0.586
0.21
0.2365
seated woman
0.010
0.21
0.2365
The woman is 1% of the canvas by area and EQUAL to the entire cluster in indispensability — remove her
and the void collapses exactly as much as removing the whole cluster. The region the confidence threshold
DROPPED (0.38) is the most structurally necessary thing per unit of herself. Necessity is not mass;
composition is causal, not spatial — which is also why bridge_cost failed (it measured the corridor, not
the dependency). Graph-theoretic equivalent: articulation points / betweenness centrality = indispensability.
Caveat: the other four works read n=1 (edge-led grouping merges touching figures), so ablation fires only for
VOID-SEPARATED multi-group compositions (precisely the Saltimbanques case); separating touching figures into
ablatable sub-units is the open edge.
Evidence-family weighting — belief, not democracy substrate
Russell: ConsensusMask averaged every construction, but edge/tone/colour/scale/perturbation are correlated —
5 blur scales of one operator are not 5 witnesses. consensus_families: ONE vote per evidence
FAMILY (edge · tone · colour · perturbation-survival); WITHIN the edge family, genuinely different operators
(Canny · DoG · morphological gradient) so agreement is real. Each region now carries a BELIEF VECTOR — the
disagreement kept, not collapsed — and n_families is honest evidence (no overcounting).
Saltimbanques — each evidence family asserts DIFFERENT regions. The colour family carries the seated woman where edge and tone stay dark; the cluster is multi-family; a tonal band shows in tone only. The kept disagreement, made visible.
The finding that explains the whole history: the seated woman's belief vector is edge 0.10 · tone
0.50 · COLOUR 0.96 · perturbation 0.43 — she is a COLOUR-built figure. That is exactly why every
luminance/edge reading lost her (gradient poles, tonal mass, the confidence threshold at 0.38). Weight the
families honestly and she is there. Contrast: the cluster is multi-family (n_fam 4, robust); a tonal band is
tone-only (n_fam 2); Matisse's gray field is colour 0.0 (neutral). The per-region belief vector IS the
disegno↔colorito fingerprint, local and un-collapsed. Caveats: "colour" = chroma presence (warm palettes
fire it — Rembrandt flesh 0.78), so read the relative profile; ~70s/image (optimise for corpus). Code:
consensus_families, render_families.
Assumption depth — the epistemic map crown (substrate)
Russell's deepest direction: structures are not equally "real". ASSUMPTION DEPTH = how many independent
evidence families must accumulate before a structure stabilises — the correct rebirth of the failed
"persistence" (not persistence under blur, but under EVIDENCE REQUIREMENT: demand ≥1 family, then ≥2, ≥3, ≥4).
assumption.py: per-pixel evidence count = the EPISTEMIC MAP; a depth ladder; per-region depth.
Works: Saltimbanques depth ladder 0.75 → 0.62 → 0.30 → 0.05 (demanding all four families leaves
only a 5% deeply-real core); the epistemic map shows deeply-corroborated cores (the figures) vs shallow
surrounds (the ground). Honest miss: the per-region depth came out flat (~3) because regions are found
at a support threshold FIRST (pre-selecting multi-family pixels) then measured — the refinement is to find
regions at EACH depth level independently (then the colour-built woman appears at depth 1 and dissolves by 3).
Saltimbanques epistemic map — every pixel coloured by how many independent evidence families agree (purple = shallow / one family, yellow = deep / all four). The figures are deeply corroborated; the ground is shallow. No single mask can draw this.
Evidence Diversity Index — how many construction logics the general result
Russell's leap: the belief vector is a COORDINATE, not a property — a region has a CONSTRUCTION SIGNATURE
(its spectrum, like spectroscopy), not an origin. Compare regions by ANGLE (how they are built). The image
question stops being "what is the object" and becomes "how many different construction logics does this
image hold at once." EDI = area-weighted angular spread of the construction signatures (diversity.py).
Hypothesis HOLDS, and sharper than expected:
work
EDI
reading
Caravaggio
0.014
multi-figure but ONE logic (tenebrist value everywhere) → coherent, stable
Rembrandt
0.118
one dominant value logic, everything agrees → strong single anchor
Matisse
0.119
tone-led, moderate
Picasso / Saltimbanques
0.217
cluster (balanced) · tonal band (tone-only) · woman (colour 0.96) — THREE logics at once → diverse, unstable
The finding: EDI separates "multiple objects, ONE logic" (Caravaggio — coherent) from "multiple objects,
MULTIPLE logics" (Picasso — diverse). The instability is the diversity of construction logic, not the object
count — exactly the hypothesis, and the thing no compositional measure could reach. This is the Parallax
variance-of-disagreement thesis at the evidence level, and it is IMAGE-GENERAL — it would run identically
on an MRI or an AI render; it asks nothing about portraiture or composition. Second axis — FRAGILITY (belief
concentration): Caravaggio redundant (0.01, survives losing any one family) vs Mondrian plane / Picasso tonal
band brittle (0.27–0.28, depend on one family). Coherent-and-redundant vs diverse-and-brittle = two ways to be
real. Caveats: tone fires broadly so EDI rides the edge/colour variation; 1–3 regions per work (needs finer
segmentation to be robust at scale); family-basis breadth is what keeps it honest (Russell's caution).
Construction signatures — same four evidence families, but Picasso's regions need very different mixtures (diverse evidence composition) while Caravaggio's are near-identical (one composition). The bars ARE the evidence diversity. (Language: composition, not "logic" — logic is inferred.)
Disentangled measurement vector + the regime plane measurement
Separated the conflated quantities into distinct measurements (measures.py): amount
(how much evidence) · agreement (do families concur) · certainty (how decisive) · fragility ·
plus local evidence contrast (adjacent-region signature difference) and deep_core. The
disentangling earns itself: agreement and certainty are genuinely separable — Mondrian lowest agreement
(0.67, families spread) but highest certainty (0.56, decisive); Matisse the opposite (0.83 / 0.28). The
depth × EDI regime plane spreads the works — Caravaggio (high deep_core, low EDI) opposite Picasso
(low deep_core, high EDI) — suggesting depth and EDI are anti-related, not independent (n=5, structural,
NOT a statistical claim). Artifact: outputs/regime_plane.png.
ARCHITECTURAL AUDIT — Evidence · Persistence · Dependency consolidation
Russell's directive: pressure-test the architecture, not add detectors; determine which quantities are
independent vs redundant projections. Independence audit (analytical overlap + n=5 sanity check) →
AUDIT.md. Finding: amount · agreement · fragility · deep_core form ONE axis (mutual |r| 0.81–0.96
at n=5; mathematically all functions of belief magnitude/spread) — call it corroboration. Collapses:
agreement+fragility → concordance; deep_core + assumption-depth → Evidence (depth = #families-agreeing IS
concordance, not the distinct persistence); void_dependency → a component of indispensability; group_count /
anchor_dominance / dispersion → ART-LAYER (not general primitives). The irreducible core (13 → ~5):
EVIDENCE (the belief vector → amount/concordance/certainty) · PERSISTENCE (perturbation-survival — the distinct
kind; depth is Evidence) · DEPENDENCY (indispensability change-vector). Key structural finding: Dependency
is ORTHOGONAL to Evidence — the Saltimbanques woman has low Evidence (1% area, single-family colour) yet high
Dependency (load-bearing). EDI = a 2nd-order Evidence statistic (variance across regions), empirically the most
independent. Generalisation audit: all retained quantities are art-free (identical meaning for pathology / SEM /
satellite / AI). Next: the corpus run, on the REDUCED set, to test independence (depth vs EDI) and signature
clustering by domain. See AUDIT.md for the full dependency matrix.
CORPUS TEST (n=25) — the audit re-grounded; "13→5" RETRACTED validation
Russell's correction: don't cut for minimalism — find GENUINE duplicates, otherwise RETAIN ("more survive,
for the wider picture"). Ran all 13 measures over 25 mixed works (East-Asian ink-on-void · Western oils ·
abstraction; corpus_run.py). The n=5 "one corroboration axis" did NOT survive real n —
amount and agreement are nearly orthogonal (r=+0.06): how MUCH evidence vs how CONCORDANT are two
different things. Only TWO genuine duplicates: agreement↔fragility (r=−0.92, definitional — both are
belief-vector spread) → merge to concordance; void_load_bearing⊂max_indispensability (r=+0.99,
containment) → nest, don't delete. The spine HOLDS: Evidence×Dependency cross-correlation max |r| =
0.44 — Dependency is a genuinely separate axis, not a projection of Evidence (the load-bearing woman).
Retracted: group/anchor/dispersion are NOT "art-layer" — they're general primitives, just dormant
here (88% zero: only 3/25 works have ≥2 groups). Disposition:
11 distinct + 1 merge + 1 nest, several dormant-but-general KEPT for breadth. Caveat: Dependency is 88%
dormant here, so orthogonality is directionally confirmed but under-exercised — needs a multi-mass corpus.
n=25 correlation (orange lines split Evidence ⟂ Dependency blocks): the cross-block is uniformly pale (max |r| 0.44 — the spine holds); each block is internally structured; the Dependency cluster is collinear because the corpus is single-mass-dominated (corpus-contingent, not definitional). Right: dormancy — local_contrast + the void cluster read 0 on ~88% of these works (kept for the wider picture, not cut).The three headline claims as scatters. LEFT amount vs agreement (r=+0.06) — a cloud: how-much and how-concordant are distinct, both kept (the n=5 read had wrongly fused them). MIDDLE agreement vs fragility (r=−0.92) — a tight anti-diagonal: one quantity by construction → merge to concordance. RIGHT void_dependency vs indispensability (r=+0.99) — the active points lie on a line (containment → nest), and the cluster at the origin is the 22 single-mass works where Dependency is dormant.
Per-work measurements (n=25), sorted by group count then EDI. Highlighted rows = multi-group (Dependency-active); 22/25 are single-group, which is why void between / max indispensability are mostly 0 here — dormant on this corpus, not failing. Labelled works carry era/culture in the filename (Sotoguchi 20c, Sin Wi 19c, Itchō c.1710).
work
amount
agreement
certainty
fragility
EDI
deep core
n groups
void between
max indispensability
J. 20th century mid Sotoguchi
0.533
0.729
0.446
0.127
0.319
0.028
3.000
0.738
0.542
P. 19th century early Sin Wi
0.532
0.702
0.514
0.155
0.233
0.010
2.000
0.700
0.707
C. 1710 Hanabusa Itcho
0.569
0.887
0.195
0.014
0.000
0.090
2.000
0.625
0.855
DT1027
0.476
0.743
0.417
0.124
0.443
0.043
1.000
0.000
0.000
DT234058
0.459
0.884
0.225
0.032
0.350
0.024
1.000
0.000
0.000
DT1883
0.486
0.752
0.433
0.138
0.345
0.077
1.000
0.000
0.000
Jilted
0.473
0.865
0.253
0.121
0.276
0.064
1.000
0.000
0.000
m110
0.558
0.890
0.213
0.026
0.214
0.205
1.000
0.000
0.000
Helen by a Chair
0.451
0.800
0.348
0.100
0.210
0.014
1.000
0.000
0.000
m109
0.442
0.915
0.193
0.051
0.210
0.029
1.000
0.000
0.000
DT2566
0.531
0.830
0.382
0.086
0.204
0.092
1.000
0.000
0.000
DP-12952-001
0.637
0.901
0.367
0.049
0.187
0.160
1.000
0.000
0.000
DT1500
0.436
0.834
0.335
0.092
0.170
0.020
1.000
0.000
0.000
m103
0.566
0.729
0.473
0.134
0.159
0.054
1.000
0.000
0.000
DP-14936-033
0.542
0.859
0.256
0.049
0.146
0.098
1.000
0.000
0.000
F. 18th century unkown2
0.496
0.892
0.196
0.027
0.140
0.112
1.000
0.000
0.000
DT1565
0.531
0.922
0.149
0.032
0.124
0.135
1.000
0.000
0.000
DT1879
0.485
0.913
0.166
0.023
0.082
0.037
1.000
0.000
0.000
DP-20101-001
0.485
0.795
0.344
0.087
0.067
0.061
1.000
0.000
0.000
DP170905
0.725
0.855
0.539
0.034
0.066
0.210
1.000
0.000
0.000
DT8475
0.498
0.941
0.101
0.008
0.059
0.085
1.000
0.000
0.000
DP232029
0.476
0.926
0.139
0.009
0.019
0.078
1.000
0.000
0.000
DP-17679-001
0.683
0.926
0.366
0.004
0.000
0.214
1.000
0.000
0.000
DP-20099-001
0.461
0.932
0.135
0.008
0.000
0.040
1.000
0.000
0.000
DT1943
0.526
0.915
0.148
0.009
0.000
0.117
1.000
0.000
0.000
CANON-3 SDI probe — a continuous field companion (and a self-correction) cross-validation
Russell surfaced the VTL_Kernel_Metrics … Canon_3_SDI notebook (a separate r_v package) and asked
if it helps the dormancy gap. It does — and it caught a wrong claim I had committed. Our void/dispersion
are discrete (need ≥2 figure-groups); Canon-3's r_v (void ratio = fraction of frame below a
gradient threshold) and SDI (mass spread from centroid) are continuous field measures, no
grouping. Probe on the same n=25 (canon3_probe.py): r_v/SDI are 0% dormant vs our 88% — active
on every work. And where ours is active (the 3 multi-group works) r_v tracks void_between (Sotoguchi
highest on both; r=+0.40), so the field version is cross-validated to trust where the discrete one is dormant.
r_v is also independent of the Evidence axes (vs EDI≈0). Self-correction: my earlier "honest miss" — that
the ink-on-void works did NOT activate the void measures — was WRONG. The three multi-group works ARE the
ink works (Itchō/Sotoguchi/Sin Wi), with the highest void_between (0.625–0.738); I'd mis-read the corpus
mean n_groups (1.16) as applying to them. The real dormancy is on the dense single-figure works.
Caveat: r_v is a single Sobel-gradient construction — the texture-amplifying path we left for Notan; it
will under-read void on a densely brushed oil. Recommended integration: port the concept (continuous
field void + dispersion) onto our texture-robust Notan mass field, not raw gradient — always-active AND
squint-robust. Drop ρ_r (correlates with amount, r=0.60).
multi-group work
n groups
ours: void_between
field: r_v
field: SDI
Sotoguchi (20c ink)
3
0.738
0.884
0.194
Sin Wi (19c ink)
2
0.700
0.775
0.260
Itchō (c.1710 ink)
2
0.625
0.847
0.224
r_v/SDI active on all 25 (0% dormant); these 3 are where the discrete void is testable — and the field reading agrees.
BUILT: texture-robust field-void / field-dispersion (Notan substrate) build + gate
Per the recommendation: ported the Canon-3 concept (continuous, grouping-free void + dispersion) onto our
texture-robust Notan mass field — NOT the raw gradient (field.py). Mass = multi-scale
|squint(L)−median| + chroma; median baseline (not mean) so a uniform ground reads as
void and figures as deviation (the mean fix: a figure-pulled mean had turned empty ground into spurious mass,
failing the gate — corrected). Synthetic gate first (field_gate.py, predictions locked):
field_void flat≈1 > centered > full≈0; field_dispersion scattered > centered; and the decisive
texture test — smooth vs textured blob: Notan field_void Δ=0.013 vs gradient r_v Δ=0.087
(gradient 6.7× more fooled). ALL GATES PASS. On the n=25: field_void/field_dispersion are 0% dormant
(vs 88% discrete), cross-validate with discrete void at r=+0.74 (beats the raw-gradient probe's +0.40), and
are independent of Evidence (vs EDI −0.17). The three ink works top the field_void ranking. Wired into
measure_all; discrete void_between KEPT (load-bearing between-group reading where ≥2 groups exist),
field measures carry void/dispersion continuously everywhere else.
LEFT the dormancy fix — field_void (Notan) is active across all 25 works (orange = the 3 multi-group / ink works, teal = the rest); discrete void_between (purple) fires only on 3. MIDDLE the synthetic gate's texture test — Notan field_void barely moves smooth→textured (Δ0.013) while gradient r_v collapses (Δ0.087): this is why we build on Notan, not gradient. RIGHT the Notan mass field on Itchō (c.1710 ink, highest field_void 0.627) — bright = mass (tiger, waterfall, tree), dark = the load-bearing void the discrete reading only saw because it happened to split into 2 groups.
PERSISTENCE surfaced (the third leg) + first non-art generality test build + gate + cross-domain
Of the three principles, Evidence and Dependency were exposed; Persistence was buried inside consensus
as one family vote. Surfaced (survival.py, in measure_all): degrade the image
(grayscale / gamma± / JPEG / contrast-norm), re-read the Notan mass structure, measure how much survives
(magnitude-aware energy overlap). Outputs persistence, persistence_min, and
fragile_to — which degradation hurts most = what the structure is built from.
Distinct from Evidence: a colour-built region is well-supported yet grayscale-fragile (the woman:
edge 0.10, colour 0.96). Synthetic gate (survival_gate.py, locked): equiluminant colour
blob survival 0.00 under grayscale, fragile_to=grayscale; tonal blob survives (1.00); noise bounded.
All pass. (First pass used a magnitude-BLIND correlation that missed the collapse — it locked onto the
disc-shaped numerical residual and reported ≈1; corrected to energy-overlap. Two gate bugs caught and fixed:
mean→median baseline, correlation→energy-overlap.) First generality test (cross_domain.py):
ran the full instrument on H&E pathology tiles (NCT-CRC adipose/stroma/tumour, 224×224) + art, no domain
knowledge — 12/12 clean, no crashes. Locked prediction HELD: H&E is stain-defined (chromatic), so it
should read colour-built → 8/9 tiles fragile_to=grayscale, the SAME signature as the woman and Mondrian's
colour blocks. The instrument reads what a structure is built from identically in a Picasso and a tumour
section. First empirical support for the domain-general thesis.
Persistence signature — survival of structure under each degradation (cyan box = fragile_to). Pathology (above the orange line) is uniformly fragile to GRAYSCALE (the grayscale column is the weakest cell) = colour/stain-built; art (below) is varied — Picasso fragile to gamma_down, Caravaggio to gamma_up, Mondrian to grayscale (its colour blocks). The instrument names what each structure depends on with NO domain knowledge — the generality thesis, first evidence.
Does it DISCRIMINATE tissue classes? (honest screen, not just "it runs") discrimination
Russell's question: the 9-tile run tested generality, not class prediction — does it actually separate classes?
Proper screen (disc_screen.py): 9 NCT-CRC classes × 25 tiles, cheap measures only (field-void/
dispersion + persistence + the 5 per-degradation survivals = 9 deterministic features, NO consensus), 5-fold CV.
LDA 66.2%, kNN 67.1% — vs 11.1% chance (~6×). The measures carry real tissue signal. Honest reading:
(1) it's a screen, not a benchmark — published CNNs on the full 100k-tile set hit ~94–99%; 66% from 9 hand-built
deterministic features is far above chance but not competitive, nor meant to be. (2) field_void carries most of
it (ANOVA F=57): it reads tissue CELLULARITY/density — BACK (empty) 0.95 void, NORM 0.145, LYM 0.232 (dense
nuclei). (3) Persistence helps (surv_gamma_down F=40, surv_grayscale F=27). (4) field_dispersion is useless
here (F=4.5, all ~0.28): tiles are textures, not compositions — the composition-geometry measures correctly
find nothing to read. So it's the DENSITY + PERSISTENCE measures that transfer, not the compositional ones —
exactly what you'd expect, and an honest negative for the composition side. STR (stroma) is hardest (36% recall,
confused with muscle/tumour — structurally similar fibrous tissue); BACK 96%, NORM 88%, LYM 76%.
LEFT LDA projection of the 9 deterministic measures — BACK (empty) separates cleanly; the cellular tissue classes cluster with overlap. RIGHT 5-fold-CV confusion (row-normalised): strong on BACK/NORM/LYM, weak on STR (smears into MUS/TUM). 66% vs 11% chance from measures never designed as a tissue classifier — the structural measures are tissue-informative, density and degradation-survival doing the work.
Does EDI / the Evidence measures help? (A/B vs the cheap set) ablation
Russell's pathology book hits 91.5% nine-class (CRC-VAL-HE-7K, zero trained params) — but it has H&E
colour-deconvolution + multi-scale GLCM texture + nuclear-blob layers Saltimbanques doesn't carry. He asked: add
EDI, see if it helps. Ran full measure_all (consensus EDI + all Evidence) on 180 tiles (20×9),
5-fold CV (disc_screen2.py):
feature set
9-class acc
vs chance 11.1%
cheap (field + persistence, 9f)
64.4%
5.8×
cheap + EDI only (10f)
66.1%
+1.7pp
cheap + ALL Evidence (16f)
73.9%
+9.5pp
Evidence only (7f)
43.3%
weaker alone
Honest read:EDI specifically adds almost nothing (+1.7pp; ANOVA F=7.2, rank 13/16 — near the
bottom). That is correct: EDI measures evidence-DIVERSITY across regions = a composition property, and a
tissue tile is uniform texture with one construction logic — nothing for EDI to read. The composition measures
(EDI, field_dispersion, local_contrast) sit at the bottom of the ranking — the instrument does not manufacture
signal that isn't there. But the full Evidence set lifts +10pp to 73.9%, driven by amount
(F=65, the single strongest feature — evidence magnitude ≈ structural richness/density) and deep_core
(F=28, all-family agreement). So the evidence-MAGNITUDE/coherence measures transfer to tissue; the diversity/
composition ones stay (correctly) quiet. Still well short of the purpose-built 91.5% — we lack the H&E /
GLCM / nuclear-blob layers — exactly as expected.
LEFT the A/B: EDI alone +1.7pp, the full Evidence set +10pp (to 73.9%), all far above chance but below the purpose-built book's 91.5%. RIGHT feature discriminative power (ANOVA F): amount + field_void (density) + degradation-survival carry the signal; EDI (cyan) ranks 13/16 — the composition measures correctly find little to read in uniform tissue texture.
Waking the core: composition geometry on CONSENSUS regions build + gate
Back to the instrument's actual job. Its signature axis — Dependency / relational composition (void,
indispensability) — was dormant on the works it was built for: edge-led grouping merges most paintings
into one figure-group, so the axis fired on only 3/25. Fix (no new principle, no semantics): run the SAME
indispensability machinery on the consensus regions we already extract — median 4 regions/work (21/25 with
≥2) vs edge-led's median 1. Added composition_on_consensus (shared _graph_components;
edge-led reader untouched). Synthetic gate (region_gate.py, locked): single→0.000,
cluster+lone→0.583 (lone mass across the void is load-bearing, matches the old gate's 0.63), scattered→0.214.
All pass. On the art n=25: Dependency axis active 21/25 vs 3/25 (7×). Sanity-checked the top reading —
DT1565 is Degas's Rehearsal of the Ballet Onstage, a textbook multi-figure composition edge-led merged into
ONE blob (dormant); consensus reads 5 masses, indispensability 0.731. Real composition, not fragmentation.
Honest limit: the acceptance test wanted "wake more works AND agree with edge-led where both fire" — first
met decisively, second inconclusive (only 2 works fire on both; they rank oppositely, r=−1 at n=2 =
uninterpretable). The two are different-granularity decompositions (coarse perceptual figures vs finer evidence-
regions), not required to agree; the synthetic gate + Degas spot-check are the validation. BOTH readers kept.
Max indispensability per work: consensus-region (cyan) fires on 21/25; edge-led (purple) on 3. The Degas ballet — a genuine multi-mass composition — was fully dormant under edge-led and is now read (5 masses, 0.731). The core relational axis, finally active across the corpus it was built for.
THE LABELED TEST — does the reading mean something? Movements separate labeled corpus
The unlabeled n=25 tested independence/dormancy but not meaning. Built a labeled corpus from the Met
Open Access API (free, curl-downloadable; AIC's movement labels are cleaner but its image CDN is
Cloudflare-gated): 96 public-domain paintings, 3 multi-artist movements — Baroque 35 (Rembrandt/van Dyck/
Hals/Poussin), Impressionism 24 (Degas/Renoir/Manet/Pissarro), Post-Impressionism 29 (Cézanne/van Gogh/Gauguin/
Seurat/Lautrec). Multi-artist per movement → movement decoupled from artist. Result: the measures separate
movements, LDA 5-fold CV = 65% vs 33% chance (deterministic, no training). And the WAY they separate is
art-historically real: the Persistence fragile_to signature reads the tonal→chromatic transition
— Baroque 71% tonal-built (fragile_to=gamma_up, structure in the darks), Post-Impressionism 83% colour-
built (fragile_to=grayscale), Impressionism between. That IS the textbook Baroque-tenebrism → Post-Impressionist-
structural-colour story, measured not told, no labels. Carried by field_void (F=35), amount (F=30), persistence/
certainty. Honest: the relational COMPOSITION axis just built (comp_indisp/void) does NOT separate movements
(F≈1.2) — it tracks multi-mass-ness, which all three have; the signal is what-structure-is-built-from + density,
not layout. Confound flagged: field_void may be partly age/varnish (Baroque darker) — but the tonal-vs-colour
fragile_to signature is about where structure LIVES, far less age-confoundable, and is the robust finding. First
look, n=88, 65% modest — but the reading is not noise; it tracks a real art-historical axis.
LEFT the Persistence signature by movement: a clean tonal→chromatic gradient — Baroque 71% fragile-to-brightening (tonal/chiaroscuro, orange) → Post-Impressionism 83% fragile-to-grayscale (colour-built, cyan), Impressionism intermediate. The instrument reads "Baroque structure is tonal, Post-Impressionist structure is chromatic" from perturbation-survival alone, no domain knowledge. RIGHT LDA of the full measure set — the three movements separate (65% CV vs 33% chance); Baroque (orange) left, Post-Impressionism (cyan) right, Impressionism bridging.
BRIGHTNESS CHECK — movement is a TONAL phenomenon (a fidelity result, not a flaw) reframed Correction (Russell): the earlier "confound / deflation" framing was wrong — brightness is not a flaw to
apologise for. Tested whether brightness drives the movement separation (met_confound.py): the
movements are a tonal gradient (mean luminance Baroque 0.16 → Impressionism 0.30 → Post-Imp 0.44;
field_void↔luminance r=−0.73), and mean luminance alone classifies movement at 70.5%, at/above
the full instrument's 64.8%. The right reading: tonal/value structure is the compositional backbone — value
organises nearly every painting before colour or arrangement — and these movements differ tonally by design
(Baroque tenebrism dark → Impressionism/Post-Imp high-key). So "movement" here is a tonal distinction, and the
instrument's tonal layer (field_void, amount, certainty, the tonal-vs-chromatic fragile_to) correctly reads it.
That brightness is near-sufficient is the instrument telling us what kind of structure the label is (tonal),
not a deficiency. The relational-composition measures (indispensability/void) add nothing beyond tonal here
because there is no non-tonal signal in the label — and they DO add where the label needs it (pathology, +14pp).
Not "averaging the pixels beats the instrument"; rather "this label is tonal, the instrument reads tonal correctly,
and the rest of the engine rightly stays quiet."
LEFT the movements are a tonal gradient — nearly separable by mean luminance alone (Baroque dark → Post-Imp high-key). RIGHT brightness alone (70.5%) is at/above the full instrument (64.8%): the label is tonal, so the instrument's tonal reading carries it and the relational measures correctly add nothing beyond it. A fidelity result — tonal is the backbone, read correctly — not a flaw.
WHERE it adds value — the fair test (the real conclusion) resolution
The question: does the engine contribute structure beyond the tonal backbone, and where? Tested both tasks
the same way — trivial colour/intensity vs structural-alone vs combined (pathology_baseline.py +
movement combined): MOVEMENT — tonal 70.5 / structural 64.8 / combined 69.2 → no lift beyond tonal,
because the label IS tonal (no non-tonal signal for the relational measures to add). PATHOLOGY — trivial
68.4 / structural 66.2 / combined 82.2 → +14pp, structure COMPLEMENTARY (the label needs non-tonal structure).
Neither wins alone on pathology (66 vs 68); their complementarity is the value. The honest characterisation:
the engine reads the tonal backbone correctly everywhere, and contributes additional non-tonal structure exactly
where the label requires it. Value-add tracks what kind of structure the label is — not a win or a failure,
a precise map of when the relational layer matters.
Trivial baseline vs structural-alone vs combined, both tasks. Pathology: combined (cyan) lifts +14pp over either — the label needs non-tonal structure, so the measures add it. Movement: combined ≈ tonal-alone — the label IS tonal, so the relational measures correctly add nothing beyond the tonal backbone the instrument already reads. Value-add tracks what kind of structure the label requires.
PER-WORK VERIFICATION — does it see what it should? + a region-threshold blind spot audit
Russell's pull-back: stop chasing classification, LOOK at individual works — is the instrument seeing the composition
it's meant to? Built a per-work diagnostic (diag_render.py: original | Notan mass field | consensus
regions labelled with indispensability) and read 5 known works instead of trusting aggregate stats (the standing
rule I'd drifted from). What's SOLID (verified by looking): the Notan mass field tracks the real masses;
fragile_to reads correctly per work — Manet Dead Christ tonal/centric (the luminous body IS the
mass, fragile_to=gamma_up), Cézanne Card Players + the Gauguin-style still life colour-built multi-mass
(fragile_to=grayscale, 6 and 4 masses with high void), Degas ballet multi-mass distributed. The mass/tonal/colour
layer measures what it should. (Also: the instrument read the still life correctly as multi-object — it was MY
label that was wrong, not the reading.) BLIND SPOTS found by looking: (1) Region segmentation is
threshold-sensitive AND non-monotonic — Manet reads 1 region (centric) at the default thr=0.34, but 3
regions (Christ + 2 angels, indisp 0.57) at thr=0.22, then collapses back to 1 at thr=0.16 (everything merges
by connectivity). So n_regions / void / indispensability ride on a threshold with an un-validated, non-monotonic
sweet spot — the per-work composition COUNT is not robust. (2) Subordinate / low-evidence masses drop below the
default threshold (the angels). (3) Mass-scale, not figure-scale: the crowded crucifixion → 3 coarse mass-bands,
not 30 figures (by design / the known boundary). Verdict: the mass-field + persistence (fragile_to) +
field-void layer is trustworthy and reads per-work correctly; the DISCRETE composition layer (regions/
indispensability) is threshold-fragile and was under-verified — aggregate stats hid it, looking exposed it.
Recommendation before trusting per-work indispensability: region-PERSISTENCE across threshold (keep only
masses stable over a threshold sweep) — the same persistence principle, applied to segmentation.
Manet, Dead Christ with Angels: the Notan mass field (centre) correctly makes the luminous body the dominant mass; consensus reads it as 1 centric region at the default threshold — defensible, but the angels (darker/chromatic) drop out, and at thr=0.22 they reappear as 2 load-bearing masses. The composition count is threshold-dependent.Cézanne, Card Players: 6 regions tracing the seated figures and table, high void (0.55) between them, fragile_to=grayscale (colour-built). A multi-figure composition read correctly as multi-mass — the kind of reading the instrument exists for.
FIX: region-persistence — the threshold blind spot, repaired build + gate
Built the fix Russell greenlit (persistence_regions.py, composition_persistent). A mass is a
MAXIMUM of the support field; its dynamics (basin depth before merging into a taller maximum) is its
persistence — 0-D persistent homology. Select masses with dynamics ≥ h (h-maxima) + watershed to the
nearest persistent mass. The knob h (how deep a basin must be to count) replaces the arbitrary
threshold, and the count is monotonic non-increasing in h by construction. Validated
(region_persist_gate.py): synthetic single/cluster+lone/scattered → 1/2/5, stable across the whole
h-range. The Manet that flickered 1→3→1 across threshold is now monotonic [9,6,6,4,3,3] across h and
finds Christ + angels (≥3 masses) at 6/6 h-levels vs threshold's 1/4. At default h=0.15 → 4 masses (Christ +
both angels separated), void=0 (correct — a tight cluster, not void-separated), indisp 0.34. The blind spot is now
a measured, monotonic quantity; composition_persistent is the recommended per-work reader, and h itself
is the honest output (report count vs persistence-depth, not one number).
Manet, before/after. OLD single-threshold (red): 1 merged region, angels dropped, non-monotonic across threshold. STABLE persistence h=0.15 (cyan): 4 masses — the Christ body and both angels separated, markers on each — monotonic and stable in h. The Persistence principle, applied to segmentation itself.
The h knob + per-work mask gallery (what the masks actually produce) diagnostics What does h change? h = the minimum basin DEPTH (dynamics) a support-field maximum must have to count as its
own mass. It does NOT move masses or alter the field — the mass locations are fixed by the field's maxima;
h only sets how prominent a local maximum must be to be promoted to its own region vs absorbed into a taller
neighbour. LOW h = finer (more sub-masses: a figure splits into head/limbs); HIGH h = coarser (only
dominant masses; subordinate ones merge in). Monotonic non-increasing in h by construction (raising h never adds a
mass). So h is a grain / level-of-analysis knob, not a spatial one — read the work at major-mass scale or
sub-mass scale by choosing h, and report the count as a function of h, not one number. Honest grain note: the
default h=0.15 reads FINELY on busy works (Degas dancers → 15 masses, Cézanne → 9); for major compositional
masses, h≈0.20–0.26 (Manet → 3 = Christ + 2 angels). The masks land on real structure at every h (verified by eye);
the question is only grain.
The h knob, Manet across h=0.08→0.26: 6→5→3→2 masses. As h rises, sub-masses (a figure's parts, each angel) merge into their dominant parent; the locations never move. Coarse↔fine, monotonic — pick the grain for the question.Per-work diagnostic (original | Notan mass field | persistence-stable regions + indispensability), Degas Dancers: at h=0.15 it reads finely (15 masses, void 0.61) — the dancers, tutus and ground each a mass. Real structure, fine grain.
The OTHER images the masks produce — the per-evidence-FAMILY belief fields for Cézanne's Card Players: edge / tone / colour / perturbation each fire differently (the disagreement the belief vector keeps), then the family-weighted consensus. A colour-built mass lights the colour family and little else; the consensus is where they agree.
Rendered 10 per-work diagnostics with the stable reader (diag_render.py) + family galleries
(mask_gallery.py) + the h-sweep (hsweep_render.py). Per-work visual verification is now a
standing tool, not an afterthought.
Construction-singularity — built, gated, and an HONEST NEGATIVE on the woman honest negative
The full readout ranked the seated woman LAST on indispensability (small, area-biased). Built
singularity.py to fix it: a region's singularity = the drop in unweighted evidence-
diversity when ablated — area-independent, rewarding a region that is the SOLE carrier of a construction (the
Evidence axis contributing to Dependency). Synthetic gate PASSES (singularity_gate.py): a SMALL
unique-colour disc among 3 redundant tonal discs reads spatial 0.13 (low) but singularity 0.18 (highest) → ranks
#1 by load_bearing despite its size; redundant discs read NEGATIVE singularity. But on the real
Saltimbanques it does NOT crown the woman, at any grain. Coarse (3 regions): she's only marginally most-unique
(0.213 vs 0.206 — the upper figure is equally extreme/tonal), and her tiny spatial keeps load_bearing last (0.10 vs
0.24). Fine (8 regions): "colour" isn't unique anymore (several colour sub-regions), uniqueness dilutes, big regions
dominate. The gate passed because it was the idealised "many identical + one unique"; the painting is "a few
genuinely distinct masses." The real finding: THREE structural axes now fail to crown her — indispensability
(extent), singularity (uniqueness), spatial. Her counterweight status (Rilke/Arnheim) is most likely a
balance / leverage property — a small mass at large distance holding eccentric equilibrium (torque, not
extent or uniqueness) — and none of the current measures encode balance. Kept singularity (valid + gated;
it correctly reads the cluster as redundant/average, −0.108) — it is NOT the woman-detector. Next (recommended):
a balance/leverage measure (change in compositional imbalance on ablation), carefully designed + gated — OR the
honest boundary that her necessity is a perceptual-balance claim the structural-evidence instrument doesn't measure,
while it correctly characterises her as the unique colour mass.
BALANCE / leverage — the counterweight axis CROWNS the woman validated
The clear next hypothesis: her role is the eccentric COUNTERWEIGHT (small mass, large distance = leverage), not
extent or uniqueness. Built balance.py. Visual balance is a sum of moments about the frame centre,
M = Σ wᵢ·rᵢ — Rilke's "many-digited sum"; a balanced field is where it solves toward zero, |M| is the
imbalance. A region is a load-bearing counterweight when removing it makes the sum stop cancelling:
balance_dependency(i) = |M − wᵢrᵢ| − |M| (positive = removal unbalances = it was holding the field; negative
= heavy side). Synthetic gate PASSES (balance_gate.py): counterweight signs correct, symmetric
pair both-positive-equal, centred mass ~0, leverage amplifies a distant counterweight. On the real Saltimbanques
the woman is CROWNED — the sole counterweight. Plain AREA weighting: woman bd +0.0116 (only positive region),
cluster −0.006, upper −0.008. Leverage weighting: woman +0.028, still #1. A 2%-area figure is the balancer —
her DISTANCE (0.55, largest moment arm) makes her term decisive. After indispensability (extent) and singularity
(uniqueness) both failed, the balance axis succeeds for the art-historically right reason: she is the term
that helps the many-digited sum solve toward zero — remove her and it no longer cancels. Dependency now has three
facets and the woman separates them: load-bearing by extent (last), by construction (shared), by COUNTERWEIGHT-balance
(#1, alone). Honest caveat (Russell): "load-bearing" here IS a definition (increase in a moment-sum's distance
from zero); that she reads counterweight even under minimal AREA weighting (not only the Arnheim leverage premium)
makes it robust, not a definitional artifact — but three definitions were silent on her and one speaks. The
convergence of this measure with the century-old reading is the finding, held as a lens that illuminates imperfectly,
not the object itself.
Visual balance as the moment sum about the frame centre (+). Cyan = counterweight (removal increases imbalance); red = heavy side. The seated woman (2% area, far lower-right) is the SOLE cyan counterweight — the term holding the sum toward zero. The standing cluster and upper figures are the heavy side it balances.
earlier → diagnostic screen
Rather than guess a third statistic, screen candidate centricity scalars against the
four clean controls and keep only what separates Rembrandt-centric from Saltimbanques-
eccentric without echoing the chiaroscuro reference. Results below.
3. Calibration — clean controls
Persistence rank-orders by tonal contrast, not composition — the falsification table.
Work
role (art-historical)
persistence
bridge_cost
tonal character
Mondrian — Composition II
distributed but settled
0.500
0.930
flat, hard-edged
Caravaggio — Calling of St Matthew
directional
0.667
0.678
tenebrism
Picasso — Family of Saltimbanques
eccentric / metastable
0.667
0.877
flat, soft (Rose)
Rembrandt — Self-Portrait (1660)
centric (single figure)
0.750
0.771
deep chiaroscuro
Expected by composition: Rembrandt most centric, Saltimbanques most eccentric.
Observed by persistence: the reverse — it follows chiaroscuro (flat Mondrian lowest →
chiaroscuro Rembrandt highest). persistence falsified as a centricity measure
Saltimbanques full readout. Top row — the heartbeat: Notan mass field, the collapse cascade, and the persistence curve (the hump). Bottom — the L6 island density / island map / resistance graph. The persistence/ECCENTRIC verdict shown here is the confounded measure (it tracks chiaroscuro); the trustworthy reading is the figure-ground work in §4b/§6.
no candidate survives — each passes the narrow
Rembrandt-vs-Saltimbanques test but fails the 4-work sanity check, each with a diagnosed confound.
metric
Rembrandt (centric)
Caravaggio (direct.)
Mondrian (settled)
Saltimb. (eccentric)
verdict
chiaroscuro (L std) — confound ref
0.078
0.109
0.290
0.130
reference
persistence
0.750
0.667
0.500
0.667
= chiaroscuro
mass_conc (lo=centric)
0.514
0.660
0.685
0.607
= coverage/vignette
anchor_dom (hi=centric)
0.251
0.420
0.396
0.214
gradient frag.
convergence (hi=centric)
0.863
0.767
0.826
0.792
flat / noise
mass_conc ranks Mondrian (full-bleed) most-spread and Saltimbanques as
concentrated → measures coverage, not centricity. anchor_dom calls Caravaggio/Mondrian
more single-anchored than a single-figure portrait → gradient fragmentation. convergence
nearly flat, directional Caravaggio wrongly lowest. Two structural lessons: (1) the
extremes separated only via coverage/vignette — Rembrandt is a vignetted single figure,
which almost anything separates from a spread scene, so the 2-point test is too weak; (2) four
paintings co-vary every factor at once (contrast, coverage, count, spread) — you cannot
validate a coordinate on stimuli where the confounds travel together. Caravaggio and
Mondrian are likely distinct modes, not midpoints on one axis.
Net: we do not yet have a single validated structural coordinate for
composition. persistence, bridge_cost, mass_conc, anchor_dom, convergence — all confounded.
4b. Channel exposure — the path forward positive
Russell's reframe (looking at the artifact): the painting exists as blobs that correlate
and separate; a low-threshold Canny (~15–25%)defines the figure silhouette; then
a tonal or colour game characterises what is inside. The artist is hiding
— a grouping survives in at least one of the three channels (edge / tone / colour), and which
one carries it is diagnostic. Rendered the six-channel diagnostic (channels.py) for
three very different works. It works, and which channel carries the grouping varies by work:
Rembrandt — the low-Canny silhouette holds the figure whole; tone fills the internal cascade; colour is inert (monochrome). Edge + tone carry it.Picasso — colour cracks the fused clump (harlequin/jester/boy); the void map surfaces the woman↔cluster load-bearing gap.Matisse — colour builds the flat zones; the neutral gray field reads as the load-bearing void. Colour-primary.
channel
Rembrandt (chiaroscuro single figure)
Caravaggio (tenebrist directional)
Picasso (clumped figures)
low-Canny silhouette
holds figure whole (interior cascade absent → no fragmentation)
defines lit figures
outlines clump + woman
tonal (Notan)
fills internal cascade
lit-vs-dark band
dark figures vs ground
colour (hue×chroma)
inert (monochrome)
muted
cracks the clump (harlequin/jester/boy by hue)
void map
merges dark figure into ground (chiaroscuro limit)
dark wall
shows the load-bearing gap (woman ↔ cluster)
The measure that falls out — perceptual figure-ground grouping, not a tonal
scalar: (1) low-Canny silhouette + confirmed edges (L∩C) bound the figures — this solves
the Rembrandt chiaroscuro confound (outline captured even when interior is dark, where mass
field and void map both failed); (2) void separates figure-groups from ground and surfaces the
load-bearing gap; (3) colour subdivides a fused clump. Centric↔eccentric = number of figure-groups
+ void mediation (Rembrandt 1 = centric; Picasso cluster+woman across a void = eccentric;
Caravaggio directional). This is Gestalt figure-ground (perceptual organisation), NOT object
recognition — edge-bounded, void-separated, colour-subdivided regions, no learned weights, no
semantics. It stays inside the deterministic/training-free contract; the figure-ground tension is
resolved. Artifacts: outputs/channels_*.png.
Painter-strategy fingerprint — which channel carries the grouping (5 works)
An emergent result: the carrying channel is not noise, it is a signature of how
the work is built. Added Matisse, The Piano Lesson (1916) as a colorist-flat test.
work
carrier channel(s)
reading
Rembrandt — Self-Portrait
edge + tone
silhouette holds the figure whole; value fills the interior; colour inert (monochrome)
Caravaggio — Calling of St Matthew
edge + tone, directional
tenebrist value band; the light vector organises it
Picasso — Saltimbanques
colour + void
colour cracks the fused clump; the void separates the woman from the cluster
Matisse — The Piano Lesson
colour + void (gray field)
colour builds the flat zones; a neutral gray field mediates; clean geometric edges
Mondrian — Composition II
colour + hard edge
flat colour planes on a black-line scaffold
This is the disegno↔colorito axis, measured. Rembrandt/Caravaggio build with
line-and-value (Florentine disegno); Picasso/Matisse/Mondrian build with colour (Venetian
colorito). A 500-year-old art-historical distinction falls straight out of "which channel
carries the grouping." Caveat: Matisse's "ECCENTRIC, persistence 0.667" verdict is right for
the wrong reason — the work genuinely is distributed, but the confounded persistence measure
hit it by luck, not by reading centricity. The trustworthy signal is the channel profile, not
the scalar.
5. What we have learned
Texture/contrast is the recurring confound. Gradient pole-counts, saliency
("restlessness"), and tonal pole-counts ("persistence") all collapse into how much
contrast/texture is present, not how the composition is organised. The Notan move
fixed it for synthetic blobs but not for chiaroscuro real paintings.
Centric/eccentric is an object-level / figure-ground property. We have been
trying to read it from tone alone. Every low-level tonal scalar so far re-tracks tonal
distribution. This presses on the VTL no-semantic-inference contract — the open
design question.
Controls earn their keep. A contaminated image gave a flattering number; a clean
single-figure control falsified the headline measure. Both were caught only by looking and
by testing the simplest case first.
The relational void (bridge_cost) is real but mis-aimed — it finds
maximal low-density corridors, which a Mondrian grid wins. It needs to be conditioned on
"between two figure masses," not any two islands.
6. Where we are going
The pivot reframes the whole program. The boundary below was mapped under the old
"measurement → scalar" approach. ConsensusMask (above) changes the substrate: instead of measuring the
image, ask what regions the image repeatedly insists on under independent constructions. That gives
a deterministic region substrate without semantics — and the relations (anchor dominance, void mediation,
single- vs multi-centre) get measured between consensual structures, not detected objects. The
architecture:
Image → channel masks (edge/tone/chroma) → scale + perturbation survival → CONSENSUS REGIONS → region graph → composition readout
Status & next builds:
Perturbation-survival — DONE (consensus_mask(perturb=True)). Modest: drops fragile
regions (Saltimbanques 3→2 = cluster + woman). Artifact rejection is via confidence-filtering, not
perturbation alone.
Composition readout v1 — FAILED (raw thresholded consensus regions inverted Saltimbanques/Rembrandt)
→ corrected by AblationImpact — DONE & validated. Composition is CAUSAL: a region's importance is
its NECESSITY (ablate it, recompute, measure change), not its confidence. Synthetic-gated; Saltimbanques'
seated woman (1% area, dropped by the threshold) reads EQUAL indispensability to the whole cluster —
load-bearing by necessity. ablation.py.
Evidence-family weighting — DONE (consensus_families). One vote per family; per-region
BELIEF VECTOR keeps the disagreement; the Saltimbanques woman revealed as colour-built.
Evidence Diversity Index — DONE, the general result (diversity.py). EDI = angular
spread of construction signatures: Caravaggio 0.014 (one logic) → Picasso 0.217 (many logics). The instability
is construction-diversity, not object count. Image-general (the Parallax variance-of-disagreement thesis at the
evidence level). + fragility (brittle vs redundant).
Remaining:
RelationSurvival — does the VOID (and the anchor relation) survive perturbation, not just the regions
— the causal layer's missing piece.
Robustness: finer/per-level region extraction (sharpen assumption depth + make EDI robust at 1–3
regions); broaden the family basis (a true hue-opponency operator; topology family) without losing honesty.
The range & the generalisation: Bamiyan murals → Giotto → Boafo; and — since EDI is image-general —
test it cross-domain (AI renders, pathology) where "how many construction logics" is not about composition at all.
Standing lesson reinforced: synthetic-gate every new coordinate before paintings.
Composition v1 skipped it and inverted the key case; the paintings caught it, but the gate should have.
The consensus substrate is right (support field lights figures); only the composition layer's region-
definition was wrong (raw regions + threshold instead of figure-group connectivity + consensus weight).
The boundary as mapped under the prior approach (for the record):
reachable + validated
blocked (boundary)
figure-ground groups (step 1)
separating touching/clumped figures into individuals
centricity / void-separated groups
figure-level rhythm (beat, syncopation)
load-bearing void between groups
directional device / gaze (semantic)
channel fingerprint (disegno↔colorito, observed)
—
Two honest forward options:
A — Consolidate (recommended). Assemble the explanation readout from what is
validated: figure-groups + centricity + load-bearing void + the channel fingerprint → a structural
profile that articulates how a work is built and states the boundary explicitly. Point it at
Saltimbanques: it can say, validated, "two groups, one load-bearing void, no single anchor —
the woman is separable, the field does not resolve to one centre." That is the mission, answerable
now, without overclaiming the figure-level it cannot reach.
B — Invest in per-figure separation. Take on occlusion-aware figure separation (a real
open CV problem) to unlock rhythm. High effort, uncertain it stays contract-legal. Only if individual-
figure rhythm is essential to the goal.
Then, either way — the range: Bamiyan murals → Giotto → Boafo, to test the validated
coordinate across the full stylistic span, and the Phase-3 compositional-category basins.
Honest net: the original skepticism ("not convinced it can be done on either
account") is answered — one structural device (the void) is identifiable and validated; two (rhythm,
gaze) are named limits. A partial yes with mapped edges, not an overclaim.
Retractions on record: calibration run #1 Saltimbanques 0.833 (contaminated
image); "central prediction held" (artifact of contamination). Falsified: persistence
as a centricity measure; bridge_cost as a Ma/void measure. Both kept in the trail, not deleted.