Saltimbanques — Lab Book

A deterministic instrument for reading multi-object / multi-mass composition. Russell Parrish · Parallax Metrology · ORCID 0009-0008-9781-7995

Stage 3 (Art) of the Parallax research ladder. Last updated 2026-06-25. STATUS: core measure falsified on clean controls — diagnostic screen in progress

One line. AI struggles to render multi-object/mass work because it is challenging to read it. This instrument is the reader. The target it was built for — Picasso's Family of Saltimbanques — "never lets you rest": a field of competing centers that never resolves. The task is to measure that property (Arnheim's eccentric pole) and separate it from a settled distribution (Mondrian) and a single anchor (a portrait).
The substrate (the governing idea). Represent every region by the INDEPENDENT EVIDENCE required to justify its existence. The purpose is not to recognise structure — it is to justify structure. Three principles, and a hard gate: Acceptance test (in force): any new quantity must answer a structural question that Evidence, Persistence, or Dependency cannot already answer — else it is not added. The canonical representation is the belief vector (kept un-collapsed); support maps are visualisation, not the representation. Language is binding: measurements, not interpretation ("EDI measures evidence-composition diversity" — NOT "instability/restlessness", which are untested hypotheses). Full audit: AUDIT.md.

1. What the instrument is

A fixed, training-free measurement frame applied to any image; interpretation is per-domain (the program's frame/memory/deviation discipline). Built by forking the proven parallax-portrait engine (the reconciled consolidation of the VTL color v0.4.8 spine + pathology radial/Ω + Canon3 + the-crit DGI + multitracker + gauntlet) and adding the layers below.

LayerModuleReadsStatus
Full kernel (the G's): L1/L3/L4 VCLI-G g1–g4/L6/L8 GLCM/L9 blobs/spectral/edgefull_kernel.py (50 coords)generalist substrate, any imageworks
L6 multi-island resistance graph (re-added; portrait dropped it)resistance.pyislands, edge corridors, drain, bridge_cost (void)bridge_cost confounded
Heartbeat: persistence under resolution (the squint)persistence.pydoes plurality survive coarsening → centric/eccentricfalsified (= chiaroscuro)
Convergence (do masses point at a shared fulcrum)convergence.pyvector fulcrum, dispersionexploratory (facing-proxy weak)
Scanpath (saliency + IOR — "lets you rest")scanpath.pyfixation dispersion, competitive focitexture-sensitive
Generalist + autonomous readingread_full.py/read.pyone call → verdict + coordsruns
Visualisation (2-row artifact)viz.pysee what it seesworks

2. The trail (backwards) — what we did and learned

2026-06-25 · build
Forked the portrait engine; re-added the canonical L6 resistance graph (the edge-corridor / void machinery the portrait engine dropped). Wrote the heartbeat (persistence = fraction of squint-scales at which the field still has ≥2 mass poles; centricity = 1−persistence = Arnheim axis).
synthetic ground truth
HELD: centric blob (persistence 0.33) vs eccentric 5-mass ring (0.75). But the test killed three things before any real run: gradient pole-finding fragments a textured mass into ~12 poles (→ switched the mass field to Notan: low-pass of luminance, texture-robust); convergence and restlessness failed to discriminate (texture-noise) → removed from the verdict gate.
verification & hardening
Fixed: grayscale/RGBA crash; silent uint8 mis-scaling (wrong color, no crash); added low_resolution flag; integrated the full 50-coord kernel (the G's) so it is a true generalist; fused a chroma channel into the mass field (Boafo: reads equiluminant colour masses luminance is blind to — verified 0→7 poles on a flat-luminance chromatic test).
calibration run #1 (CONTAMINATED)
Pulled the trio + Saltimbanques from Wikimedia. Reported Saltimbanques persistence 0.833 (highest) and "the sharp Mondrian test passed." RETRACTED. The Saltimbanques image was a gallery photograph (a viewer's back, frame, wall, adjacent works) — not the painting. Caught by looking at the file. Integrity rule since: read every image before trusting a number.
clean re-run + image upgrade
Replaced with clean flat reproductions (curated, color-accurate, cropped — Caravaggio re-pulled at 46 MP true-color; Mondrian cropped to canvas). Clean Saltimbanques persistence fell to 0.667 — ties Caravaggio, +0.08 over Mondrian. The contamination had inflated it. The single mass-persistence axis no longer isolates Saltimbanques.
the Rembrandt control — FALSIFICATION
Added the simplest possible centric control: a single-face self-portrait on a plain dark ground. Prediction: hard CENTRIC, persistence ≈ 0. RESULT: persistence 0.75, ECCENTRIC — higher than Saltimbanques. A single face beats a six-figure scene. Diagnosis (confirmed by the viz): the mass field |blur(L)−mean| counts both light and dark deviations, so a chiaroscuro figure = many tonal masses (light face + dark hat + dark coat). Persistence measures chiaroscuro / tonal busyness, not composition. A second measure fell with it: bridge_cost is highest on Mondrian (a hard grid maximises inter-island corridor resistance) → it tracks hard partitioning, not Ma.
flow-field probe — the directional device negative
Russell's reframe: the brilliance of Saltimbanques is the direction of the figures and their gaze — vectors that fight and never seat (vs Caravaggio/Matisse, which have a resting star). Built a structure-tensor flow field + least-squares "resolving star" (flow.py). Result: it reads brushwork/form GRAIN (vertical-dominant), not gaze/gesture. resolves is flat 0.66–0.70 (near-parallel grain lines → degenerate star fit); fight_index is art-backwards (Caravaggio highest). Confirmed: the directional device is a SEMANTIC vector — a figure looking left makes no leftward intensity signal; gaze/gesture live in what the forms depict, not in edge orientation. Reading them genuinely needs face/pose/gaze detection = crosses the no-semantic contract. Silver lining: the flow field accidentally drew the figures as a row of vertical strokes at intervals — that is the rhythm skeleton, and unlike gaze it IS reachable (spacing/periodicity of masses). "How it holds" via direction = semantic, out of reach; via rhythm = structural, in reach. Artifacts: outputs/flow_*.png.
Flow field on Rembrandt — the structure tensor reads near-parallel vertical grain (brushwork / form), not gaze; the "resolving star" gets flung above the head because parallel lines have no real intersection. Grain, not the directional device.
rhythm / echo probe — controlled dashes informative negative
Russell's reframe: rhythm is echo following contrast — big shapes first, then dashes tracing their contrast to find the smaller forms echoing the larger (arms echoing the torso). Built a CONTROLLED dasher (rhythm.py): coarse structure tensor (big shapes only) + a contrast gate. The control works mechanically but lands on the wrong things — on Rembrandt, Caravaggio AND Picasso the dashes sat on the frame / vignette / background gradient, not the figures, because raw coarse contrast is strongest at the periphery, not the subject. Of the numbers, only rhyme (orientation concentration) discriminated (Rembrandt 0.72 vertical vs Caravaggio 0.37 scattered); echo saturated, beat confounded by mass width. Artifacts: outputs/rhythm_*.png.
Controlled dashes on Caravaggio — even gated to "major contrast," they land on the window / wall / frame, not the dark figure group (which is low-contrast, dark-on-dark). Contrast-gating ≠ the perceptual big shapes.
THE UNIFYING CONCLUSION — big shapes first
Every probe to date — gradient poles, tonal mass, persistence, the flow field, contrast-dashes — latches onto texture / grain / frame / vignette instead of the figures, because we keep skipping step 1. Russell's own process is the correct algorithm: (1) find the big shapes (figure-ground); (2) dash their contours for echo/rhythm; (3) follow the contrast of those shapes. We kept attempting 2 and 3 without 1. "Big shapes first" is a figure-ground segmentation, not a contrast threshold, and it is the prerequisite for everything — rhythm, echo, vectors, centricity all sit on top of it. This is the channel-union figure-ground segmenter (§4b): big shapes = edge-bounded, void-separated, colour-subdivided regions. confirmed next build: grouping.py
grouping.py v1 (colour/tone-led) fails on chiaroscuro
Built the big-shape segmenter as felzenszwalb on CIELab (colour+tone similarity). Flat/colour works segment well (Picasso 3 figure-shapes/26%; Matisse 7 zones/35%). Chiaroscuro works fail: Rembrandt 96% "ground" + a 3.7% shard (only the lit face); Caravaggio 98% ground. A dark figure on a dark ground IS the same tone — region-similarity cannot separate them. The exact wall again. Artifacts: outputs/grouping_*.png.
grouping v1 (colour/tone-led) on Rembrandt — only the lit face survives as a figure (3.7%); the hat, body and ground all merged into one 96% "ground". The dark figure dissolves into the dark ground.
the fix (was already in hand from §4b)
The channel diagnostic already showed the low-Canny silhouette holds the chiaroscuro figure whole (outline captured even where interior is dark). So the segmenter must be EDGE-LED, not colour-led: (1) silhouette = low-Canny + confirmed edges → morphological close + fill → figure region; (2) ground = border-connected remainder; (3) colour subdivides the figure regions (Picasso clump → sub-figures). Edge defines, colour subdivides — v1 inverted the order. next: grouping.py v2 (edge-led)
grouping.py v2 (edge-led) — STEP 1 STANDS breakthrough
Rebuilt edge-led: low-Canny + chroma edges → dilate → morphological close → fill holes → open. The chiaroscuro wall is broken. Rembrandt: the whole bust as one figure (33%, was 3.7% in v1); Caravaggio: the figure assembly (27%, was 2%). Picasso: cluster + the seated woman as two groups across the ground-void — the Saltimbanques structure, on the actual figures. For the centric/eccentric question the group-level read already works: Rembrandt 1 group (centric), Picasso 2 groups + void (eccentric) — the centricity signal (n figure-groups + void-mediation) finally computed on figures, not texture/grain/frame. Limits: over-merges WITHIN a group (Picasso clump = 1 blob; Matisse zones fused) → needs the colour-subdivision step (v1's strength); Rembrandt's dark hat partly lost (low-contrast silhouette, tunable). Artifacts: outputs/grouping2_*.png.
v2 (edge-led) — the whole bust returns as one figure-shape (33%, was 3.7%). The chiaroscuro wall is broken.
v2 — the left cluster and the seated woman as two groups across the ground-void: the Saltimbanques structure, on the actual figures.
Next: v3 = edge-led groups (v2) + colour subdivide (v1) — the channel union — then derive the centricity coordinate and validate on the controls.
centricity coordinate v3 — FIRST VALIDATED STRUCTURAL COORDINATE milestone
compute_centricity on edge-led groups + colour subdivision, components reported (n_groups, anchor_dominance, dispersion, void) — not pre-fused. Passes the synthetic gate: centric(1) → 1 group / dispersion 0.0; scattered(5) → 6 / 0.45 (and subdivision does NOT over-split the monochrome blob). Orders the paintings sensibly by dispersion: Rembrandt 0.19 (centric) · Caravaggio 0.24 · Picasso 0.34 (eccentric) · Mondrian 0.37 · Matisse 0.49 (distributed); anchor_dominance inversely (Rembrandt 0.75 → Matisse 0.14). The first coordinate to pass ground truth AND order the paintings sensibly — after persistence/bridge/mass_conc/anchor/convergence/flow/rhythm all confounded. Residual confound (honest): subdivision is colour-based, so it couples with colourfulness (disegno↔colorito) — Matisse reads distributed partly because it is most colourful; Caravaggio under-splits (tonally similar figures → reads too compact). dispersion (spatial spread) is more robust than n_groups (count). Fix: subdivide by colour OR internal edge (split tonally-similar-but-edge- separated figures), + a confound control (monochrome-distributed vs colourful-centric). Code: grouping.py compute_centricity(subdivide=True).
centricity hardened → CONFOUND-CONTROLLED first validated coordinate
Resolved the colourfulness confound by moving the validated coordinate to the GROUP level (edge-led figure-groups, no colour subdivision) and adding a clean void_between_groups (ground fraction in the corridor between the two largest groups). Confound-control battery PASSES all four: a colourful centric figure stays centric (colour can't fake eccentricity); a monochrome eccentric scene reads as 5 groups (no colour needed); low- and high-contrast centric both stay centric (contrast-invariant); centric vs eccentric separate. eccentric_mono ≈ eccentric_colour (5 / disp 0.47) — colour changes nothing. The project's first coordinate to pass ground truth AND isolate its factor from colour and contrast.
On the paintings it lands exactly on the Saltimbanques mechanism: Picasso is the ONLY work with void-separated groups — void_between_groups = 0.21 for Saltimbanques, 0.0 for Rembrandt / Caravaggio / Mondrian / Matisse. The LOAD-BEARING VOID between the seated woman and the cluster — what makes her separable, what Rilke circled — is now a validated, confound-free number. Scope (honest): reads the strongest eccentricity (groups across a void); adjacent-distributed fields (Matisse) read as one group — a true structural fact (they fill the field; Saltimbanques opens a void), not a confound. Code: compute_centricity (group-level default) + _void_between_groups.
rhythm #2 (figure mass-peaks) fails the gate — the boundary
Re-aimed rhythm onto the figures: detect sub-elements as mass-peaks INSIDE the figure mask, measure spacing-regularity + syncopation. Fails the synthetic gate: a regular row of 5 blobs reads as 9 elements with regularity 0.089 — below random's 0.171; coarse smoothing did not fix it. The painting numbers (Picasso syncopation 3.08, which looks like the seated-woman break) are therefore not trustworthy — the gate caught it. Visual diagnosis: the |blur(L) − mean| mass field peaks at figure rims, not centres (5 disks → 9 rim points), and on Saltimbanques the 17 "elements" are scattered mass-deviation hotspots, not the 6 figures.
Synthetic regular row — should find 5; finds 9. The mass field is bright at each disk's rim and dark at its centre, so peaks land on rims (top+bottom), and the figure mask fuses the row. Over-fragmentation, visible.
Saltimbanques — 17 detected "elements" land on lit patches and edges, not on the six figures. No clean per-figure elements to give a rhythm.
The boundary, named: GROUP-level structure (void-separated groups, centricity, the load-bearing void) is reachable and validated. INDIVIDUAL-figure structure (rhythm, directional gaze) needs occlusion-aware per-figure separation or semantics — outside the deterministic, training-free contract. Same wall the directional gaze hit. Operating envelope for multi-object reading: the void/group device — YES; figure-level rhythm and gaze — NO (with contract-legal tools). This answers the original skepticism honestly: one validated mechanism, two named limits.
THE PIVOT — ConsensusMask v1: consensus existence breakthrough
Russell's redirection dissolved the wall instead of climbing it: every failure in this whole trail was a single construction trusted alone (gradient poles, tonal mass, the flow field, one edge-closed mask) — each fell for a different artifact (frame, vignette, texture, rims). The third way between "recover entities" and "never infer entities": recover stable attractors — regions that earn existence by SURVIVING MULTIPLE INDEPENDENT DETERMINISTIC CONSTRUCTIONS. A region is real if the image keeps reasserting it under different readings. Deterministic objectness, no semantics.
ConsensusMask v1 (consensus.py): edge-closed · tonal-Notan · chroma-region, each at 5 blur scales. SUPPORT(pixel) = fraction of the constructions asserting it; consensus regions = connected components of high support; CONFIDENCE = how many constructions agree. It works on the first build: figures emerge as high-confidence regions and the ground/vignette/texture that fooled every single mask fall to low confidence — without naming an object.
Saltimbanques — the support field lights the cluster + the seated woman and suppresses the sky/ground; consensus regions land on the figures (cluster 0.52, woman separate). The thing the image repeatedly insists on IS the figures.
Rembrandt (chiaroscuro) — the lit face earns the highest confidence (0.55–0.72): edge gets the silhouette, tone gets the face, chroma gets the flesh, and where they agree is the focal point. The wall that broke every single mask, handled gracefully by agreement.
Caravaggio: lit figures 0.64–0.70, dark wall 0.33. The question changed from "can I segment?" to "what does the image repeatedly insist is there?" — and v1 answers it. Confidence is GRADED, not binary — the strength: it measures degree of insistence. Honest v1 scope: channels × 5 scales only (perturbation- survival not yet in — the next strengthening); threshold/merging first-pass; touching figures still merge into one consensus region (the cluster). This is the SUBSTRATE; the region graph + composition readout sit on top.
ConsensusMask v2 — perturbation-survival added
Step 6, "the most important addition": re-run the constructions after geometry-preserving degradations (grayscale · gamma↑/↓ · JPEG q40 · contrast-normalise); a region that survives perturbation earns more existence, one that needs a single condition (a colour region that vanishes in grayscale, texture that fragments under JPEG) loses support. Effect: modest but correct. Saltimbanques cleaned from 3 consensus regions to 2 — the cluster + the seated woman (a fragile third dropped); confidences uniformly nudged down (degradation). The thin-edge frame still persists at low confidence (0.33), so artifact rejection comes from CONFIDENCE-FILTERING (keep regions above ~0.4), not perturbation alone — most of the signal was already in channels×scales. Code: consensus_mask(perturb=True).
Saltimbanques, perturbation-strengthened — the support field lights the cluster + the seated woman, ground suppressed; consensus settles on the two robust structures.
composition readout v1 inverts the key cases — corrected direction
Built the region graph + readout (composition.py) on confidence-filtered consensus regions. It inverted the two cases that matter: Saltimbanques → SINGLE-CENTRE (the seated woman scored 0.38, below the 0.42 threshold, so she was dropped — collapsing to one region); Rembrandt → MULTI-CENTRE (its sub-parts cleared the bar and the dark gaps between them fired the void). Root cause: (1) an uncalibrated confidence threshold drops real regions and keeps fragments; (2) the void-corridor fires between ANY two regions, including intra-figure gaps (the bridge_cost confound reborn); (3) I conflated raw consensus regions with figure-groups; (4) I skipped the synthetic gate. A fixed threshold cannot separate "a real separate figure" (the woman, 0.38) from "a fragment of one figure" (Rembrandt's part) — they have similar confidence; the difference is connectivity (the woman is a void-separated component; Rembrandt's face+body is one connected figure), which a threshold discards. Correction (not yet built): the consensus SUBSTRATE is right (support field lights figures); the composition LAYER must sit on edge-led FIGURE-GROUP connectivity (validated: Rembrandt 1, Saltimbanques 2) weighted by consensus confidence — not raw thresholded consensus regions — and pass a synthetic gate first. Artifacts: outputs/composition_*.png (the broken v1, kept as provenance).
AblationImpact / indispensability — composition is CAUSAL keystone, validated
Russell's keystone: a region's importance is not its confidence, it is its NECESSITY. Remove it, recompute the composition, measure what breaks (ablation.py, on the validated edge-led figure-groups — connectivity, not a threshold). Outputs COMPONENTS, no verdict (the third repeat of the don't-classify-early lesson). Synthetic gate PASSES and shows necessity ≠ redundancy: single mass → indispensability 0 (nothing depends on it); cluster+lone → 0.63 (each region load-bearing, removing either kills the void); scattered-5 → 0.14 (remove one, four remain — redundant). Saltimbanques — the fix and the finding:
regionareavoid_dependencyindispensability
cluster0.5860.210.2365
seated woman0.0100.210.2365
The woman is 1% of the canvas by area and EQUAL to the entire cluster in indispensability — remove her and the void collapses exactly as much as removing the whole cluster. The region the confidence threshold DROPPED (0.38) is the most structurally necessary thing per unit of herself. Necessity is not mass; composition is causal, not spatial — which is also why bridge_cost failed (it measured the corridor, not the dependency). Graph-theoretic equivalent: articulation points / betweenness centrality = indispensability. Caveat: the other four works read n=1 (edge-led grouping merges touching figures), so ablation fires only for VOID-SEPARATED multi-group compositions (precisely the Saltimbanques case); separating touching figures into ablatable sub-units is the open edge.
Evidence-family weighting — belief, not democracy substrate
Russell: ConsensusMask averaged every construction, but edge/tone/colour/scale/perturbation are correlated — 5 blur scales of one operator are not 5 witnesses. consensus_families: ONE vote per evidence FAMILY (edge · tone · colour · perturbation-survival); WITHIN the edge family, genuinely different operators (Canny · DoG · morphological gradient) so agreement is real. Each region now carries a BELIEF VECTOR — the disagreement kept, not collapsed — and n_families is honest evidence (no overcounting).
Saltimbanques — each evidence family asserts DIFFERENT regions. The colour family carries the seated woman where edge and tone stay dark; the cluster is multi-family; a tonal band shows in tone only. The kept disagreement, made visible.
The finding that explains the whole history: the seated woman's belief vector is edge 0.10 · tone 0.50 · COLOUR 0.96 · perturbation 0.43she is a COLOUR-built figure. That is exactly why every luminance/edge reading lost her (gradient poles, tonal mass, the confidence threshold at 0.38). Weight the families honestly and she is there. Contrast: the cluster is multi-family (n_fam 4, robust); a tonal band is tone-only (n_fam 2); Matisse's gray field is colour 0.0 (neutral). The per-region belief vector IS the disegno↔colorito fingerprint, local and un-collapsed. Caveats: "colour" = chroma presence (warm palettes fire it — Rembrandt flesh 0.78), so read the relative profile; ~70s/image (optimise for corpus). Code: consensus_families, render_families.
Assumption depth — the epistemic map crown (substrate)
Russell's deepest direction: structures are not equally "real". ASSUMPTION DEPTH = how many independent evidence families must accumulate before a structure stabilises — the correct rebirth of the failed "persistence" (not persistence under blur, but under EVIDENCE REQUIREMENT: demand ≥1 family, then ≥2, ≥3, ≥4). assumption.py: per-pixel evidence count = the EPISTEMIC MAP; a depth ladder; per-region depth. Works: Saltimbanques depth ladder 0.75 → 0.62 → 0.30 → 0.05 (demanding all four families leaves only a 5% deeply-real core); the epistemic map shows deeply-corroborated cores (the figures) vs shallow surrounds (the ground). Honest miss: the per-region depth came out flat (~3) because regions are found at a support threshold FIRST (pre-selecting multi-family pixels) then measured — the refinement is to find regions at EACH depth level independently (then the colour-built woman appears at depth 1 and dissolves by 3).
Saltimbanques epistemic map — every pixel coloured by how many independent evidence families agree (purple = shallow / one family, yellow = deep / all four). The figures are deeply corroborated; the ground is shallow. No single mask can draw this.
Evidence Diversity Index — how many construction logics the general result
Russell's leap: the belief vector is a COORDINATE, not a property — a region has a CONSTRUCTION SIGNATURE (its spectrum, like spectroscopy), not an origin. Compare regions by ANGLE (how they are built). The image question stops being "what is the object" and becomes "how many different construction logics does this image hold at once." EDI = area-weighted angular spread of the construction signatures (diversity.py). Hypothesis HOLDS, and sharper than expected:
workEDIreading
Caravaggio0.014multi-figure but ONE logic (tenebrist value everywhere) → coherent, stable
Rembrandt0.118one dominant value logic, everything agrees → strong single anchor
Matisse0.119tone-led, moderate
Picasso / Saltimbanques0.217cluster (balanced) · tonal band (tone-only) · woman (colour 0.96) — THREE logics at once → diverse, unstable
The finding: EDI separates "multiple objects, ONE logic" (Caravaggio — coherent) from "multiple objects, MULTIPLE logics" (Picasso — diverse). The instability is the diversity of construction logic, not the object count — exactly the hypothesis, and the thing no compositional measure could reach. This is the Parallax variance-of-disagreement thesis at the evidence level, and it is IMAGE-GENERAL — it would run identically on an MRI or an AI render; it asks nothing about portraiture or composition. Second axis — FRAGILITY (belief concentration): Caravaggio redundant (0.01, survives losing any one family) vs Mondrian plane / Picasso tonal band brittle (0.27–0.28, depend on one family). Coherent-and-redundant vs diverse-and-brittle = two ways to be real. Caveats: tone fires broadly so EDI rides the edge/colour variation; 1–3 regions per work (needs finer segmentation to be robust at scale); family-basis breadth is what keeps it honest (Russell's caution).
Construction signatures — same four evidence families, but Picasso's regions need very different mixtures (diverse evidence composition) while Caravaggio's are near-identical (one composition). The bars ARE the evidence diversity. (Language: composition, not "logic" — logic is inferred.)
Disentangled measurement vector + the regime plane measurement
Separated the conflated quantities into distinct measurements (measures.py): amount (how much evidence) · agreement (do families concur) · certainty (how decisive) · fragility · plus local evidence contrast (adjacent-region signature difference) and deep_core. The disentangling earns itself: agreement and certainty are genuinely separable — Mondrian lowest agreement (0.67, families spread) but highest certainty (0.56, decisive); Matisse the opposite (0.83 / 0.28). The depth × EDI regime plane spreads the works — Caravaggio (high deep_core, low EDI) opposite Picasso (low deep_core, high EDI) — suggesting depth and EDI are anti-related, not independent (n=5, structural, NOT a statistical claim). Artifact: outputs/regime_plane.png.
ARCHITECTURAL AUDIT — Evidence · Persistence · Dependency consolidation
Russell's directive: pressure-test the architecture, not add detectors; determine which quantities are independent vs redundant projections. Independence audit (analytical overlap + n=5 sanity check) → AUDIT.md. Finding: amount · agreement · fragility · deep_core form ONE axis (mutual |r| 0.81–0.96 at n=5; mathematically all functions of belief magnitude/spread) — call it corroboration. Collapses: agreement+fragility → concordance; deep_core + assumption-depth → Evidence (depth = #families-agreeing IS concordance, not the distinct persistence); void_dependency → a component of indispensability; group_count / anchor_dominance / dispersion → ART-LAYER (not general primitives). The irreducible core (13 → ~5): EVIDENCE (the belief vector → amount/concordance/certainty) · PERSISTENCE (perturbation-survival — the distinct kind; depth is Evidence) · DEPENDENCY (indispensability change-vector). Key structural finding: Dependency is ORTHOGONAL to Evidence — the Saltimbanques woman has low Evidence (1% area, single-family colour) yet high Dependency (load-bearing). EDI = a 2nd-order Evidence statistic (variance across regions), empirically the most independent. Generalisation audit: all retained quantities are art-free (identical meaning for pathology / SEM / satellite / AI). Next: the corpus run, on the REDUCED set, to test independence (depth vs EDI) and signature clustering by domain. See AUDIT.md for the full dependency matrix.
CORPUS TEST (n=25) — the audit re-grounded; "13→5" RETRACTED validation
Russell's correction: don't cut for minimalism — find GENUINE duplicates, otherwise RETAIN ("more survive, for the wider picture"). Ran all 13 measures over 25 mixed works (East-Asian ink-on-void · Western oils · abstraction; corpus_run.py). The n=5 "one corroboration axis" did NOT survive real n — amount and agreement are nearly orthogonal (r=+0.06): how MUCH evidence vs how CONCORDANT are two different things. Only TWO genuine duplicates: agreement↔fragility (r=−0.92, definitional — both are belief-vector spread) → merge to concordance; void_load_bearing⊂max_indispensability (r=+0.99, containment) → nest, don't delete. The spine HOLDS: Evidence×Dependency cross-correlation max |r| = 0.44 — Dependency is a genuinely separate axis, not a projection of Evidence (the load-bearing woman). Retracted: group/anchor/dispersion are NOT "art-layer" — they're general primitives, just dormant here (88% zero: only 3/25 works have ≥2 groups). Disposition: 11 distinct + 1 merge + 1 nest, several dormant-but-general KEPT for breadth. Caveat: Dependency is 88% dormant here, so orthogonality is directionally confirmed but under-exercised — needs a multi-mass corpus.
n=25 correlation (orange lines split Evidence ⟂ Dependency blocks): the cross-block is uniformly pale (max |r| 0.44 — the spine holds); each block is internally structured; the Dependency cluster is collinear because the corpus is single-mass-dominated (corpus-contingent, not definitional). Right: dormancy — local_contrast + the void cluster read 0 on ~88% of these works (kept for the wider picture, not cut).
The three headline claims as scatters. LEFT amount vs agreement (r=+0.06) — a cloud: how-much and how-concordant are distinct, both kept (the n=5 read had wrongly fused them). MIDDLE agreement vs fragility (r=−0.92) — a tight anti-diagonal: one quantity by construction → merge to concordance. RIGHT void_dependency vs indispensability (r=+0.99) — the active points lie on a line (containment → nest), and the cluster at the origin is the 22 single-mass works where Dependency is dormant.
Per-work measurements (n=25), sorted by group count then EDI. Highlighted rows = multi-group (Dependency-active); 22/25 are single-group, which is why void between / max indispensability are mostly 0 here — dormant on this corpus, not failing. Labelled works carry era/culture in the filename (Sotoguchi 20c, Sin Wi 19c, Itchō c.1710).
workamountagreementcertaintyfragilityEDIdeep coren groupsvoid betweenmax indispensability
J. 20th century mid Sotoguchi 0.5330.7290.4460.1270.3190.0283.0000.7380.542
P. 19th century early Sin Wi0.5320.7020.5140.1550.2330.0102.0000.7000.707
C. 1710 Hanabusa Itcho0.5690.8870.1950.0140.0000.0902.0000.6250.855
DT10270.4760.7430.4170.1240.4430.0431.0000.0000.000
DT2340580.4590.8840.2250.0320.3500.0241.0000.0000.000
DT18830.4860.7520.4330.1380.3450.0771.0000.0000.000
Jilted0.4730.8650.2530.1210.2760.0641.0000.0000.000
m1100.5580.8900.2130.0260.2140.2051.0000.0000.000
Helen by a Chair0.4510.8000.3480.1000.2100.0141.0000.0000.000
m1090.4420.9150.1930.0510.2100.0291.0000.0000.000
DT25660.5310.8300.3820.0860.2040.0921.0000.0000.000
DP-12952-0010.6370.9010.3670.0490.1870.1601.0000.0000.000
DT15000.4360.8340.3350.0920.1700.0201.0000.0000.000
m1030.5660.7290.4730.1340.1590.0541.0000.0000.000
DP-14936-0330.5420.8590.2560.0490.1460.0981.0000.0000.000
F. 18th century unkown20.4960.8920.1960.0270.1400.1121.0000.0000.000
DT15650.5310.9220.1490.0320.1240.1351.0000.0000.000
DT18790.4850.9130.1660.0230.0820.0371.0000.0000.000
DP-20101-0010.4850.7950.3440.0870.0670.0611.0000.0000.000
DP1709050.7250.8550.5390.0340.0660.2101.0000.0000.000
DT84750.4980.9410.1010.0080.0590.0851.0000.0000.000
DP2320290.4760.9260.1390.0090.0190.0781.0000.0000.000
DP-17679-0010.6830.9260.3660.0040.0000.2141.0000.0000.000
DP-20099-0010.4610.9320.1350.0080.0000.0401.0000.0000.000
DT19430.5260.9150.1480.0090.0000.1171.0000.0000.000
CANON-3 SDI probe — a continuous field companion (and a self-correction) cross-validation
Russell surfaced the VTL_Kernel_Metrics … Canon_3_SDI notebook (a separate r_v package) and asked if it helps the dormancy gap. It does — and it caught a wrong claim I had committed. Our void/dispersion are discrete (need ≥2 figure-groups); Canon-3's r_v (void ratio = fraction of frame below a gradient threshold) and SDI (mass spread from centroid) are continuous field measures, no grouping. Probe on the same n=25 (canon3_probe.py): r_v/SDI are 0% dormant vs our 88% — active on every work. And where ours is active (the 3 multi-group works) r_v tracks void_between (Sotoguchi highest on both; r=+0.40), so the field version is cross-validated to trust where the discrete one is dormant. r_v is also independent of the Evidence axes (vs EDI≈0). Self-correction: my earlier "honest miss" — that the ink-on-void works did NOT activate the void measures — was WRONG. The three multi-group works ARE the ink works (Itchō/Sotoguchi/Sin Wi), with the highest void_between (0.625–0.738); I'd mis-read the corpus mean n_groups (1.16) as applying to them. The real dormancy is on the dense single-figure works. Caveat: r_v is a single Sobel-gradient construction — the texture-amplifying path we left for Notan; it will under-read void on a densely brushed oil. Recommended integration: port the concept (continuous field void + dispersion) onto our texture-robust Notan mass field, not raw gradient — always-active AND squint-robust. Drop ρ_r (correlates with amount, r=0.60).
multi-group workn groupsours: void_betweenfield: r_vfield: SDI
Sotoguchi (20c ink)30.7380.8840.194
Sin Wi (19c ink)20.7000.7750.260
Itchō (c.1710 ink)20.6250.8470.224
r_v/SDI active on all 25 (0% dormant); these 3 are where the discrete void is testable — and the field reading agrees.
BUILT: texture-robust field-void / field-dispersion (Notan substrate) build + gate
Per the recommendation: ported the Canon-3 concept (continuous, grouping-free void + dispersion) onto our texture-robust Notan mass field — NOT the raw gradient (field.py). Mass = multi-scale |squint(L)−median| + chroma; median baseline (not mean) so a uniform ground reads as void and figures as deviation (the mean fix: a figure-pulled mean had turned empty ground into spurious mass, failing the gate — corrected). Synthetic gate first (field_gate.py, predictions locked): field_void flat≈1 > centered > full≈0; field_dispersion scattered > centered; and the decisive texture test — smooth vs textured blob: Notan field_void Δ=0.013 vs gradient r_v Δ=0.087 (gradient 6.7× more fooled). ALL GATES PASS. On the n=25: field_void/field_dispersion are 0% dormant (vs 88% discrete), cross-validate with discrete void at r=+0.74 (beats the raw-gradient probe's +0.40), and are independent of Evidence (vs EDI −0.17). The three ink works top the field_void ranking. Wired into measure_all; discrete void_between KEPT (load-bearing between-group reading where ≥2 groups exist), field measures carry void/dispersion continuously everywhere else.
LEFT the dormancy fix — field_void (Notan) is active across all 25 works (orange = the 3 multi-group / ink works, teal = the rest); discrete void_between (purple) fires only on 3. MIDDLE the synthetic gate's texture test — Notan field_void barely moves smooth→textured (Δ0.013) while gradient r_v collapses (Δ0.087): this is why we build on Notan, not gradient. RIGHT the Notan mass field on Itchō (c.1710 ink, highest field_void 0.627) — bright = mass (tiger, waterfall, tree), dark = the load-bearing void the discrete reading only saw because it happened to split into 2 groups.
PERSISTENCE surfaced (the third leg) + first non-art generality test build + gate + cross-domain
Of the three principles, Evidence and Dependency were exposed; Persistence was buried inside consensus as one family vote. Surfaced (survival.py, in measure_all): degrade the image (grayscale / gamma± / JPEG / contrast-norm), re-read the Notan mass structure, measure how much survives (magnitude-aware energy overlap). Outputs persistence, persistence_min, and fragile_to — which degradation hurts most = what the structure is built from. Distinct from Evidence: a colour-built region is well-supported yet grayscale-fragile (the woman: edge 0.10, colour 0.96). Synthetic gate (survival_gate.py, locked): equiluminant colour blob survival 0.00 under grayscale, fragile_to=grayscale; tonal blob survives (1.00); noise bounded. All pass. (First pass used a magnitude-BLIND correlation that missed the collapse — it locked onto the disc-shaped numerical residual and reported ≈1; corrected to energy-overlap. Two gate bugs caught and fixed: mean→median baseline, correlation→energy-overlap.) First generality test (cross_domain.py): ran the full instrument on H&E pathology tiles (NCT-CRC adipose/stroma/tumour, 224×224) + art, no domain knowledge — 12/12 clean, no crashes. Locked prediction HELD: H&E is stain-defined (chromatic), so it should read colour-built → 8/9 tiles fragile_to=grayscale, the SAME signature as the woman and Mondrian's colour blocks. The instrument reads what a structure is built from identically in a Picasso and a tumour section. First empirical support for the domain-general thesis.
Persistence signature — survival of structure under each degradation (cyan box = fragile_to). Pathology (above the orange line) is uniformly fragile to GRAYSCALE (the grayscale column is the weakest cell) = colour/stain-built; art (below) is varied — Picasso fragile to gamma_down, Caravaggio to gamma_up, Mondrian to grayscale (its colour blocks). The instrument names what each structure depends on with NO domain knowledge — the generality thesis, first evidence.
Does it DISCRIMINATE tissue classes? (honest screen, not just "it runs") discrimination
Russell's question: the 9-tile run tested generality, not class prediction — does it actually separate classes? Proper screen (disc_screen.py): 9 NCT-CRC classes × 25 tiles, cheap measures only (field-void/ dispersion + persistence + the 5 per-degradation survivals = 9 deterministic features, NO consensus), 5-fold CV. LDA 66.2%, kNN 67.1% — vs 11.1% chance (~6×). The measures carry real tissue signal. Honest reading: (1) it's a screen, not a benchmark — published CNNs on the full 100k-tile set hit ~94–99%; 66% from 9 hand-built deterministic features is far above chance but not competitive, nor meant to be. (2) field_void carries most of it (ANOVA F=57): it reads tissue CELLULARITY/density — BACK (empty) 0.95 void, NORM 0.145, LYM 0.232 (dense nuclei). (3) Persistence helps (surv_gamma_down F=40, surv_grayscale F=27). (4) field_dispersion is useless here (F=4.5, all ~0.28): tiles are textures, not compositions — the composition-geometry measures correctly find nothing to read. So it's the DENSITY + PERSISTENCE measures that transfer, not the compositional ones — exactly what you'd expect, and an honest negative for the composition side. STR (stroma) is hardest (36% recall, confused with muscle/tumour — structurally similar fibrous tissue); BACK 96%, NORM 88%, LYM 76%.
LEFT LDA projection of the 9 deterministic measures — BACK (empty) separates cleanly; the cellular tissue classes cluster with overlap. RIGHT 5-fold-CV confusion (row-normalised): strong on BACK/NORM/LYM, weak on STR (smears into MUS/TUM). 66% vs 11% chance from measures never designed as a tissue classifier — the structural measures are tissue-informative, density and degradation-survival doing the work.
Does EDI / the Evidence measures help? (A/B vs the cheap set) ablation
Russell's pathology book hits 91.5% nine-class (CRC-VAL-HE-7K, zero trained params) — but it has H&E colour-deconvolution + multi-scale GLCM texture + nuclear-blob layers Saltimbanques doesn't carry. He asked: add EDI, see if it helps. Ran full measure_all (consensus EDI + all Evidence) on 180 tiles (20×9), 5-fold CV (disc_screen2.py):
feature set9-class accvs chance 11.1%
cheap (field + persistence, 9f)64.4%5.8×
cheap + EDI only (10f)66.1%+1.7pp
cheap + ALL Evidence (16f)73.9%+9.5pp
Evidence only (7f)43.3%weaker alone
Honest read: EDI specifically adds almost nothing (+1.7pp; ANOVA F=7.2, rank 13/16 — near the bottom). That is correct: EDI measures evidence-DIVERSITY across regions = a composition property, and a tissue tile is uniform texture with one construction logic — nothing for EDI to read. The composition measures (EDI, field_dispersion, local_contrast) sit at the bottom of the ranking — the instrument does not manufacture signal that isn't there. But the full Evidence set lifts +10pp to 73.9%, driven by amount (F=65, the single strongest feature — evidence magnitude ≈ structural richness/density) and deep_core (F=28, all-family agreement). So the evidence-MAGNITUDE/coherence measures transfer to tissue; the diversity/ composition ones stay (correctly) quiet. Still well short of the purpose-built 91.5% — we lack the H&E / GLCM / nuclear-blob layers — exactly as expected.
LEFT the A/B: EDI alone +1.7pp, the full Evidence set +10pp (to 73.9%), all far above chance but below the purpose-built book's 91.5%. RIGHT feature discriminative power (ANOVA F): amount + field_void (density) + degradation-survival carry the signal; EDI (cyan) ranks 13/16 — the composition measures correctly find little to read in uniform tissue texture.
Waking the core: composition geometry on CONSENSUS regions build + gate
Back to the instrument's actual job. Its signature axis — Dependency / relational composition (void, indispensability) — was dormant on the works it was built for: edge-led grouping merges most paintings into one figure-group, so the axis fired on only 3/25. Fix (no new principle, no semantics): run the SAME indispensability machinery on the consensus regions we already extract — median 4 regions/work (21/25 with ≥2) vs edge-led's median 1. Added composition_on_consensus (shared _graph_components; edge-led reader untouched). Synthetic gate (region_gate.py, locked): single→0.000, cluster+lone→0.583 (lone mass across the void is load-bearing, matches the old gate's 0.63), scattered→0.214. All pass. On the art n=25: Dependency axis active 21/25 vs 3/25 (7×). Sanity-checked the top reading — DT1565 is Degas's Rehearsal of the Ballet Onstage, a textbook multi-figure composition edge-led merged into ONE blob (dormant); consensus reads 5 masses, indispensability 0.731. Real composition, not fragmentation. Honest limit: the acceptance test wanted "wake more works AND agree with edge-led where both fire" — first met decisively, second inconclusive (only 2 works fire on both; they rank oppositely, r=−1 at n=2 = uninterpretable). The two are different-granularity decompositions (coarse perceptual figures vs finer evidence- regions), not required to agree; the synthetic gate + Degas spot-check are the validation. BOTH readers kept.
Max indispensability per work: consensus-region (cyan) fires on 21/25; edge-led (purple) on 3. The Degas ballet — a genuine multi-mass composition — was fully dormant under edge-led and is now read (5 masses, 0.731). The core relational axis, finally active across the corpus it was built for.
THE LABELED TEST — does the reading mean something? Movements separate labeled corpus
The unlabeled n=25 tested independence/dormancy but not meaning. Built a labeled corpus from the Met Open Access API (free, curl-downloadable; AIC's movement labels are cleaner but its image CDN is Cloudflare-gated): 96 public-domain paintings, 3 multi-artist movements — Baroque 35 (Rembrandt/van Dyck/ Hals/Poussin), Impressionism 24 (Degas/Renoir/Manet/Pissarro), Post-Impressionism 29 (Cézanne/van Gogh/Gauguin/ Seurat/Lautrec). Multi-artist per movement → movement decoupled from artist. Result: the measures separate movements, LDA 5-fold CV = 65% vs 33% chance (deterministic, no training). And the WAY they separate is art-historically real: the Persistence fragile_to signature reads the tonal→chromatic transition — Baroque 71% tonal-built (fragile_to=gamma_up, structure in the darks), Post-Impressionism 83% colour- built (fragile_to=grayscale), Impressionism between. That IS the textbook Baroque-tenebrism → Post-Impressionist- structural-colour story, measured not told, no labels. Carried by field_void (F=35), amount (F=30), persistence/ certainty. Honest: the relational COMPOSITION axis just built (comp_indisp/void) does NOT separate movements (F≈1.2) — it tracks multi-mass-ness, which all three have; the signal is what-structure-is-built-from + density, not layout. Confound flagged: field_void may be partly age/varnish (Baroque darker) — but the tonal-vs-colour fragile_to signature is about where structure LIVES, far less age-confoundable, and is the robust finding. First look, n=88, 65% modest — but the reading is not noise; it tracks a real art-historical axis.
LEFT the Persistence signature by movement: a clean tonal→chromatic gradient — Baroque 71% fragile-to-brightening (tonal/chiaroscuro, orange) → Post-Impressionism 83% fragile-to-grayscale (colour-built, cyan), Impressionism intermediate. The instrument reads "Baroque structure is tonal, Post-Impressionist structure is chromatic" from perturbation-survival alone, no domain knowledge. RIGHT LDA of the full measure set — the three movements separate (65% CV vs 33% chance); Baroque (orange) left, Post-Impressionism (cyan) right, Impressionism bridging.
BRIGHTNESS CHECK — movement is a TONAL phenomenon (a fidelity result, not a flaw) reframed
Correction (Russell): the earlier "confound / deflation" framing was wrong — brightness is not a flaw to apologise for. Tested whether brightness drives the movement separation (met_confound.py): the movements are a tonal gradient (mean luminance Baroque 0.16 → Impressionism 0.30 → Post-Imp 0.44; field_void↔luminance r=−0.73), and mean luminance alone classifies movement at 70.5%, at/above the full instrument's 64.8%. The right reading: tonal/value structure is the compositional backbone — value organises nearly every painting before colour or arrangement — and these movements differ tonally by design (Baroque tenebrism dark → Impressionism/Post-Imp high-key). So "movement" here is a tonal distinction, and the instrument's tonal layer (field_void, amount, certainty, the tonal-vs-chromatic fragile_to) correctly reads it. That brightness is near-sufficient is the instrument telling us what kind of structure the label is (tonal), not a deficiency. The relational-composition measures (indispensability/void) add nothing beyond tonal here because there is no non-tonal signal in the label — and they DO add where the label needs it (pathology, +14pp). Not "averaging the pixels beats the instrument"; rather "this label is tonal, the instrument reads tonal correctly, and the rest of the engine rightly stays quiet."
LEFT the movements are a tonal gradient — nearly separable by mean luminance alone (Baroque dark → Post-Imp high-key). RIGHT brightness alone (70.5%) is at/above the full instrument (64.8%): the label is tonal, so the instrument's tonal reading carries it and the relational measures correctly add nothing beyond it. A fidelity result — tonal is the backbone, read correctly — not a flaw.
WHERE it adds value — the fair test (the real conclusion) resolution
The question: does the engine contribute structure beyond the tonal backbone, and where? Tested both tasks the same way — trivial colour/intensity vs structural-alone vs combined (pathology_baseline.py + movement combined): MOVEMENT — tonal 70.5 / structural 64.8 / combined 69.2 → no lift beyond tonal, because the label IS tonal (no non-tonal signal for the relational measures to add). PATHOLOGY — trivial 68.4 / structural 66.2 / combined 82.2 → +14pp, structure COMPLEMENTARY (the label needs non-tonal structure). Neither wins alone on pathology (66 vs 68); their complementarity is the value. The honest characterisation: the engine reads the tonal backbone correctly everywhere, and contributes additional non-tonal structure exactly where the label requires it. Value-add tracks what kind of structure the label is — not a win or a failure, a precise map of when the relational layer matters.
Trivial baseline vs structural-alone vs combined, both tasks. Pathology: combined (cyan) lifts +14pp over either — the label needs non-tonal structure, so the measures add it. Movement: combined ≈ tonal-alone — the label IS tonal, so the relational measures correctly add nothing beyond the tonal backbone the instrument already reads. Value-add tracks what kind of structure the label requires.
PER-WORK VERIFICATION — does it see what it should? + a region-threshold blind spot audit
Russell's pull-back: stop chasing classification, LOOK at individual works — is the instrument seeing the composition it's meant to? Built a per-work diagnostic (diag_render.py: original | Notan mass field | consensus regions labelled with indispensability) and read 5 known works instead of trusting aggregate stats (the standing rule I'd drifted from). What's SOLID (verified by looking): the Notan mass field tracks the real masses; fragile_to reads correctly per work — Manet Dead Christ tonal/centric (the luminous body IS the mass, fragile_to=gamma_up), Cézanne Card Players + the Gauguin-style still life colour-built multi-mass (fragile_to=grayscale, 6 and 4 masses with high void), Degas ballet multi-mass distributed. The mass/tonal/colour layer measures what it should. (Also: the instrument read the still life correctly as multi-object — it was MY label that was wrong, not the reading.) BLIND SPOTS found by looking: (1) Region segmentation is threshold-sensitive AND non-monotonic — Manet reads 1 region (centric) at the default thr=0.34, but 3 regions (Christ + 2 angels, indisp 0.57) at thr=0.22, then collapses back to 1 at thr=0.16 (everything merges by connectivity). So n_regions / void / indispensability ride on a threshold with an un-validated, non-monotonic sweet spot — the per-work composition COUNT is not robust. (2) Subordinate / low-evidence masses drop below the default threshold (the angels). (3) Mass-scale, not figure-scale: the crowded crucifixion → 3 coarse mass-bands, not 30 figures (by design / the known boundary). Verdict: the mass-field + persistence (fragile_to) + field-void layer is trustworthy and reads per-work correctly; the DISCRETE composition layer (regions/ indispensability) is threshold-fragile and was under-verified — aggregate stats hid it, looking exposed it. Recommendation before trusting per-work indispensability: region-PERSISTENCE across threshold (keep only masses stable over a threshold sweep) — the same persistence principle, applied to segmentation.
Manet, Dead Christ with Angels: the Notan mass field (centre) correctly makes the luminous body the dominant mass; consensus reads it as 1 centric region at the default threshold — defensible, but the angels (darker/chromatic) drop out, and at thr=0.22 they reappear as 2 load-bearing masses. The composition count is threshold-dependent.
Cézanne, Card Players: 6 regions tracing the seated figures and table, high void (0.55) between them, fragile_to=grayscale (colour-built). A multi-figure composition read correctly as multi-mass — the kind of reading the instrument exists for.
FIX: region-persistence — the threshold blind spot, repaired build + gate
Built the fix Russell greenlit (persistence_regions.py, composition_persistent). A mass is a MAXIMUM of the support field; its dynamics (basin depth before merging into a taller maximum) is its persistence — 0-D persistent homology. Select masses with dynamics ≥ h (h-maxima) + watershed to the nearest persistent mass. The knob h (how deep a basin must be to count) replaces the arbitrary threshold, and the count is monotonic non-increasing in h by construction. Validated (region_persist_gate.py): synthetic single/cluster+lone/scattered → 1/2/5, stable across the whole h-range. The Manet that flickered 1→3→1 across threshold is now monotonic [9,6,6,4,3,3] across h and finds Christ + angels (≥3 masses) at 6/6 h-levels vs threshold's 1/4. At default h=0.15 → 4 masses (Christ + both angels separated), void=0 (correct — a tight cluster, not void-separated), indisp 0.34. The blind spot is now a measured, monotonic quantity; composition_persistent is the recommended per-work reader, and h itself is the honest output (report count vs persistence-depth, not one number).
Manet, before/after. OLD single-threshold (red): 1 merged region, angels dropped, non-monotonic across threshold. STABLE persistence h=0.15 (cyan): 4 masses — the Christ body and both angels separated, markers on each — monotonic and stable in h. The Persistence principle, applied to segmentation itself.
The h knob + per-work mask gallery (what the masks actually produce) diagnostics
What does h change? h = the minimum basin DEPTH (dynamics) a support-field maximum must have to count as its own mass. It does NOT move masses or alter the field — the mass locations are fixed by the field's maxima; h only sets how prominent a local maximum must be to be promoted to its own region vs absorbed into a taller neighbour. LOW h = finer (more sub-masses: a figure splits into head/limbs); HIGH h = coarser (only dominant masses; subordinate ones merge in). Monotonic non-increasing in h by construction (raising h never adds a mass). So h is a grain / level-of-analysis knob, not a spatial one — read the work at major-mass scale or sub-mass scale by choosing h, and report the count as a function of h, not one number. Honest grain note: the default h=0.15 reads FINELY on busy works (Degas dancers → 15 masses, Cézanne → 9); for major compositional masses, h≈0.20–0.26 (Manet → 3 = Christ + 2 angels). The masks land on real structure at every h (verified by eye); the question is only grain.
The h knob, Manet across h=0.08→0.26: 6→5→3→2 masses. As h rises, sub-masses (a figure's parts, each angel) merge into their dominant parent; the locations never move. Coarse↔fine, monotonic — pick the grain for the question.
Per-work diagnostic (original | Notan mass field | persistence-stable regions + indispensability), Degas Dancers: at h=0.15 it reads finely (15 masses, void 0.61) — the dancers, tutus and ground each a mass. Real structure, fine grain.
The OTHER images the masks produce — the per-evidence-FAMILY belief fields for Cézanne's Card Players: edge / tone / colour / perturbation each fire differently (the disagreement the belief vector keeps), then the family-weighted consensus. A colour-built mass lights the colour family and little else; the consensus is where they agree.
Rendered 10 per-work diagnostics with the stable reader (diag_render.py) + family galleries (mask_gallery.py) + the h-sweep (hsweep_render.py). Per-work visual verification is now a standing tool, not an afterthought.
Construction-singularity — built, gated, and an HONEST NEGATIVE on the woman honest negative
The full readout ranked the seated woman LAST on indispensability (small, area-biased). Built singularity.py to fix it: a region's singularity = the drop in unweighted evidence- diversity when ablated — area-independent, rewarding a region that is the SOLE carrier of a construction (the Evidence axis contributing to Dependency). Synthetic gate PASSES (singularity_gate.py): a SMALL unique-colour disc among 3 redundant tonal discs reads spatial 0.13 (low) but singularity 0.18 (highest) → ranks #1 by load_bearing despite its size; redundant discs read NEGATIVE singularity. But on the real Saltimbanques it does NOT crown the woman, at any grain. Coarse (3 regions): she's only marginally most-unique (0.213 vs 0.206 — the upper figure is equally extreme/tonal), and her tiny spatial keeps load_bearing last (0.10 vs 0.24). Fine (8 regions): "colour" isn't unique anymore (several colour sub-regions), uniqueness dilutes, big regions dominate. The gate passed because it was the idealised "many identical + one unique"; the painting is "a few genuinely distinct masses." The real finding: THREE structural axes now fail to crown her — indispensability (extent), singularity (uniqueness), spatial. Her counterweight status (Rilke/Arnheim) is most likely a balance / leverage property — a small mass at large distance holding eccentric equilibrium (torque, not extent or uniqueness) — and none of the current measures encode balance. Kept singularity (valid + gated; it correctly reads the cluster as redundant/average, −0.108) — it is NOT the woman-detector. Next (recommended): a balance/leverage measure (change in compositional imbalance on ablation), carefully designed + gated — OR the honest boundary that her necessity is a perceptual-balance claim the structural-evidence instrument doesn't measure, while it correctly characterises her as the unique colour mass.
BALANCE / leverage — the counterweight axis CROWNS the woman validated
The clear next hypothesis: her role is the eccentric COUNTERWEIGHT (small mass, large distance = leverage), not extent or uniqueness. Built balance.py. Visual balance is a sum of moments about the frame centre, M = Σ wᵢ·rᵢ — Rilke's "many-digited sum"; a balanced field is where it solves toward zero, |M| is the imbalance. A region is a load-bearing counterweight when removing it makes the sum stop cancelling: balance_dependency(i) = |M − wᵢrᵢ| − |M| (positive = removal unbalances = it was holding the field; negative = heavy side). Synthetic gate PASSES (balance_gate.py): counterweight signs correct, symmetric pair both-positive-equal, centred mass ~0, leverage amplifies a distant counterweight. On the real Saltimbanques the woman is CROWNED — the sole counterweight. Plain AREA weighting: woman bd +0.0116 (only positive region), cluster −0.006, upper −0.008. Leverage weighting: woman +0.028, still #1. A 2%-area figure is the balancer — her DISTANCE (0.55, largest moment arm) makes her term decisive. After indispensability (extent) and singularity (uniqueness) both failed, the balance axis succeeds for the art-historically right reason: she is the term that helps the many-digited sum solve toward zero — remove her and it no longer cancels. Dependency now has three facets and the woman separates them: load-bearing by extent (last), by construction (shared), by COUNTERWEIGHT-balance (#1, alone). Honest caveat (Russell): "load-bearing" here IS a definition (increase in a moment-sum's distance from zero); that she reads counterweight even under minimal AREA weighting (not only the Arnheim leverage premium) makes it robust, not a definitional artifact — but three definitions were silent on her and one speaks. The convergence of this measure with the century-old reading is the finding, held as a lens that illuminates imperfectly, not the object itself.
Visual balance as the moment sum about the frame centre (+). Cyan = counterweight (removal increases imbalance); red = heavy side. The seated woman (2% area, far lower-right) is the SOLE cyan counterweight — the term holding the sum toward zero. The standing cluster and upper figures are the heavy side it balances.
earlier → diagnostic screen
Rather than guess a third statistic, screen candidate centricity scalars against the four clean controls and keep only what separates Rembrandt-centric from Saltimbanques- eccentric without echoing the chiaroscuro reference. Results below.

3. Calibration — clean controls

Persistence rank-orders by tonal contrast, not composition — the falsification table.

Workrole (art-historical)persistencebridge_costtonal character
Mondrian — Composition IIdistributed but settled0.5000.930flat, hard-edged
Caravaggio — Calling of St Matthewdirectional0.6670.678tenebrism
Picasso — Family of Saltimbanqueseccentric / metastable0.6670.877flat, soft (Rose)
Rembrandt — Self-Portrait (1660)centric (single figure)0.7500.771deep chiaroscuro

Expected by composition: Rembrandt most centric, Saltimbanques most eccentric. Observed by persistence: the reverse — it follows chiaroscuro (flat Mondrian lowest → chiaroscuro Rembrandt highest). persistence falsified as a centricity measure

Saltimbanques full readout. Top row — the heartbeat: Notan mass field, the collapse cascade, and the persistence curve (the hump). Bottom — the L6 island density / island map / resistance graph. The persistence/ECCENTRIC verdict shown here is the confounded measure (it tracks chiaroscuro); the trustworthy reading is the figure-ground work in §4b/§6.

4. Diagnostic screen — candidate centricity scalars

no candidate survives — each passes the narrow Rembrandt-vs-Saltimbanques test but fails the 4-work sanity check, each with a diagnosed confound.

metricRembrandt
(centric)
Caravaggio
(direct.)
Mondrian
(settled)
Saltimb.
(eccentric)
verdict
chiaroscuro (L std) — confound ref0.0780.1090.2900.130reference
persistence0.7500.6670.5000.667= chiaroscuro
mass_conc (lo=centric)0.5140.6600.6850.607= coverage/vignette
anchor_dom (hi=centric)0.2510.4200.3960.214gradient frag.
convergence (hi=centric)0.8630.7670.8260.792flat / noise

mass_conc ranks Mondrian (full-bleed) most-spread and Saltimbanques as concentrated → measures coverage, not centricity. anchor_dom calls Caravaggio/Mondrian more single-anchored than a single-figure portrait → gradient fragmentation. convergence nearly flat, directional Caravaggio wrongly lowest. Two structural lessons: (1) the extremes separated only via coverage/vignette — Rembrandt is a vignetted single figure, which almost anything separates from a spread scene, so the 2-point test is too weak; (2) four paintings co-vary every factor at once (contrast, coverage, count, spread) — you cannot validate a coordinate on stimuli where the confounds travel together. Caravaggio and Mondrian are likely distinct modes, not midpoints on one axis.

Net: we do not yet have a single validated structural coordinate for composition. persistence, bridge_cost, mass_conc, anchor_dom, convergence — all confounded.

4b. Channel exposure — the path forward positive

Russell's reframe (looking at the artifact): the painting exists as blobs that correlate and separate; a low-threshold Canny (~15–25%) defines the figure silhouette; then a tonal or colour game characterises what is inside. The artist is hiding — a grouping survives in at least one of the three channels (edge / tone / colour), and which one carries it is diagnostic. Rendered the six-channel diagnostic (channels.py) for three very different works. It works, and which channel carries the grouping varies by work:

Rembrandt — the low-Canny silhouette holds the figure whole; tone fills the internal cascade; colour is inert (monochrome). Edge + tone carry it.
Picasso — colour cracks the fused clump (harlequin/jester/boy); the void map surfaces the woman↔cluster load-bearing gap.
Matisse — colour builds the flat zones; the neutral gray field reads as the load-bearing void. Colour-primary.
channelRembrandt (chiaroscuro single figure)Caravaggio (tenebrist directional)Picasso (clumped figures)
low-Canny silhouetteholds figure whole (interior cascade absent → no fragmentation)defines lit figuresoutlines clump + woman
tonal (Notan)fills internal cascadelit-vs-dark banddark figures vs ground
colour (hue×chroma)inert (monochrome)mutedcracks the clump (harlequin/jester/boy by hue)
void mapmerges dark figure into ground (chiaroscuro limit)dark wallshows the load-bearing gap (woman ↔ cluster)

The measure that falls out — perceptual figure-ground grouping, not a tonal scalar: (1) low-Canny silhouette + confirmed edges (L∩C) bound the figures — this solves the Rembrandt chiaroscuro confound (outline captured even when interior is dark, where mass field and void map both failed); (2) void separates figure-groups from ground and surfaces the load-bearing gap; (3) colour subdivides a fused clump. Centric↔eccentric = number of figure-groups + void mediation (Rembrandt 1 = centric; Picasso cluster+woman across a void = eccentric; Caravaggio directional). This is Gestalt figure-ground (perceptual organisation), NOT object recognition — edge-bounded, void-separated, colour-subdivided regions, no learned weights, no semantics. It stays inside the deterministic/training-free contract; the figure-ground tension is resolved. Artifacts: outputs/channels_*.png.

Painter-strategy fingerprint — which channel carries the grouping (5 works)

An emergent result: the carrying channel is not noise, it is a signature of how the work is built. Added Matisse, The Piano Lesson (1916) as a colorist-flat test.

workcarrier channel(s)reading
Rembrandt — Self-Portraitedge + tonesilhouette holds the figure whole; value fills the interior; colour inert (monochrome)
Caravaggio — Calling of St Matthewedge + tone, directionaltenebrist value band; the light vector organises it
Picasso — Saltimbanquescolour + voidcolour cracks the fused clump; the void separates the woman from the cluster
Matisse — The Piano Lessoncolour + void (gray field)colour builds the flat zones; a neutral gray field mediates; clean geometric edges
Mondrian — Composition IIcolour + hard edgeflat colour planes on a black-line scaffold

This is the disegno↔colorito axis, measured. Rembrandt/Caravaggio build with line-and-value (Florentine disegno); Picasso/Matisse/Mondrian build with colour (Venetian colorito). A 500-year-old art-historical distinction falls straight out of "which channel carries the grouping." Caveat: Matisse's "ECCENTRIC, persistence 0.667" verdict is right for the wrong reason — the work genuinely is distributed, but the confounded persistence measure hit it by luck, not by reading centricity. The trustworthy signal is the channel profile, not the scalar.

5. What we have learned

6. Where we are going

The pivot reframes the whole program. The boundary below was mapped under the old "measurement → scalar" approach. ConsensusMask (above) changes the substrate: instead of measuring the image, ask what regions the image repeatedly insists on under independent constructions. That gives a deterministic region substrate without semantics — and the relations (anchor dominance, void mediation, single- vs multi-centre) get measured between consensual structures, not detected objects. The architecture:

Image → channel masks (edge/tone/chroma) → scale + perturbation survival → CONSENSUS REGIONS → region graph → composition readout

Status & next builds:

Remaining:

Standing lesson reinforced: synthetic-gate every new coordinate before paintings. Composition v1 skipped it and inverted the key case; the paintings caught it, but the gate should have. The consensus substrate is right (support field lights figures); only the composition layer's region- definition was wrong (raw regions + threshold instead of figure-group connectivity + consensus weight).

The boundary as mapped under the prior approach (for the record):

reachable + validatedblocked (boundary)
figure-ground groups (step 1)separating touching/clumped figures into individuals
centricity / void-separated groupsfigure-level rhythm (beat, syncopation)
load-bearing void between groupsdirectional device / gaze (semantic)
channel fingerprint (disegno↔colorito, observed)

Two honest forward options:

Honest net: the original skepticism ("not convinced it can be done on either account") is answered — one structural device (the void) is identifiable and validated; two (rhythm, gaze) are named limits. A partial yes with mapped edges, not an overclaim.

7. Provenance & integrity

ArtifactPath
Engine + new layersparallax_portrait/ (resistance, persistence, convergence, scanpath, read, generalist, viz, full_kernel)
Canonical L6 (verbatim source)sources/layer6_resistance_canonical.py
Build path + provenanceBUILD_PATH.md
Art-theory spine + synthetic findingsART_MAPPING.md
Verification & hardening logVERIFICATION.md
Calibration results (+ contamination correction)CALIBRATION_RESULTS.md
Screenscreen_centricity.py
Artifacts (per work)outputs/viz_*.png
Clean controlsimages/calibration/ (rembrandt, caravaggio, mondrian, saltimbanques)

Retractions on record: calibration run #1 Saltimbanques 0.833 (contaminated image); "central prediction held" (artifact of contamination). Falsified: persistence as a centricity measure; bridge_cost as a Ma/void measure. Both kept in the trail, not deleted.