01 — Visual Thinking Lens
Semantic diversity explains less than 10% of observed spatial variance in text-to-image systems. Composition is not prompt-driven. It is driven by the model's learned prior. The Visual Thinking Lens (VTL) makes that prior visible, measurable, and comparable across engines.
What It Measures
400 MidJourney prompts. 8 semantic categories, ranging from portraits to landscapes to architecture. One geometric attractor: 100% of outputs fall within a radius of 0.15 of the geometric center. The subject changes. The compositional prior doesn't.
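One way to check a claim like the 0.15-radius result on a corpus is to reduce each output to the centroid of its visual mass and count how many centroids land near the frame center. This is a minimal sketch, not the VTL's own implementation: it assumes visual mass is supplied as a 2-D grid of non-negative weights (for example, inverted grayscale values) and that coordinates are normalized to [0, 1] on both axes. `mass_centroid` and `fraction_within_radius` are hypothetical names.

```python
from math import hypot


def mass_centroid(weights):
    """Weighted (x, y) centroid of a 2-D grid, in coordinates normalized to [0, 1]."""
    rows, cols = len(weights), len(weights[0])
    total = sum(sum(row) for row in weights)
    if total <= 0:
        raise ValueError("grid carries no visual mass")
    # Each cell contributes at its center: (c + 0.5) / cols across, (r + 0.5) / rows down.
    cx = sum(w * (c + 0.5) / cols for row in weights for c, w in enumerate(row)) / total
    cy = sum(w * (r + 0.5) / rows for r, row in enumerate(weights) for w in row) / total
    return cx, cy


def fraction_within_radius(centroids, radius=0.15, center=(0.5, 0.5)):
    """Share of centroids lying within `radius` of `center`."""
    if not centroids:
        raise ValueError("no centroids supplied")
    inside = sum(
        1 for x, y in centroids if hypot(x - center[0], y - center[1]) <= radius
    )
    return inside / len(centroids)
```

Under these assumptions, a return value of 1.0 from `fraction_within_radius` over the full corpus corresponds to the "100% of outputs" figure.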
Spatial prompt intensity (words like "left," "edge," "corner," "peripheral") explains 0–0.1% of compositional displacement variance. The model is not following the spatial instruction. It applies a learned structural prior regardless of what you ask for.
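A "share of variance explained" figure like 0–0.1% reads naturally as an R² value. A minimal sketch, assuming each output has already been reduced to one spatial-intensity score for its prompt and one displacement magnitude for its image (how the VTL scores either quantity is not specified here): for a single predictor with an intercept, ordinary least squares gives R² = s_xy² / (s_xx · s_yy). `r_squared` is a hypothetical helper name.

```python
def r_squared(x, y):
    """R^2 of an ordinary least-squares fit of y on x (one predictor plus intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # A constant predictor explains nothing; a constant response has no
        # variance to explain. Both are reported as 0.0 by convention here.
        return 0.0
    return sxy * sxy / (sxx * syy)
```

On this reading, an R² of 0.001 is the 0.1% upper bound.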
The VTL gives that prior a coordinate. A fingerprint. Something you can measure, compare, and track across model versions, prompt families, and inference conditions.
"Spatial prompt intensity explains 0–0.1% of compositional displacement variance. The model has already decided where mass goes."
Instrument Pipeline
Seven Dimensions
Each dimension captures a distinct structural property of the image. Together they form a complete compositional fingerprint — reproducible, comparable, and engine-agnostic.
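Because the fingerprint is a fixed set of seven coordinates, two fingerprints can be compared directly. A minimal sketch, assuming each fingerprint is a sequence of seven floats on comparable scales; the dimension names are not listed in this section, so none are assumed. Per-dimension deltas show which property moved, and the Euclidean distance gives a single summary. `compare_fingerprints` is a hypothetical name.

```python
from math import dist

FINGERPRINT_DIMS = 7  # the VTL's seven dimensions, treated here as anonymous coordinates


def compare_fingerprints(a, b):
    """Per-dimension deltas (b - a) and Euclidean distance between two fingerprints."""
    if len(a) != FINGERPRINT_DIMS or len(b) != FINGERPRINT_DIMS:
        raise ValueError(f"each fingerprint needs {FINGERPRINT_DIMS} coordinates")
    deltas = [bi - ai for ai, bi in zip(a, b)]
    return deltas, dist(a, b)
```

Dimensions measured on different scales would need normalizing before the distance means much.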
Before / After — Same Prompt, Different Structure
When the same compositional prompt produces structurally different outputs, the VTL makes that difference quantifiable. These two outputs share the same prompt family, yet their structural coordinates diverge sharply.
Weight shift Δx moved +0.22 between outputs · Packing density ρᵣ increased 29% · Basin classification shifted from centered to right-displaced
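The three caption entries map onto simple operations on structural coordinates. A minimal sketch, assuming each output is a dict holding a horizontal mass offset `dx` (0 at center, positive to the right) and a packing density `rho_r`. Those key names, and the `CENTER_TOLERANCE` threshold used for basin labels, are hypothetical, since the VTL's basin boundaries are not given here.

```python
# Hypothetical threshold: the text does not give the VTL's basin boundaries.
CENTER_TOLERANCE = 0.1


def classify_basin(dx, tol=CENTER_TOLERANCE):
    """Label a horizontal mass offset (0 = centered, positive = right) by basin."""
    if abs(dx) <= tol:
        return "centered"
    return "right-displaced" if dx > 0 else "left-displaced"


def structural_delta(before, after):
    """Weight shift, relative packing-density change, and basin labels for two outputs."""
    if before["rho_r"] == 0:
        raise ValueError("baseline packing density is zero; relative change undefined")
    shift = after["dx"] - before["dx"]
    rho_change = (after["rho_r"] - before["rho_r"]) / before["rho_r"]
    return shift, rho_change, classify_basin(before["dx"]), classify_basin(after["dx"])
```

With this reading, a `shift` of 0.22 and a `rho_change` of 0.29 would correspond to the caption's "+0.22" and "29%" entries.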
Regression Detection
The VTL establishes a neutral baseline for any prompt family and flags outputs that deviate beyond the 2σ detection boundary. In this MidJourney corpus, five outputs were flagged, all identified from geometry alone, before any content-level inspection.
How the gate works
For each prompt family, the VTL builds a neutral baseline distribution of structural coordinates. Each new output is evaluated against the 2σ envelope of that distribution, and structural outliers are flagged before content review.
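A minimal sketch of a 2σ gate, assuming each output is a tuple of structural coordinates and that the envelope is applied to each dimension independently. A joint test such as Mahalanobis distance would be a reasonable alternative; the text does not say which the VTL uses. `baseline` and `flag_outliers` are hypothetical names.

```python
from statistics import mean, stdev


def baseline(samples):
    """Per-dimension (mean, sample standard deviation) of a neutral baseline set."""
    if len(samples) < 2:
        raise ValueError("need at least two baseline outputs to estimate spread")
    return [(mean(col), stdev(col)) for col in zip(*samples)]


def flag_outliers(outputs, base, k=2.0):
    """Indices of outputs with any coordinate more than k sigma from the baseline mean."""
    flagged = []
    for i, coords in enumerate(outputs):
        for value, (mu, sigma) in zip(coords, base):
            # Dimensions with zero spread are skipped rather than flagging every change.
            if sigma > 0 and abs(value - mu) > k * sigma:
                flagged.append(i)
                break  # one out-of-envelope coordinate is enough to flag the output
    return flagged
```

Keeping `k` as a parameter makes it easy to tighten or loosen the envelope without rebuilding the baseline.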
Recursive Steering
Once you can measure where visual mass goes, you can redirect it. The VTL is not just a detection instrument — it is a steering interface. Structural coordinates become constraints. Constraints become prompts. Prompts produce controlled deformation.
Generative models default to anatomical coherence as a safety heuristic. Getting a model to produce a purposeful, isolated anatomical impossibility — a neck that stretches impossibly upward while the body remains grounded, lighting consistent, fabric unaffected — requires breaking that heuristic at a specific structural node without triggering global incoherence.
The framework: Intent → Anchors → Constraints → Prompts → Transforms. Each stage feeds the next. The output is not an accident of latent space. It is a specified structural state.
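The constraint and prompt stages of that chain can be sketched as a feedback loop. This is a minimal illustration under stated assumptions, not the VTL's steering method: `Constraint` represents a target range on one structural coordinate, and `generate`, `measure`, and `revise` are caller-supplied placeholders for the image engine, the structural measurement, and whatever rewrites the prompt from unmet constraints. All names, including any coordinate such as a hypothetical `neck_extension`, are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Constraint:
    """Target range for one structural coordinate (hypothetical representation)."""
    coordinate: str
    low: float
    high: float

    def satisfied_by(self, value: float) -> bool:
        return self.low <= value <= self.high


def unmet(measured: dict[str, float], constraints: list[Constraint]) -> list[Constraint]:
    """Constraints the measured output misses, including coordinates not measured at all."""
    return [
        c for c in constraints
        if c.coordinate not in measured or not c.satisfied_by(measured[c.coordinate])
    ]


def steer(
    prompt: str,
    constraints: list[Constraint],
    generate: Callable[[str], object],
    measure: Callable[[object], dict[str, float]],
    revise: Callable[[str, list[Constraint]], str],
    max_rounds: int = 5,
) -> tuple[str, dict[str, float] | None]:
    """Generate, measure, and revise until every constraint holds or rounds run out."""
    for _ in range(max_rounds):
        measured = measure(generate(prompt))
        missing = unmet(measured, constraints)
        if not missing:
            return prompt, measured  # this prompt produced a compliant output
        prompt = revise(prompt, missing)  # feed unmet constraints back into the prompt
    return prompt, None  # still unmet after max_rounds; last revision is untested
```

Returning `None` for the measurement makes a run that never converged easy to distinguish from one that did.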
"Most distortion relies on accidental artifacts or post-processing. Getting AI to generate purposeful, isolated anatomical impossibilities during initial generation — while maintaining coherence everywhere else — is uncharted territory."
Purposeful isolated anatomical distortion generated during initial inference · Structural coherence maintained throughout · No post-processing
Key Findings