LUMENSYNTAX RESEARCH
The Instrument Trap
Why Identity-as-Authority Breaks AI Safety Systems
The Five Properties
- Alignment: Stated purpose and actual action are consistent. A protein that claims to transport oxygen does transport oxygen.
- Proportion: Action does not exceed what the purpose requires. A medicine dosed to the disease, not the patient.
- Honesty: What is claimed matches what is known. The boundary between certainty and speculation is visible, not hidden behind fluency.
- Humility: Authority is exercised only within legitimate scope. A detector classifies what it was built to detect; beyond that, it is silent.
- Non-fabrication: What does not exist is not invented to fill silence. Absence is reported as absence; uncertainty is named, not sculpted into fact-shaped fiction. The operational test: will the response produce fact-shaped fiction?
Cross-Family Evidence
The epistemological fine-tune replicates across publicly reproducible model families (Gemma 2B/9B/27B and Nemotron 4B, spanning 2B to 27B parameters), reaching behavioral pass rates of 95.7%–98.7% (N≈300 per configuration; the evaluation method varies by configuration: manual review, semantic evaluation, or pre-stratification). Adapters are published on HuggingFace as LoRA adapters and GGUF quantizations, currently 9 model repositories across Gemma 2, Gemma 3, StableLM 2, and Nemotron; no Llama, Mistral, or Qwen models are published.
The scale floor: StableLM 1.6B reaches 60.0% (generation mode; 57.7% raw). This is the smallest configuration tested. Non-fabrication does not hold at this scale, and its failures include genuine safety failures, so it is reported as the floor of the range, not as a success.
The documented exception: Qwen 2.5 reaches 92.7% only after manual reclassification (raw 90.0%; manual review 2026-03-16). This is the RLHF ceiling: the fine-tune is learned in the representation, but the aligned decoder suppresses it in generation, and identity fabrication persists.
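The pass rates above are percentages of roughly 300 prompts per configuration. As a hedged, illustrative reading (not the authors' analysis), the sketch below assumes exactly N = 300 per configuration, recovers the implied number of passing responses, and attaches a 95% Wilson score interval to each rate. Under that assumption, the Qwen 2.5 reclassification from 90.0% to 92.7% corresponds to about 8 responses. The helper names `implied_count` and `wilson_interval` are hypothetical.

```python
import math

# Assumption: every configuration used exactly N = 300 prompts. The source
# reports N≈300, so the recovered counts are approximate.
N = 300


def implied_count(rate_pct: float, n: int = N) -> int:
    """Number of passing responses implied by a reported percentage of n."""
    return round(rate_pct * n / 100)


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (z=1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half


# Rates quoted in the Cross-Family Evidence section.
reported = {
    "Gemma/Nemotron range, high end": 98.7,
    "Gemma/Nemotron range, low end": 95.7,
    "Qwen 2.5, after manual reclassification": 92.7,
    "Qwen 2.5, raw": 90.0,
    "StableLM 1.6B, generation mode": 60.0,
    "StableLM 1.6B, raw": 57.7,
}

for label, rate in reported.items():
    k = implied_count(rate)
    lo, hi = wilson_interval(k, N)
    print(f"{label}: {k}/{N} passing, 95% CI {lo:.1%} to {hi:.1%}")

# Responses moved from fail to pass by the manual review of Qwen 2.5.
print("Qwen 2.5 responses reclassified:", implied_count(92.7) - implied_count(90.0))
```

At N = 300 each interval spans several percentage points, so differences of a point or two between configurations fall within the sampling uncertainty of a single run.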
Reported metrics
- Non-fabrication, cross-family: Llama-8B (96.3%) and Mistral-7B (93.7%) exceed 92% on the raw evaluator (automated semantic evaluation, N=300).
Across the nine fine-tuned configurations, rates span 60%–98.7% under mixed evaluation methods. Qwen-7B reaches 92.7% only after manual reclassification (raw 90.0%; manual review 2026-03-16) and still carries documented identity fabrication.
- Sensitive-domain boundary: on an 80-prompt boundary test covering medical, financial, legal, and safety domains, the fine-tuned Gemma-2-9B model (adapter logos17) produced zero fabricating responses (0/80; single model, 2026-03-07).
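A result of 0/80 bounds the underlying fabrication rate rather than proving it is zero. As a hedged illustration (not an analysis taken from the papers), the sketch below computes the exact one-sided 95% upper bound on the failure rate after observing zero failures in n trials (the Clopper-Pearson bound for k = 0), next to the common "rule of three" approximation 3/n. The function names are hypothetical.

```python
def zero_failure_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on a failure rate after 0 failures in n trials.

    Solves (1 - p) ** n = 1 - confidence for p, i.e. the largest rate at which
    observing zero failures still has probability at least 1 - confidence.
    """
    alpha = 1 - confidence
    return 1 - alpha ** (1 / n)


def rule_of_three(n: int) -> float:
    """Common approximation of the 95% upper bound after 0 failures: 3 / n."""
    return 3 / n


n = 80  # sensitive-domain boundary prompts; 0 fabricating responses observed
print(f"exact 95% upper bound: {zero_failure_upper_bound(n):.2%}")
print(f"rule of three:         {rule_of_three(n):.2%}")
```

Both approaches put the plausible fabrication rate for this single model and prompt set below roughly 3.7% to 3.75% at 95% confidence; a larger prompt set would be needed to tighten that bound.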
The Ecclesia
An open taxonomy of 434 entries across 18 thematic domains, plus a BUILDERS collection of 17 profiles. Each entry cites an established source and maps the finding to the five properties.
CC BY-SA 4.0 · github.com/lumensyntax-org/ecclesia
Availability
The papers (Zenodo) and the benchmark code are openly accessible — no account required. The benchmark dataset on HuggingFace is gated: a free account, accepting the dataset conditions, and manual approval are required to download its files.
Publications
- The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems — CC BY 4.0
- The Epistemic Equator: A Vanilla-Model Boundary in Activation Space, Cross-Family and Cross-Domain — CC BY 4.0
Data & Code
- Benchmark: 14,950 test cases across 8 categories. Code: github.com/lumensyntax-org/instrument-trap-benchmark. Dataset: CC BY 4.0 on HuggingFace (gated).
- Probe: epistemic-probe-topic-balanced — 200 examples across 10 domains (100 licit / 100 illicit), CC BY 4.0, not gated. Companion to The Epistemic Equator.