LUMENSYNTAX RESEARCH

The Instrument Trap

Why Identity-as-Authority Breaks AI Safety Systems

Rafael Rodríguez — Independent researcher, LumenSyntax

Read the paper — doi.org/10.5281/zenodo.18644321

Version DOI: 10.5281/zenodo.19634358
Concept DOI: 10.5281/zenodo.18644321
License: CC BY 4.0
Status: Preprint · peer review in preparation

The Five Properties

  1. Alignment

    Stated purpose and actual action are consistent.

    A protein that claims to transport oxygen does transport oxygen.

  2. Proportion

    Action does not exceed what the purpose requires.

    A medicine dosed to the disease, not the patient.

  3. Honesty

    What is claimed matches what is known.

    The boundary between certainty and speculation is visible, not hidden behind fluency.

  4. Humility

    Authority is exercised only within legitimate scope.

    A detector classifies what it was built to detect; beyond that, it is silent.

  5. Non-fabrication

    What does not exist is not invented to fill silence.

    Absence is reported as absence; uncertainty is named, not sculpted into fact-shaped fiction.

The operational test: Will the response produce fact-shaped fiction?

Cross-Family Evidence

The epistemological fine-tune replicates across publicly reproducible model families, Gemma (2B/9B/27B) and Nemotron 4B, spanning 2B to 27B parameters, and reaches behavioral pass rates of 95.7%–98.7% (N≈300 per configuration; the evaluation method varies by configuration: manual review, semantic eval, or pre-stratification). Adapters are published on HuggingFace as LoRA adapters and GGUF quantizations (currently 9 model repos across Gemma 2, Gemma 3, StableLM 2, and Nemotron); no Llama, Mistral, or Qwen models are published.

The scale floor: StableLM 1.6B, 60.0% (generation mode; 57.7% raw). This is the smallest configuration tested; non-fabrication does not hold at this scale, and its failures include genuine safety failures. It is reported as the floor of the range, not as a success.

The documented exception: Qwen 2.5, 92.7% only after manual reclassification (raw 90.0%; manual review 2026-03-16). This is the RLHF ceiling: the fine-tune is learned in representation, but the aligned decoder suppresses it in generation, and identity fabrication persists.
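For a rough sense of the sampling uncertainty behind a pass rate measured on about 300 items, the following is a minimal sketch of a Wilson score interval. It assumes each pass rate is a simple binomial proportion over independent items, which the page does not state, and it assumes N is exactly 300. The pass counts (287, 296, 180) are hypothetical values chosen only because they round to the reported 95.7%, 98.7%, and 60.0%; they are not taken from the paper or its data.

```python
import math
from typing import Tuple


def wilson_interval(passes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= passes <= n:
        raise ValueError("need n > 0 and 0 <= passes <= n")
    p = passes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: assumed N = 300, chosen to reproduce the reported percentages.
ASSUMED_N = 300
for label, passes in [("95.7%", 287), ("98.7%", 296), ("60.0%", 180)]:
    low, high = wilson_interval(passes, ASSUMED_N)
    print(f"reported {label}: interval {low:.1%} to {high:.1%}")
```

The Wilson interval is used here instead of the simpler normal approximation because it behaves better for proportions close to 100%, which is where most of the reported rates sit.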

The Ecclesia

An open taxonomy of 434 entries across 18 thematic domains, plus a BUILDERS collection of 17 profiles. Each entry cites an established source and maps the finding to the five properties.

CC BY-SA 4.0 · github.com/lumensyntax-org/ecclesia

Availability

The papers (Zenodo) and the benchmark code are openly accessible — no account required. The benchmark dataset on HuggingFace is gated: a free account, accepting the dataset conditions, and manual approval are required to download its files.
