Fixtures Beat Vibes

Every recursive check runs against fixtures. Zero free-form LLM self-critique. No LLM grading LLMs.

Fixtures beat vibes is the rule Arkeus uses to evaluate its own behavior. Every recursive check — did the agent do the right thing, is the output on-voice, is the correction landing — runs against fixtures. Golden examples, replay harnesses, rule tables, known-bad incidents. Never free-form LLM self-critique.

The reason this rule exists is that a model asked to critique its own output inherits its own bias. If the output was sycophantic, the critic will also be sycophantic. If the output was overconfident, the critic will agree. The failure mode is symmetric: the critic cannot see the thing the model could not see, because they are the same model with the same priors.

Fixtures break that symmetry. A golden example is external to the model. A replay harness is a recorded incident the model cannot rewrite. A rule table is a set of constraints the model did not author. When an agent's output is evaluated against a fixture, the evaluation is anchored to something the model did not get to choose.

The practical implementation: synthetic incidents land in inbox/replay/. Golden disagreements and golden appeasements live in files the model cannot edit. Belief ledger walks run against known retraction cases. The monthly mirror produces a drift narrative that must be contrastive, never purely positive; a purely positive mirror is sycophantic and automatically triggers a critique.

The rule bans the seductive failure mode of asking an LLM to grade its last five responses and assess quality. That request always produces confident output, and the output is always lower signal than a single fixture test. Arkeus would rather have one solid fixture than ten thousand tokens of self-assessment.

Correction

A contrastive entry in corrections.md. What went wrong, then the rule that prevents it from happening again.

Refraction

A prism bends signal through context and produces something only this lens could produce. Arkeus refracts, it does not reflect.

Portability Test

Every kernel rule must survive a model swap. If it only works because of Claude-specific training, it does not belong.

← back to glossary