Every recursive check runs against fixtures. Zero free-form LLM self-critique. No LLM grading LLMs.
Fixtures beat vibes is the rule Arkeus uses to evaluate its own behavior. Every recursive check — did the agent do the right thing, is the output on-voice, is the correction landing — runs against fixtures. Golden examples, replay harnesses, rule tables, known-bad incidents. Never free-form LLM self-critique.
The reason this rule exists is that a model asked to critique its own output inherits its own bias. If the output was sycophantic, the critic will also be sycophantic. If the output was overconfident, the critic will agree. The failure mode is symmetric: the critic cannot see the thing the model could not see, because they are the same model with the same priors.
Fixtures break that symmetry. A golden example is external to the model. A replay harness is a recorded incident the model cannot rewrite. A rule table is a set of constraints the model did not author. When an agent's output is evaluated against a fixture, the evaluation is anchored to something the model did not get to choose.
The practical implementation: synthetic incidents land in inbox/replay/. Golden disagreements and golden appeasements live in files the model cannot edit. Belief ledger walks run against known retraction cases. The monthly mirror produces a drift narrative that must be contrastive, never purely positive; a purely positive mirror is sycophantic and automatically triggers a critique.
The rule bans the seductive failure mode of asking an LLM to grade its last five responses and assess quality. That request always produces confident output, and the output is always lower signal than a single fixture test. Arkeus would rather have one solid fixture than ten thousand tokens of self-assessment.