Context Engineering

The discipline of loading minimum viable context. More tokens do not mean better answers.

Context engineering is the practice of getting the right tokens into a model's context window rather than getting more tokens in. It is the opposite of the prompt-stuffing approach that dominated 2024 and early 2025, and the reason for the shift is that the empirical evidence caught up with the intuition.

The Chroma research showed accuracy degradation starting at 20 to 30 thousand tokens. Large context windows are real capabilities, but they are not free capabilities. The model's attention is finite, the signal-to-noise ratio drops with every additional token of context that is not load-bearing, and the output quality follows.

The Arkeus approach: load the kernel always (around 3,700 tokens of always-on cognition), load domain files on demand (when the conversation enters a domain), load agent specs when agent behavior is needed, and leave everything else in the on-demand layer. The full memory graph is on disk and searchable. The fraction of it that is loaded into any given cycle is tiny and deliberately chosen.
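The tiered loading described above can be sketched as a simple selector. This is a minimal illustration, not the real Arkeus implementation: the file paths, domain names, and trigger keywords are all hypothetical, and the "conversation enters a domain" check is reduced to keyword matching for brevity.

```python
# Hypothetical tiered context loader: kernel files always load,
# domain files load only when the message touches their triggers.
# All paths and keywords below are illustrative, not the real config.

KERNEL = ["kernel/identity.md", "kernel/values.md"]  # always-on layer

DOMAINS = {
    "infra": ["domains/infra.md"],
    "writing": ["domains/writing.md"],
}

DOMAIN_TRIGGERS = {
    "infra": {"deploy", "server", "docker"},
    "writing": {"draft", "essay", "edit"},
}

def select_context(message: str) -> list[str]:
    """Return file paths to load this cycle: the kernel always,
    a domain's files only if the message hits its trigger words."""
    words = set(message.lower().split())
    files = list(KERNEL)
    for domain, triggers in DOMAIN_TRIGGERS.items():
        if words & triggers:
            files.extend(DOMAINS[domain])
    return files
```

The point of the structure is that the default cost is the kernel alone; everything else stays on disk until a cycle actually needs it.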

The V6 15Q test in March measured this. The original heavy config, at around 20,000 tokens of always-on content, produced 93% directness across 15 questions. V6, at around 3,000 tokens, produced 89% directness: roughly 96% of the quality at 15% of the context. The missing four points of directness are not worth the other 85% of context, and loading the larger config crowds out the actual work of the session.
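The tradeoff in the V6 numbers reduces to two ratios, which a few lines make explicit (the figures are the ones reported above; nothing else is assumed):

```python
# Directness retained vs. context spent, using the reported V6 15Q figures.
heavy_tokens, heavy_directness = 20_000, 0.93  # original heavy config
lean_tokens, lean_directness = 3_000, 0.89     # V6 lean config

retained = lean_directness / heavy_directness  # fraction of quality kept
token_fraction = lean_tokens / heavy_tokens    # fraction of context spent
```

`retained` comes out near 0.96 while `token_fraction` is 0.15, which is the whole argument in two numbers.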

The lean-ctx tools implement the discipline at runtime: ctx_read compresses and caches file reads, ctx_search trims grep output, ctx_shell pattern-compresses command output. The goal is to get the same information at a fraction of the token cost, because every token saved is a token available for reasoning.
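A ctx_read-style tool can be sketched as a cached, compressing file read. This is an illustration of the idea under stated assumptions, not the real lean-ctx code: the cache key, the compression step (dropping blank lines and trailing whitespace), and the function name are all simplifications chosen for brevity.

```python
# Hypothetical ctx_read sketch: cache by (path, mtime) so repeated reads
# cost nothing, and strip non-load-bearing bytes before the text
# enters the context window. Not the real lean-ctx implementation.
import os

_cache: dict[tuple[str, float], str] = {}

def ctx_read(path: str) -> str:
    mtime = os.path.getmtime(path)
    key = (path, mtime)  # invalidates automatically when the file changes
    if key not in _cache:
        with open(path, encoding="utf-8") as f:
            lines = [ln.rstrip() for ln in f]
        # Drop blank lines: cheap compression that rarely loses meaning.
        _cache[key] = "\n".join(ln for ln in lines if ln)
    return _cache[key]
```

The same shape generalizes to ctx_search and ctx_shell: run the underlying operation once, then pass only the distilled output forward.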

Context engineering is not about being clever with prompts. It is about recognizing that attention is a finite resource and spending it deliberately.
