What we're learning while building AI systems with our clients — written by the engineers doing the work.
Coming soon
How we test, score, and harden retrieval pipelines before they touch a customer — including the eval traps that hide quality regressions.
Forthcoming
Thirty questions we ask before we let an LLM-powered agent take action on a real system. Half of them have nothing to do with the model.
Patterns for blending Claude Opus, Sonnet, and Haiku without sacrificing the quality your users notice.
What a real evaluation pipeline looks like once you stop treating models like black boxes — and what to do when the score drops.
A pattern language for tasks that don't need the smartest model — and how routing keeps the smartest model for the moments that matter.
kategos Intelligence