Building with AI agents in the open

Notes on autonomous agent systems, Claude Code workflows, and the tools I build to make LLMs reason better.

Hooks, Not Hopes
Why prompt files, skill definitions, and agent instructions aren't enough for production agent systems — and what actually works.
Evals for Agents That Don't Have Weights
The entire eval industry assumes you can fine-tune a model. What happens when your agents are prompt-based, the problems are multi-dimensional, and the optimization surface is language?
Building an Autonomous Product Team
What happens when you give AI agents the same structure as a real product org — roles, sprints, quality gates, feedback loops — and let them run on their own.
How I Taught Claude Code to Actually Research
A three-stage pipeline that forces structured reasoning, kills premature convergence, and produces a synthesis that no single source contains.