Building with AI agents in the open

Notes on autonomous agent systems, Claude Code workflows, and the tools I build to make LLMs reason better.

Hooks, Not Hopes

Why prompt files, skill definitions, and agent instructions aren't enough for production agent systems — and what actually works.

Hooks, Agent Architecture, Production

Evals for Agents That Don't Have Weights

The entire eval industry assumes you can fine-tune a model. What happens when your agents are prompt-based, the problems are multi-dimensional, and the optimization surface is language?

Evals, Autonomous Agents, Feedback Loops

Building an Autonomous Product Team

What happens when you give AI agents the same structure as a real product org — roles, sprints, quality gates, feedback loops — and let them run on their own.

Autonomous Agents, Product Teams, Multi-Agent

How I Taught Claude Code to Actually Research

A three-stage pipeline that forces structured reasoning, kills premature convergence, and produces synthesis no single source contains.

Claude Code, Research, Skills