Don't Build Agents, Build Primitives.

March 6, 2026

Everyone is building agents.

Custom orchestration layers. Proprietary graph frameworks. Bespoke tool-calling chains. Teams spending months wiring together node → node → node workflows, hand-coding what the model should do at each step.

Here's what I've learned deploying agents in production at enterprise scale: most of that work is unnecessary. And more importantly, it's the wrong abstraction.

The shift that most people missed

Two years ago, the conventional wisdom was reasonable. You couldn't trust a model to make good decisions autonomously. So you defined the workflow. You mapped the graph. You told the agent: call tool A, then tool B, then route here if condition X, else go there.

That world is gone.

Modern frontier models are genuinely capable orchestrators. They can reason about which tool to call, when to call it, how to recover from failure, and when to stop. They don't need you to define the graph. They are the graph. The orchestration problem is largely solved — by the model itself.
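To make the contrast concrete, here is a minimal sketch of a model-driven loop with no predefined graph. The harness only executes whatever the model picks next and feeds the result back, including errors, so the model can decide how to recover. Everything here is an assumption for illustration: `run_agent`, the tuple-shaped decisions, and `scripted_model` (a stand-in for a real model call) are hypothetical names, not any vendor's API.

```python
from typing import Any, Callable, Dict, List, Tuple

# A decision is ("call", tool_name, argument) or ("stop", final_answer, None).
Decision = Tuple[str, str, Any]
History = List[Tuple[str, Any]]


def run_agent(
    decide: Callable[[History], Decision],
    tools: Dict[str, Callable[[Any], Any]],
    task: str,
    max_steps: int = 10,
) -> str:
    """Run a loop in which the model, not the code, chooses each next step."""
    history: History = [("task", task)]
    for _ in range(max_steps):
        kind, name, arg = decide(history)
        if kind == "stop":
            return name
        try:
            result = tools[name](arg)
        except Exception as exc:  # surface failures so the model can react
            result = f"error: {exc}"
        history.append((name, result))
    return "stopped: step budget exhausted"


def scripted_model(history: History) -> Decision:
    """Hypothetical stand-in for a model: picks a step from what it has seen."""
    seen = [name for name, _ in history]
    if "lookup_order" not in seen:
        return ("call", "lookup_order", "A-17")
    last_name, last_result = history[-1]
    if last_name == "lookup_order" and "refund" not in seen:
        return ("call", "refund", last_result)
    return ("stop", f"done: {last_result}", None)


TOOLS = {
    "lookup_order": lambda order_id: {"id": order_id, "amount": 40},
    "refund": lambda order: f"refunded {order['amount']} for {order['id']}",
}

answer = run_agent(scripted_model, TOOLS, "refund order A-17")
print(answer)
```

Notice that the loop contains no routing logic at all. Swapping the order of steps, adding a tool, or handling a failure changes what the model decides, not what the code does.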

What hasn't changed is what the model needs to do its job well. That's where most teams still get it wrong.

The real problem: no one is building the right things

If the model can orchestrate, then your job as an engineer shifts. You're no longer building the agent. You're building what the agent reaches for.

In Anthropic's framing — one that matches what I've seen actually work in production — the move is: build skills, not agents.

A skill isn't a prompt template. It's a proper primitive: a packaged unit of capability that an agent can discover, load, and use reliably. Done right, a skill contains the instructions, the tools, the context, and the guardrails specific to a task. The agent decides when to use it. You decide what it's allowed to do.
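As a rough sketch of what "discover, load, and use" could look like in code, the example below models a skill as a small data structure and a registry that exposes only names and descriptions until a skill is actually loaded. The `Skill` and `SkillRegistry` classes and their fields are hypothetical and chosen for illustration; they are not the packaging format of any particular harness.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Skill:
    name: str
    description: str   # short text the agent reads to judge relevance
    instructions: str  # full guidance, read only once the skill is loaded
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)
    allowed_actions: List[str] = field(default_factory=list)  # guardrail scope


class SkillRegistry:
    """Hypothetical registry: cheap discovery first, full load on demand."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        if skill.name in self._skills:
            raise ValueError(f"duplicate skill: {skill.name}")
        self._skills[skill.name] = skill

    def catalog(self) -> List[str]:
        # Discovery: only names and descriptions go into the agent's context.
        return [f"{s.name}: {s.description}" for s in self._skills.values()]

    def load(self, name: str) -> Skill:
        # Loading: the agent pulls in instructions and tools when it needs them.
        return self._skills[name]


registry = SkillRegistry()
registry.register(
    Skill(
        name="invoice-lookup",
        description="Find an invoice by id.",
        instructions="Ask for the invoice id if missing, then call get_invoice.",
        tools={"get_invoice": lambda invoice_id: {"id": invoice_id, "status": "paid"}},
        allowed_actions=["read"],
    )
)

catalog = registry.catalog()
skill = registry.load("invoice-lookup")
invoice = skill.tools["get_invoice"]("INV-9")
print(catalog, invoice)
```

The split between `catalog` and `load` is the design choice worth keeping: the agent decides when a skill is relevant from a short description, and you decide what the loaded skill is allowed to do.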

The full stack of primitives that matters in production:

  • Skills: packaged, reusable task capabilities with their own instructions and tools

  • MCP servers (Model Context Protocol): the plumbing for how the model connects to external systems and data

  • Hooks: intercept points where you can observe, validate, or modify agent behavior in flight

  • Guardrails: the constraints that keep the agent inside acceptable boundaries without hardcoding the workflow

  • Memory: how the agent retains and retrieves context across sessions and tasks

If you have these five built well, you have something composable. Something that can extend a mature, battle-tested harness rather than replace it.
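Hooks and guardrails are the easiest of these to picture in code. The sketch below assumes a harness that passes every tool call through a chain of hooks before running it; one hook only observes, the other enforces a limit. The names `call_tool`, `audit_hook`, `refund_limit`, and `Blocked` are hypothetical, and real harnesses expose their own hook mechanisms.

```python
from typing import Any, Callable, Dict, List, Tuple


class Blocked(Exception):
    """Raised by a guardrail to stop a tool call before it runs."""


# A hook receives the tool name and arguments and returns the arguments
# (possibly modified) that the next hook, and finally the tool, will see.
Hook = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def audit_hook(log: List[Tuple[str, Dict[str, Any]]]) -> Hook:
    """Observe: record every attempted call without changing it."""
    def hook(tool: str, args: Dict[str, Any]) -> Dict[str, Any]:
        log.append((tool, dict(args)))
        return args
    return hook


def refund_limit(limit: int) -> Hook:
    """Validate: block refunds above a fixed amount."""
    def hook(tool: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if tool == "refund" and args.get("amount", 0) > limit:
            raise Blocked(f"refund of {args['amount']} exceeds limit {limit}")
        return args
    return hook


def call_tool(
    tools: Dict[str, Callable[..., Any]],
    hooks: List[Hook],
    name: str,
    args: Dict[str, Any],
) -> Any:
    """Run the hooks in order, then the tool itself."""
    for hook in hooks:
        args = hook(name, args)
    return tools[name](**args)


audit: List[Tuple[str, Dict[str, Any]]] = []
tools = {"refund": lambda amount: f"refunded {amount}"}
hooks = [audit_hook(audit), refund_limit(100)]

ok = call_tool(tools, hooks, "refund", {"amount": 40})
try:
    call_tool(tools, hooks, "refund", {"amount": 500})
    blocked = False
except Blocked:
    blocked = True

print(ok, blocked, len(audit))
```

The guardrail constrains what the agent may do without saying anything about the order in which it does things, which is the point: the boundary is hardcoded, the workflow is not.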

Why you should not build the harness

This is the part that took me time to say plainly: the harness isn't your problem to solve.

Companies like Anthropic have run thousands of evaluations on tool-calling behavior. They know which failure modes appear and when. They know how model behavior shifts with context window size, instruction phrasing, and tool description quality. They've tested edge cases you haven't imagined yet because they've seen them in the wild at a scale you haven't reached.

Claude Code is a good example. It's not just a wrapper. It's a harness tuned against real usage patterns — refined through evaluations, red-teaming, and production feedback. When you adopt it, you're not taking on someone's first draft. You're inheriting years of iteration on what actually works.

Most teams don't think of it this way. They see an existing harness and immediately want to fork it, replace it, or abstract over it. They spend six months building their own orchestration system, then the next twelve months debugging it.

The alternative is to go with the harness and invest your engineering effort into what it needs to be effective: better skills, tighter guardrails, sharper system prompts, and proper memory design.

What this means practically

If you're evaluating whether to build a custom agent framework or adopt something like Claude Code, the question isn't "does this harness do everything we need?" It probably doesn't, out of the box. The question is: "Can we build the primitives it needs to do our job, without touching the orchestration core?"

In most enterprise contexts, the answer is yes. The specialization you need lives in the skills and the guardrails — not in the tool-calling loop.

The teams I've seen succeed with agents in production aren't the ones with the most sophisticated orchestration code. They're the ones who built clean, well-scoped primitives, plugged them into a harness that already worked, and spent their time on the hard problems: memory design, failure recovery, eval coverage, and context management.
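Memory design is a good example of a primitive you can build without touching the loop. Here is a deliberately small sketch, assuming notes are stored as tagged entries in a JSON file so that a later session can recall what an earlier one wrote. The `FileMemory` class, its methods, and the file layout are hypothetical; a production design would also need to deal with concurrency, retention, and retrieval quality.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Set


class FileMemory:
    """Hypothetical cross-session memory backed by a single JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def _load(self) -> List[Dict[str, object]]:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def remember(self, session_id: str, text: str, tags: Set[str]) -> None:
        """Append a note along with the session that wrote it."""
        notes = self._load()
        notes.append({"session": session_id, "text": text, "tags": sorted(tags)})
        self.path.write_text(json.dumps(notes), encoding="utf-8")

    def recall(self, tag: str, limit: int = 3) -> List[str]:
        """Return the most recent notes carrying the given tag."""
        matches = [n["text"] for n in self._load() if tag in n["tags"]]
        return matches[-limit:]


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "memory.json"

    # Session 1 writes a note; session 2 is a fresh object reading the same file.
    FileMemory(store).remember("session-1", "Customer prefers email", {"customer-42"})
    recalled = FileMemory(store).recall("customer-42")
    unrelated = FileMemory(store).recall("customer-7")

print(recalled, unrelated)
```

What to store, how to tag it, and how much to bring back into context are the real design decisions here, and none of them require changes to the orchestration core.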

The reframe

Agents aren't products. They're configurations of primitives running on top of capable models.

The model handles the orchestration. The harness handles the infrastructure. Your job is to build what goes in between: the skills that define what the agent knows how to do, and the guardrails that define what it's allowed to do.

Stop rebuilding the stack. Build the primitives.