Claude Code agents and subagents: what they actually unlock

I set up Claude Code agent files when the feature first landed. Created a few .claude/agents/ definitions, gave them names and tool restrictions, felt good about it, and then gradually stopped thinking about them. At some point Claude just started handling things well enough on its own that the agent files sat there gathering dust. They're still in my repos. They just don't get called.

I kept telling myself I wasn't missing anything, but context quality on my medium-to-large solo projects had started to feel off: responses got vaguer as sessions grew, and the model lost track of decisions made earlier in a conversation. I was working around it with Cont3xt.dev, a tool I built specifically to manage AI context, and that helped, but it felt like treating a symptom rather than understanding the actual problem. So I went back and dug properly into what agents and subagents actually do, and more importantly what they unlock that a single-agent session can't.

The context window is the whole story

Standard Claude Code gives you a 200K-token context window per session. That sounds enormous until you're in a multi-hour session on a project with a dozen files open, a long conversation history, and tool call outputs stacking up. By the time you hit two-thirds capacity, response quality degrades noticeably, not because the model is worse, but because the context is full of noise and the model has to attend to all of it equally. I'd been experiencing this without quite naming it.

Subagents solve this by giving each delegated task its own isolated 200K-token context. The parent agent spawns a subagent with a specific prompt; the subagent does its work, reading files, running searches, and making tool calls, then returns only its final output to the parent, be that a summary, a result, or a recommendation. All the intermediate noise stays inside the subagent's context and never touches the parent's conversation. The parent gets the signal, not the noise.

This is the actual value. Not parallelism, not specialisation, not the organisational tidiness of named agents. It's that isolation prevents the context rot that compounds over long sessions.

What the architecture looks like

The orchestrator-worker pattern is fairly simple. A parent agent analyses a task, decides whether to handle it directly or delegate it, and uses the Agent tool (previously called the Task tool, and both names still work) to spin up a subagent with a prompt string. The subagent runs with its own context, tool access, and permissions, then returns a single final message. Subagents cannot spawn further subagents, which keeps the nesting manageable.

Claude ships with three built-in subagent types. The Explore type handles read-only file discovery and codebase search, running on Haiku by default for speed and cost. The Plan type gathers context before presenting a strategy in plan mode. The General-purpose type handles anything involving both exploration and modification. Claude routes to these automatically based on task characteristics, though the auto-selection is imperfect in practice (more on that below).

Custom agents are defined as Markdown files with YAML frontmatter, stored in .claude/agents/ at project scope or ~/.claude/agents/ at user scope. A basic definition looks like this:

---
name: code-reviewer
description: Expert code review specialist. Use immediately after modifying code.
tools: Read, Grep, Glob
model: sonnet
permissionMode: default
---
You are a senior code reviewer checking for bugs, security issues, and code quality.
Review any code changes and return a concise list of specific findings.

The tools field does something genuinely useful here: it physically restricts what the subagent can do. A reviewer defined with only Read, Grep, Glob cannot write files. That's not a naming convention or a prompt instruction; it's a hard constraint. For a solo developer running with broad permissions, having a review agent that structurally cannot modify code is worth something.

The model field lets you route different tasks to different models. Haiku for cheap exploratory reads, Sonnet for standard implementation, Opus for complex reasoning. On pay-as-you-go pricing the cost difference between models is substantial, and there's no reason a "find all files that import this package" task needs Opus.
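As a concrete illustration of that routing, here is what a cheap, Haiku-backed exploration agent might look like. The name and prompt are made up for this example; the frontmatter fields are the same ones shown in the definition above:

```
---
name: import-scanner
description: Finds every file that imports a given package. Read-only.
tools: Read, Grep, Glob
model: haiku
---
You are a codebase scanner. Given a package or module name, find every
file that imports it and return a concise list of file paths, nothing else.
```

Pinning a bulk-read task like this to Haiku keeps the per-token cost low, and the read-only tool list means the worst a misfiring scan can do is waste a little time.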

The real cost picture

Subagents are not free. Each spawned agent opens its own context window, which means tokens multiply quickly. Anthropic's own documentation notes that multi-agent workflows use roughly 4-7x more tokens than single-agent sessions, and Agent Teams (the experimental multi-session variant announced in February 2026) run at roughly 15x standard usage. If you're on the API and paying per token, that multiplier matters.

I've written in detail about the Claude Code pricing options and the Max plan economics, and the core finding holds: over 90% of tokens in a typical heavy session are prompt cache reads at $0.50/MTok for Opus, which dramatically softens the apparent cost of subagent expansion. But the multiplier is still real, and running five parallel subagents burning through exploratory reads simultaneously on the Pro plan is a reliable way to hit rate limits in under twenty minutes.
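A rough back-of-envelope calculation shows why cache reads soften the multiplier. This sketch uses the 90% cache-read share and the $0.50/MTok Opus cache-read figure from above; the uncached input price and the token volumes are placeholder assumptions for illustration, not official pricing:

```python
# Back-of-envelope: blended cost of a subagent-heavy session.
CACHE_READ_PER_MTOK = 0.50   # $/MTok Opus cache reads (figure from the text)
UNCACHED_PER_MTOK = 15.00    # $/MTok uncached input: placeholder assumption

def session_cost(total_mtok: float, cache_read_share: float) -> float:
    """Blended dollar cost when a share of tokens are cheap cache reads."""
    cached = total_mtok * cache_read_share
    uncached = total_mtok * (1 - cache_read_share)
    return cached * CACHE_READ_PER_MTOK + uncached * UNCACHED_PER_MTOK

# 10 MTok single-agent session vs ~5x the tokens with subagents,
# both at a 90% cache-read share.
single = session_cost(10, 0.90)
multi = session_cost(10 * 5, 0.90)
print(f"single: ${single:.2f}, multi-agent: ${multi:.2f}")
```

The dollar cost still scales linearly with the raw token multiplier; what the cache-read share changes is the blended per-token rate, which is why a 5x token expansion doesn't feel like a 5x jump relative to the headline input price.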

The sensible approach is narrow scoping: use subagents for read-heavy, bounded tasks with a clear output, and keep the main session for anything requiring sustained, cross-cutting context. Don't spawn agents because you can.

Where they actually help

The most consistent community finding, which matches the logic of the architecture, is that subagents work best for read-heavy research and exploration, not parallel coding. A subagent sent to find all places a particular function is called, summarise a subsystem's behaviour, or check whether a proposed change would break any existing contracts will produce a small, clean output and keep its exploration cost internal. That's the use case the architecture is optimised for.

The C compiler example Anthropic uses as a flagship demonstration is instructive: 16 Opus agents, 2,000 sessions over two weeks, $20,000 in API costs, building a Rust C compiler from scratch. Impressive, and structurally possible only with subagents. Also completely inappropriate as a model for a solo developer on a SaaS product. The lesson from that project that does transfer is the decomposition principle: each agent worked on an independent failing test, with no cross-agent dependencies. When tasks are truly independent, parallel agents compound your speed. When they're coupled, you get coordination overhead and conflicting changes.

For medium-to-large solo projects, the pattern I find most defensible is 2-3 focused information-gathering agents running in parallel, with the main session synthesising their outputs and making decisions. An agent that reads and summarises all test failures. An agent that checks the database schema for a relevant table. An agent that scans for existing implementations of a pattern you're about to add. All of them returning concise outputs to a main session that then acts. That's meaningfully better than a single session doing all of those reads sequentially, because the context stays clean.
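One of those information gatherers might look like the following sketch. The name, tool list, and prompt are hypothetical, following the same file format as the earlier definition:

```
---
name: test-failure-summariser
description: Runs the test suite and summarises failures. Use before debugging.
tools: Bash, Read, Grep
model: sonnet
---
Run the project's test suite, then return a short summary: each failing
test's name, the assertion that failed, and the most likely file involved.
Do not attempt fixes. Return only the summary.
```

The whole transcript of test output stays inside the subagent's context; the main session only ever sees the short summary.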

What doesn't work well yet

Auto-selection of custom agents remains unreliable. Claude frequently handles tasks in the main session rather than delegating to a defined agent, even when a defined agent is clearly relevant and its description matches the task. The only reliable trigger is explicit invocation, which defeats the purpose of automatic routing for anyone who wants a seamless workflow. There are open GitHub issues on this, and it's a known gap, not an edge case.
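Explicit invocation just means naming the agent in your prompt. Using the code-reviewer definition from earlier, that looks something like this (the path is a hypothetical example):

```
> Use the code-reviewer subagent to review the changes in src/auth/
```

It works every time, but it's you doing the routing, not Claude.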

Claude Opus 4.6 has a known tendency to over-spawn subagents. Anthropic's own prompt engineering documentation flags it: Opus will delegate to agents in situations where a direct approach would be faster and cheaper. If you're on Opus and wondering why a simple task consumed 50K tokens, an unnecessary delegation is a likely cause.

And there's no native observability. No trace view, no per-agent cost breakdown, no way to see what a running subagent is doing without looking at raw outputs. If you're building on the Claude Agent SDK, third-party tools fill some of this gap, but in the terminal Claude Code workflow you're largely flying blind on costs and subagent activity.

The hooks connection

Worth noting for anyone already using Claude Code hooks: hooks interact with subagents through dedicated lifecycle events, SubagentStart and SubagentStop, which means you can instrument your subagent activity, apply tool-level restrictions via PreToolUse, and validate outputs before they reach the parent. If you're already invested in hooks, the subagent lifecycle events add a meaningful layer of control over what agents can actually do.
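As a minimal sketch of that instrumentation, a settings-file hook on the subagent stop event could append to a log whenever a subagent finishes. This assumes the hooks schema from the Claude Code settings documentation; the log path and command are just an illustration:

```json
{
  "hooks": {
    "SubagentStop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "echo \"subagent finished: $(date)\" >> ~/.claude/subagent.log"
          }
        ]
      }
    ]
  }
}
```

It's crude compared to a real trace view, but it's currently the only built-in way to get a record of subagent activity at all.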

Whether it's worth revisiting

I went into this research expecting to find that the feature had matured and I was missing something obvious. The answer is more nuanced. The core innovation, isolated context windows that keep exploration noise out of the main session, is genuinely valuable and solves a real problem I was experiencing on larger projects. The custom agent definitions give you a readable, version-controlled way to encode tool restrictions and model routing decisions that actually enforce behaviour rather than just prompting for it.

What hasn't fully landed is reliable automatic routing, which means you're often writing explicit invocations rather than building a system that knows when to delegate. And the cost multiplier is real enough that undisciplined subagent use will hurt you on any plan with token limits.

The old agent files in my repos are worth revisiting. Not as a multi-agent system with coordinating roles and specialised responsibilities, but as a small library of focused, read-only information gatherers that I explicitly invoke when I need clean, isolated context for a bounded research task. That's a narrower use case than the documentation implies, but it's one that actually maps to the architecture's strengths.


I write about this stuff every week. If you want to keep up with what's changing in Claude Code, Cursor and AI dev tooling, along with the Go and infrastructure work I do, the newsletter is where it all goes first.

Join the newsletter - it's free

I also do consulting on AI implementation and technical strategy. If you're working through something specific, get in touch.
