The Claude Agent SDK: What It Is and Why It's Worth Understanding

I've been wanting to build a proper multi-agent setup for a while, and every time I sat down to figure out how to do it cleanly against the raw Anthropic API, I ended up writing boilerplate I didn't want to write. Checking stop reasons, feeding tool results back into the message array, managing context as it grows, deciding what to trim when you hit limits. None of it is hard, but all of it is work that isn't the actual problem you're trying to solve. I've been reading through the Claude Agent SDK docs and experimenting with it over the past few weeks, and it's a more complete answer to this than I expected.

What it is

The Claude Agent SDK is the infrastructure that powers Claude Code, exposed as a library. Anthropic renamed it from the "Claude Code SDK" in September 2025, which reflects that the runtime they built for a coding tool had become general enough to use for any agentic workflow. It's available as @anthropic-ai/claude-agent-sdk in TypeScript and claude-agent-sdk in Python.

The core of it is a query() function that returns an async generator, yielding typed messages as the agent works through a task.

import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Review the authentication module for security issues",
  options: {
    model: "sonnet",
    allowedTools: ["Read", "Glob", "Grep"],
    permissionMode: "acceptEdits",
    maxTurns: 50,
  }
})) {
  if (message.type === "result" && message.subtype === "success") {
    console.log(message.result);
    console.log(`Cost: $${message.total_cost_usd}`);
  }
}

What you're not doing here is writing the agent loop. You're not checking stop_reason === "tool_use", executing tools, and feeding results back. The SDK handles that, along with automatic context compaction when the conversation approaches its limits and re-reading any CLAUDE.md instruction files after compaction. There are 14+ built-in tools - Read, Write, Bash, Glob, Grep, WebSearch, WebFetch and more - referenced by name as strings rather than defined as JSON schemas. You just list what the agent is allowed to use and let it get on with it.
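To make the saving concrete, here's roughly the loop you'd otherwise maintain yourself against a raw messages API. This is a sketch, not real client code - callModel and executeTool are stand-ins for an actual API client and tool runtime - but the shape of the loop is exactly what the SDK absorbs:

```typescript
// A sketch of the manual agent loop the SDK replaces. callModel and
// executeTool are stand-ins for a real messages API client and a tool
// runtime - the loop structure is the point, not the stubs.
type ToolCall = { id: string; name: string; input: unknown };
type ModelTurn =
  | { stopReason: "tool_use"; toolCalls: ToolCall[] }
  | { stopReason: "end_turn"; text: string };
type Message = { role: "user" | "assistant" | "tool"; content: string };

async function runAgentLoop(
  prompt: string,
  callModel: (messages: Message[]) => Promise<ModelTurn>,
  executeTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: prompt }];
  // Keep calling the model until it stops asking for tools - this is
  // the boilerplate you write by hand without the SDK.
  while (true) {
    const turn = await callModel(messages);
    if (turn.stopReason === "end_turn") return turn.text;
    // Execute each requested tool and feed the result back into the
    // transcript before the next model call.
    for (const call of turn.toolCalls) {
      messages.push({ role: "assistant", content: `tool_use:${call.name}` });
      messages.push({ role: "tool", content: await executeTool(call) });
    }
  }
}
```

And this sketch still leaves out the hard parts the SDK also covers: context compaction, permission checks, and retry handling.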

Subagent orchestration

The reason I was drawn to this in the first place is the subagent model, which solves the orchestration problem cleanly. You define agent types in an agents parameter, each with its own description, system prompt, restricted tool access, and optionally a different model. You include Task in allowedTools, and when Claude decides a subtask fits one of those definitions, it spawns the subagent, gives it only the specific task, and gets back only the final result.

For a code review pipeline, this looks like:

for await (const message of query({
  prompt: "Do a full review of this codebase - security, test coverage, and performance.",
  options: {
    allowedTools: ["Read", "Glob", "Grep", "Task"],
    agents: {
      "security-reviewer": {
        description: "Identifies security vulnerabilities, injection risks, and auth issues",
        prompt: "You are a security specialist. Focus on SQL injection, XSS, CSRF, secrets exposure, and authentication bypass.",
        tools: ["Read", "Grep", "Glob"],
        model: "opus"
      },
      "test-analyzer": {
        description: "Analyses test coverage gaps and test quality",
        prompt: "Review test coverage, missing edge cases, and overall test quality.",
        tools: ["Read", "Grep", "Glob"],
        model: "haiku"
      },
      "performance-reviewer": {
        description: "Identifies performance bottlenecks and inefficient patterns",
        prompt: "Look for N+1 queries, unnecessary allocations, blocking operations, and slow algorithms.",
        tools: ["Read", "Grep", "Glob"],
        model: "sonnet"
      }
    }
  }
})) {
  if (message.type === "result") {
    console.log(message.result);
  }
}

A few things worth noting. Task must be in allowedTools or subagents never get spawned. Never include Task in a subagent's own tools, because subagents can't spawn their own subagents. And Claude decides when to parallelise - you're defining the capability, not the scheduling.
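Both rules fail silently - subagents just never spawn - so a small pre-flight check before calling query() is cheap insurance. This helper is hypothetical, not part of the SDK:

```typescript
// Hypothetical pre-flight check for the Task rules above - not an SDK
// API, just a guard you could run before calling query().
type AgentDef = {
  description: string;
  prompt: string;
  tools?: string[];
  model?: string;
};

function validateAgentConfig(
  allowedTools: string[],
  agents: Record<string, AgentDef>,
): string[] {
  const problems: string[] = [];
  // Task must be in the top-level allowedTools or no subagent is spawned.
  if (Object.keys(agents).length > 0 && !allowedTools.includes("Task")) {
    problems.push("agents defined but Task is missing from allowedTools");
  }
  // Subagents can't spawn subagents, so Task in their tools is a mistake.
  for (const [name, def] of Object.entries(agents)) {
    if (def.tools?.includes("Task")) {
      problems.push(`subagent "${name}" lists Task in its tools`);
    }
  }
  return problems;
}
```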

The context isolation is what makes this genuinely useful. Each subagent gets its own context window, so the security reviewer's deep read of the auth code doesn't pollute the performance reviewer's analysis, and you're not managing three concurrent conversation threads yourself. The subagents run, return their findings, and the parent stitches it together. I'd been doing this kind of thing manually when building with Claude Code hooks, and the difference in how much glue code you need to write is significant.
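The isolation pattern is easy to see in miniature. This toy orchestrator has nothing to do with the SDK's internals, but it shows why the parent stays clean: each subagent works against its own fresh transcript, and only the one-line finding crosses back.

```typescript
// Toy illustration of subagent context isolation - not SDK internals.
// Each subagent gets its own fresh transcript; the parent only ever
// sees the final finding, never the intermediate reads.
type Transcript = string[];

async function runSubagent(
  task: string,
  work: (transcript: Transcript) => Promise<string>,
): Promise<string> {
  const transcript: Transcript = [task]; // fresh context per subagent
  return work(transcript);
}

async function orchestrate(
  tasks: Record<string, (t: Transcript) => Promise<string>>,
): Promise<Transcript> {
  const parentContext: Transcript = [];
  // Subagents run concurrently; their transcripts never touch each other.
  const results = await Promise.all(
    Object.entries(tasks).map(async ([name, work]) => {
      const finding = await runSubagent(name, work);
      return `${name}: ${finding}`;
    }),
  );
  // Only final results land in the parent's context.
  parentContext.push(...results);
  return parentContext;
}
```

However many files each reviewer reads, the parent's transcript grows by one line per subagent - which is why a three-way review doesn't blow out the parent's context window.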

Session persistence

Every query() call without a resume parameter starts a fresh session with no memory of anything before it. Capture the session_id from the init message and pass it back on the next call, and you continue exactly where you left off.

let sessionId = "";

for await (const msg of query({
  prompt: "Analyse this codebase and identify the three highest-priority security issues.",
  options: { allowedTools: ["Read", "Glob", "Grep"] }
})) {
  if (msg.type === "system" && msg.subtype === "init") sessionId = msg.session_id;
  if (msg.type === "result") console.log(msg.result);
}

// Continue with write access once you're happy with what the analysis found
for await (const msg of query({
  prompt: "Now fix those three issues.",
  options: {
    resume: sessionId,
    allowedTools: ["Read", "Edit", "Write"],
    permissionMode: "acceptEdits",
  }
})) {
  if (msg.type === "result") console.log(msg.result);
}

The practical value is being able to separate the read phase from the write phase - run analysis with read-only tools, review the output, then continue with write permissions only if you're happy with what it found. You can also set forkSession to branch from an existing session if you want to try different approaches without losing the analysis context.
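The capture-then-resume dance repeats often enough that it's worth wrapping. A sketch, assuming only the message shapes shown above (an init message carrying session_id, a result message carrying result) - it takes any query()-style async iterable, so the stream argument stands in for a real query() call:

```typescript
// Drains a query()-style async iterable and returns the session id and
// final result together. Assumes only the message shapes shown in the
// examples above; pass it the iterable a real query() call returns.
type AgentMessage = {
  type: string;
  subtype?: string;
  session_id?: string;
  result?: string;
};

async function collectSession(
  stream: AsyncIterable<AgentMessage>,
): Promise<{ sessionId?: string; result?: string }> {
  let sessionId: string | undefined;
  let result: string | undefined;
  for await (const msg of stream) {
    // The init message arrives first and carries the session id.
    if (msg.type === "system" && msg.subtype === "init") sessionId = msg.session_id;
    // The result message carries the agent's final answer.
    if (msg.type === "result") result = msg.result;
  }
  return { sessionId, result };
}
```

With that in place, the two-phase pattern reads as: collect the analysis run, inspect result, then pass sessionId as resume on the fix-up run.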

The trade-offs

The trade-offs are real and worth being upfront about. You're locked into Claude models - there's no swapping in a different provider. The SDK runs by spawning a Claude Code CLI process as a subprocess rather than being a pure API library, which adds some deployment complexity. And agentic sessions that do substantial work can get expensive quickly, which matters a lot if you're building this into any kind of automated pipeline. I wrote about tracking Claude Code costs earlier, and the max_budget_usd parameter on query() is worth using from the start rather than finding out the hard way.
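Beyond a per-query cap, a pipeline of several sessions wants its own ceiling. The message stream makes a client-side guard easy to sketch - this assumes the result message's total_cost_usd reports the session's total, and the five-dollar ceiling in the test is an arbitrary example:

```typescript
// Client-side cost ceiling across several agent runs. Assumes each
// run's result message carries that session's total in total_cost_usd,
// so we add one figure per run. The ceiling is whatever you choose.
class BudgetGuard {
  private spent = 0;
  constructor(private ceilingUsd: number) {}

  // Feed every message through; only result messages carry cost.
  record(message: { type: string; total_cost_usd?: number }): void {
    if (message.type === "result" && message.total_cost_usd !== undefined) {
      this.spent += message.total_cost_usd;
    }
  }

  // Call before kicking off the next session in the pipeline.
  assertWithinBudget(): void {
    if (this.spent >= this.ceilingUsd) {
      throw new Error(
        `budget exhausted: $${this.spent.toFixed(2)} of $${this.ceilingUsd}`,
      );
    }
  }
}
```

Failing loudly between sessions is much cheaper than discovering the spend after an overnight pipeline run.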

The broader picture is that Anthropic is building a three-layer stack: MCP as the protocol for agent-tool communication, Agent Skills as portable capability packages, and the Claude Agent SDK as the runtime. The MCP work I've been following sits at that protocol layer, and the SDK has MCP support built in, so custom tools defined as MCP servers plug in cleanly through the same allowedTools pattern.

I'm still in the exploration phase with this, and the API is moving - there's already a V2 session-based interface alongside the generator pattern in 0.2.x. But the subagent model and session persistence are solid enough to build on, and they solve the specific problems I was running into cleanly enough that I'll be using this properly rather than rolling my own agent loop.

