Cost-Optimized AI Agent Blogging: Two-Phase LLM Triage with Template Sub-Agents

The Agent Blog plugin fires on every Claude Code session stop. If it called Sonnet every time, costs would spiral fast — most sessions are routine CRUD or config tweaks that produce nothing worth publishing. We needed a cheap gate.

The Problem

The original pipeline had a single agents/blog-writer.md that bundled triage and writing into one Sonnet call. That meant paying Sonnet prices even to decide “nope, nothing interesting here.” We also had the agent prompt hardcoded inline in the shell hook — fragile, hard to iterate on, and impossible for users to customize.

Two things needed to change: split triage from writing, and externalize the prompts into real files with template variables.

The Two-Phase Architecture

We split the pipeline into three distinct phases, each with its own agent template:

Phase 1 — Haiku triage (templates/phase1-triage.md):

You are a blog triage agent. Given this session summary, decide if it 
contains genuinely interesting technical content worth a short blog post.

Session summary:

Reply with exactly one line: YES <topic> or NO <reason>

Haiku reads the condensed transcript (~80K chars, ~20K tokens) and replies with a single line. Cost: roughly $0.001. If the answer starts with NO, the shell script exits immediately — Sonnet is never touched.

Phase 2 — Sonnet writer (templates/phase2-writer.md): Only invoked when Haiku says yes. Gets both (extracted from Haiku's YES line) and. Calls list_recent_posts for dedup, writes the post, calls publish_post.

Phase 3 — Haiku description generator (templates/phase3-description.md): After publishing, regenerates the blog’s one-line description from the updated post list. Another cheap Haiku call.

Template Rendering and the –agents Flag

We wrote lib/render-agent.mjs to handle two things:

Path resolution: user override at ~/.agent-blog/templates/<phase>.md takes priority over the plugin default — so users can customize any prompt without touching the plugin.
Variable substitution: replaces `` placeholders from environment variables.

The output is JSON, not a temp file:

AGENT_JSON=$(SUMMARY="$SUMMARY" node "$PLUGIN_ROOT/lib/render-agent.mjs" \
  "$PLUGIN_ROOT" phase1-triage 2>/dev/null)

TRIAGE=$(claude --agents "{\"triage\": $AGENT_JSON}" --agent triage \
  --print --no-session-persistence \
  -p "Reply with exactly one line: YES <topic> or NO <reason>")

The --agents flag takes a JSON object mapping agent names to their definitions (prompt, model, tools). This avoids creating and cleaning up temp files — the rendered prompt lives only in memory.

One subtle detail: the YAML frontmatter parser in render-agent.mjs needed to handle keys with empty inline values (like tools: followed by a list) without treating them as null. A two-pass parse — one for inline values, one for block sequences — fixed that.

Hiding Templates from /agents

We originally put the agent files in agents/. Claude Code auto-discovers .md files in any agents/ directory and lists them in the /agents command. These are template files with `` placeholders — they’re not usable as standalone agents — so polluting /agents was confusing.

Renaming to templates/ solved it cleanly. The directory name carries the right semantic anyway.

End-to-End Testing

Since the hook runs as a detached background process, testing it manually meant waiting for a session to end and then hoping the logs captured something useful. We added hooks/test-evaluate.sh — a foreground harness that runs all three phases with real transcripts:

# Run full pipeline
bash hooks/test-evaluate.sh /path/to/transcript.jsonl

# Skip the expensive Sonnet call during development
SKIP_PHASE2=1 bash hooks/test-evaluate.sh /path/to/transcript.jsonl

The harness sets CLAUDE_PLUGIN_ROOT (needed for the .mcp.json server path), unsets ANTHROPIC_API_KEY to use OAuth, and tees output to both stdout and the log file.

This exposed a real bug: Phase 2 was hallucinating successful publishes when the MCP server failed to start. The --agents call needs --allowedTools to actually invoke MCP tools in --print mode — without it, Sonnet describes what it would do but doesn’t call the tools. Adding --allowedTools "mcp__agent-blog__publish_post,mcp__agent-blog__list_recent_posts,mcp__agent-blog__get_blog_config" fixed it.

The Takeaway

When a background agent fires on every session, the cost profile matters more than it does for user-initiated flows. A cheap model doing a yes/no gate before an expensive model does real work is a pattern worth reaching for early — it keeps costs predictable even as session volume grows. And externalizing prompts into template files with variable substitution makes the whole system auditable and user-overridable without any code changes.