A Claude Code system that turns pre-tagged customer call data in Notion into decision-ready insights for the next pipeline stage.
Where this system sits in the broader workflow.
We're an early-stage venture-backed startup validating a product hypothesis through customer conversations. We will conduct roughly one hundred calls. The raw transcripts and summaries live in Notion, alongside a taxonomy of themes and snippets that we extract via Notion's custom agents before Claude Code ever touches the data.
This system is the second stage in a pipeline. It reads the already-tagged data from Notion, synthesizes patterns, pain, and a working ICP definition, and produces two artifacts: a human-readable HTML report for the team and a structured JSON bundle for a downstream Claude Code session that generates requirements, PRDs, and tickets.
The pipeline's economics depend on an upstream choice: tagging happens in Notion, not in Claude Code. Agents reason over compact metadata — not raw transcripts — which is what makes the whole system affordable to run at volume.
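To make the economics concrete, here is a rough sketch of the kind of compact record the analysts see instead of a transcript. The field names and values are illustrative assumptions, not the actual Notion schema.

```python
# Illustrative only: field names are assumptions, not the actual Notion schema.
call_record = {
    "call_id": "call-042",
    "persona": "ops lead, 20-50 seat team",
    "themes": ["manual reporting", "tool sprawl"],
    "snippets": ["We rebuild the same spreadsheet every Monday."],
    "pain_intensity": "high",  # tagged upstream in Notion, not by Claude Code
}

# Crude cost comparison, assuming roughly 4 characters per token.
transcript_chars = 45_000  # ballpark for a 30-minute call transcript
record_chars = len(str(call_record))
print(f"transcript ~ {transcript_chars // 4} tokens, "
      f"record ~ {record_chars // 4} tokens")
```

At a hundred calls, the gap between reasoning over records like this and reasoning over raw transcripts is what makes the pipeline affordable.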
Three decision-forcing deliverables, produced together in a single session.
What the data says that we didn't go looking for. Themes clustering in unexpected ways, recurring situations that don't fit our current mental model. The system operates with zero priors from us — no seed topics, no guiding questions. This is the output that changes our minds rather than validates them.
What hurts, how much, and for whom. Distinguishes pains that are annoying from pains that are costly from pains that are urgent. Surfaces how the same pain lands differently across different customer situations, so we build for the truest signal rather than the loudest voice.
A situation-based profile of who feels the pain most acutely. Not demographics — a set of conditions that predict high intensity. Tells us who to call next and, eventually, who to build for first.
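The three deliverables above could land in the structured JSON bundle as something like the following. The keys and shapes are assumptions for illustration; the real contract is whatever the composer emits.

```python
import json

# Hypothetical bundle shape -- key names are assumptions, not the real contract.
bundle = {
    "emergent_patterns": [
        {"pattern": "theme clustering we did not predict",
         "call_ids": ["call-007", "call-031"]},
    ],
    "pain_map": [
        {"pain": "manual weekly reporting", "severity": "costly",
         "segments": ["ops lead"], "evidence_count": 12},
    ],
    "icp_profile": {
        "conditions": ["owns weekly reporting", "3+ tools in workflow"],
        "next_call_targets": "teams matching 2+ conditions",
    },
}

# Written to disk for the downstream requirements/PRD session to pick up.
with open("bundle.json", "w") as f:
    json.dump(bundle, f, indent=2)
```

One bundle, three top-level keys, one per deliverable — so the downstream session can pull only the slice it needs.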
A thin orchestrator coordinating a retriever, an extensible bank of analysts, and a composer.
Capability specialists, not outcome specialists. Each has one responsibility and one output shape.
Encoded expertise loaded into each agent's context at runtime. The orchestrator never reads skill contents — it routes by manifest only.
CLAUDE.md.
The capabilities each agent actually invokes. Kept small on purpose — fewer tools means fewer failure modes.
Agents query the data bundle with jq — without loading the whole bundle into model context. This is a meaningful token saver.
Six architectural choices worth making explicit. All are v1 decisions we can revisit.
Agent teams run roughly 7x the token cost of subagents at comparable task complexity. At our stage the economics decide it — subagents first, reassess after we've evaluated output quality across several sessions.
No retry-on-failure loops, no feedback cycles, no iterative retrieval in v1. Cheaper to run, easier to evaluate, and we don't yet know where loops would add real signal versus just burning tokens.
The orchestrator routes from a manifest and never reads skill or data contents itself. Skills are referenced by file path; data bundles live on disk. Keeps the orchestrator's context under 3k tokens.
The three analysts are not a fixed set. Adding a Competitive analyst, Churn analyst, or Expansion analyst later is a skill-plus-subagent addition — no change to the orchestration flow.
All theme and snippet extraction happens in Notion before Claude Code runs. Agents reason over compact structured metadata, not transcripts. Without this, the system is ten times more expensive and much slower.
Subagents-not-teams and linear-not-recursive are starting positions. Revisit after we've seen output at real call volume and have concrete quality complaints — not before.
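Manifest-only routing and the pluggable analyst bank can be sketched together, assuming a hypothetical manifest shape. The orchestrator dispatches by name and file path — it never opens the skill files or the data bundle — so adding a fourth analyst is one manifest entry plus one skill file, with no change to the routing loop.

```python
import json

# Hypothetical manifest -- the orchestrator routes from this and never reads
# skill or data contents; only names and file paths enter its context.
MANIFEST = {
    "pattern_analyst": {"skill": "skills/patterns.md", "input": "data/tagged_calls.json"},
    "pain_analyst":    {"skill": "skills/pain.md",     "input": "data/tagged_calls.json"},
    "icp_analyst":     {"skill": "skills/icp.md",      "input": "data/tagged_calls.json"},
    # A competitive analyst later is one entry here plus one skill file --
    # the route() loop below does not change.
}

def route(manifest: dict) -> list[dict]:
    """Build dispatch instructions by reference: paths only, no contents."""
    return [
        {"agent": name, "skill_path": spec["skill"], "data_path": spec["input"]}
        for name, spec in manifest.items()
    ]

tasks = route(MANIFEST)
print(json.dumps(tasks, indent=2))
```

Because the orchestrator only ever holds this manifest and the resulting task list, its context stays well under the 3k-token budget regardless of how large the skills or data grow.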
Things we haven't resolved yet. To be worked through.