Working document · v1 · Internal

Customer insights pipeline

A Claude Code system that turns pre-tagged customer call data in Notion into decision-ready insights for the next pipeline stage.

Stage · Pre-product, hypothesis-gathering
Team · 5
Target volume · ~100 calls
01

Context

Where this system sits in the broader workflow.

We're an early-stage venture-backed startup validating a product hypothesis through customer conversations. We will conduct roughly one hundred calls. The raw transcripts and summaries live in Notion, alongside a taxonomy of themes and snippets that we extract via Notion's custom agents before Claude Code ever touches the data.

This system is the second stage in a pipeline. It reads the already-tagged data from Notion, synthesizes patterns, pain, and a working ICP definition, and produces two artifacts: a human-readable HTML report for the team and a structured JSON bundle for a downstream Claude Code session that generates requirements, PRDs, and tickets.

The pipeline's economics depend on an upstream choice: tagging happens in Notion, not in Claude Code. Agents reason over compact metadata — not raw transcripts — which is what makes the whole system affordable to run at volume.

02

Outcomes per session

Three decision-forcing deliverables, produced together in a single session.

Outcome 01

Emergent pattern synthesis

What the data says that we didn't go looking for. Themes clustering in unexpected ways, recurring situations that don't fit our current mental model. The system operates with zero priors from us — no seed topics, no guiding questions. This is the output that changes our minds rather than validates them.

Outcome 02

Pain intensity map

What hurts, how much, and for whom. Distinguishes pains that are annoying from pains that are costly from pains that are urgent. Surfaces how the same pain lands differently across customer situations, so we build for the truest signal rather than the loudest voice.

Outcome 03

ICP crystallization

A situation-based profile of who feels the pain most acutely. Not demographics — a set of conditions that predict high intensity. Tells us who to call next and, eventually, who to build for first.

03

Architecture

A thin orchestrator coordinating a retriever, an extensible bank of analysts, and a composer.

Orchestrator → Retriever (via Notion MCP) → analyst bank, run in parallel and extensible: Pattern analyst · Pain analyst · ICP analyst → Composer → two artifacts: an HTML report for human review and a JSON bundle that becomes context for the next pipeline session.
04

Subagents

Capability specialists, not outcome specialists. Each has one responsibility and one output shape.

Agent 01
Retriever
Data access
Does
Queries the relational Notion databases — calls, metadata, taxonomy — and joins the results. The only agent that talks to Notion directly.
Returns
A single normalized JSON bundle written to disk. All downstream agents read from this file by reference.
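The bundle schema is ours to define; as a sketch (file path, field names, and theme names all hypothetical), it might look like this, with downstream agents querying it by path rather than loading it into context:

```shell
# Hypothetical shape of the retriever's normalized bundle; all names illustrative.
cat > /tmp/bundle.json <<'EOF'
{
  "calls": [
    {
      "call_id": "call-014",
      "metadata": { "segment": "solo-founder", "team_size": 3 },
      "themes": ["manual-reporting", "handoff-friction"],
      "snippets": [
        { "theme": "manual-reporting", "text": "I rebuild this report every Monday by hand." }
      ]
    }
  ],
  "taxonomy": {
    "manual-reporting": { "label": "Manual reporting", "category": "workflow-pain" }
  }
}
EOF

# Downstream agents read the file by reference, e.g. count calls without loading the bundle:
jq '.calls | length' /tmp/bundle.json   # prints 1
```

Keeping everything in one file means downstream agents never re-query Notion and always see the same snapshot.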
Agent 02
Pattern analyst
Analysis
Does
Scans the data bundle with zero priors — finds theme co-occurrences, frequency distributions, and unexpected clusters across calls.
Returns
A ranked list of emergent patterns, each with a signal-strength score, supporting call IDs, and the theme IDs that evidence it.
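As a sketch of that output shape (pattern text, scores, and IDs invented for illustration), each pattern carries its evidence explicitly so a human can audit the claim:

```shell
# Hypothetical ranked-pattern output; the signal_strength scale is illustrative.
cat > /tmp/patterns.json <<'EOF'
[
  {
    "pattern": "manual-reporting co-occurs with handoff-friction",
    "signal_strength": 0.72,
    "supporting_calls": ["call-003", "call-014", "call-041"],
    "evidence_themes": ["manual-reporting", "handoff-friction"]
  },
  {
    "pattern": "tool-sprawl clusters in teams without an ops hire",
    "signal_strength": 0.41,
    "supporting_calls": ["call-007", "call-022"],
    "evidence_themes": ["tool-sprawl"]
  }
]
EOF

# Ranked: strongest signal first.
jq -r 'sort_by(-.signal_strength) | .[0].pattern' /tmp/patterns.json
```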
Agent 03
Pain analyst
Analysis
Does
Measures pain-specific signals — frequency, urgency markers, and how pain themes distribute across call metadata dimensions.
Returns
A ranked pain map with intensity scores, plus metadata correlations showing which customer situations are overrepresented at the top.
Agent 04
ICP analyst
Analysis
Does
Identifies what is situationally true about people who feel the pain most acutely. Reads pain rankings alongside the full metadata bundle.
Returns
A conditions-based ICP profile — a set of situational criteria that predict high pain intensity, not a demographic description.
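To make the conditions-based framing concrete, a sketch (criteria and counts invented for illustration) of what the profile might contain:

```shell
# Hypothetical ICP profile: situational predicates, not demographics.
cat > /tmp/icp.json <<'EOF'
{
  "conditions": [
    { "criterion": "compiles reports by hand weekly or more often", "supporting_calls": 18 },
    { "criterion": "no dedicated ops hire yet", "supporting_calls": 15 },
    { "criterion": "data split across three or more tools", "supporting_calls": 12 }
  ]
}
EOF

# Each criterion is a condition that predicts high pain intensity.
jq -r '.conditions[].criterion' /tmp/icp.json
```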
Agent 05
Composer
Output
Does
Assembles the outputs from all three analysts into the final artifacts. Handles visualization, section ordering, and the downstream handoff contract.
Returns
Two files: an HTML report for human review and a structured JSON bundle for the next pipeline session.
05

Skills

Encoded expertise loaded into each agent's context at runtime. The orchestrator never reads skill contents — it routes by manifest only.

Skill 01
notion-schema
Foundational
Purpose
Single source of truth for our Notion structure. Changes to the schema touch exactly one file.
Contents
Database names, relation graph between calls/metadata/taxonomy, property names, ID conventions, join patterns.
Used by
Retriever
Skill 02
signal-definition
Analyst
Purpose
Defines what counts as signal versus noise at our call volume. Prevents the Pattern analyst from over- or under-reporting.
Contents
Frequency thresholds, co-occurrence criteria, cluster-shape heuristics, anti-patterns to suppress, what to flag as weak-but-real signal.
Used by
Pattern analyst
Skill 03
pain-taxonomy
Analyst
Purpose
Tells the Pain analyst how to distinguish urgency from annoyance, and how to weight frequency against intensity.
Contents
Language markers that indicate urgency, weighting rules between frequency and intensity, theme combinations that signal acuity, known false positives.
Used by
Pain analyst
Skill 04
icp-format
Analyst
Purpose
Keeps the ICP output as a conditions-based profile rather than a demographic description.
Contents
Examples of good vs bad ICP output, the situational-criteria schema, anti-patterns to avoid (e.g. "mid-sized financial services"), required fields.
Used by
ICP analyst
Skill 05
html-contract
Output
Purpose
Defines the structure and visual conventions for the human-readable artifact.
Contents
Section order, visualization type per insight (stacked area for trends, grouped bar for segments), tone rules, embedded-JSON placement, citation format.
Used by
Composer
Skill 06
pipeline-handoff
Output
Purpose
The contract for the JSON artifact that feeds the next pipeline session (PRDs and tickets).
Contents
JSON schema, required fields the downstream session depends on, naming conventions, what the next session expects to find and where.
Used by
Composer
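The actual schema is still to be pinned down; as a sketch of the fail-fast idea (all keys hypothetical), the downstream session can validate the bundle before doing any work:

```shell
# Hypothetical handoff bundle; required keys are assumptions, not a settled contract.
cat > /tmp/handoff.json <<'EOF'
{
  "version": 1,
  "patterns": [],
  "pain_map": [],
  "icp": { "conditions": [] },
  "source_call_count": 100
}
EOF

# jq -e exits non-zero if any required key is missing, so the next session can fail fast.
jq -e 'has("patterns") and has("pain_map") and has("icp")' /tmp/handoff.json
```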
Skill 07
session-playbook
Orchestration
Purpose
Orchestrator's playbook for running a session end to end. Lives in CLAUDE.md.
Contents
How to decompose a request, when to run single-call vs aggregate, sequencing rules, file-path conventions, failure handling, the output contract.
Used by
Orchestrator
Skill 08
skill-manifest
Orchestration
Purpose
A routing table, not a library. Maps skill names to file paths plus one sentence per skill. The orchestrator reads this to route, never the skill contents themselves.
Contents
For each skill: name, file path, one-line purpose, which agents need it. Small enough to fit in the orchestrator's context without bloating it.
Used by
Orchestrator
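As a sketch (file paths and agent names hypothetical), a small JSON manifest keeps routing to a single lookup:

```shell
# Hypothetical manifest: skill name → file path, one-line purpose, consuming agents.
cat > /tmp/skill-manifest.json <<'EOF'
{
  "notion-schema": { "path": "skills/notion-schema.md", "purpose": "Notion database structure and join patterns", "agents": ["retriever"] },
  "pain-taxonomy": { "path": "skills/pain-taxonomy.md", "purpose": "Urgency vs annoyance markers and weighting rules", "agents": ["pain-analyst"] }
}
EOF

# Routing lookup: which skill files does the pain analyst need?
jq -r 'to_entries[] | select(.value.agents | index("pain-analyst") != null) | .value.path' /tmp/skill-manifest.json
```

The orchestrator passes only the resulting file paths into the subagent prompt; the skill contents never enter its own context.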
06

Tools

The capabilities each agent actually invokes. Kept small on purpose — fewer tools means fewer failure modes.

Tool 01
Notion MCP
External
Purpose
Read and write access to the Notion databases that hold calls, metadata, and taxonomy.
Used by
Retriever only — no other agent touches Notion
Tool 02
File tools
Built-in
Purpose
Read, Write, and Edit. Used to load skill files at session start, write the data bundle to disk, read the bundle in downstream agents, and produce the final artifacts.
Used by
All agents
Tool 03
Bash
Built-in
Purpose
Structured data wrangling on the JSON bundle — grouping, counting, filtering via jq — without loading the whole bundle into model context. This is a meaningful token saver.
Used by
Pattern analyst · Pain analyst · ICP analyst
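A representative sketch of the pattern (bundle contents invented for illustration): group and count themes with jq, so only the small result, not the bundle, enters model context:

```shell
# Tiny illustrative bundle; real bundles hold ~100 calls.
cat > /tmp/themes-bundle.json <<'EOF'
{ "calls": [
    { "call_id": "call-001", "themes": ["manual-reporting", "handoff-friction"] },
    { "call_id": "call-002", "themes": ["manual-reporting"] },
    { "call_id": "call-003", "themes": ["tool-sprawl"] }
] }
EOF

# Theme frequency across calls, most frequent first.
jq '[.calls[].themes[]] | group_by(.) | map({theme: .[0], count: length}) | sort_by(-.count)' /tmp/themes-bundle.json
```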
Tool 04
Task
Built-in
Purpose
Spawns subagents with a scoped prompt. This is the core orchestration primitive — subagents run in their own context window, receive only the file paths they need, and return summaries to the orchestrator.
Used by
Orchestrator
Tool 05
Web search
Optional
Purpose
For competitive context when a call mentions a competitor or workaround tool by name. Used sparingly — not core to any outcome.
Used by
Pattern analyst (rarely)
07

Design notes

Six architectural choices worth making explicit. All are v1 decisions we can revisit.

01

Subagent architecture over agent teams

Agent teams run roughly 7x the token cost of subagents at comparable task complexity. At our stage the economics decide it — subagents first, reassess after we've evaluated output quality across several sessions.

02

Linear flow over recursive loops

No retry-on-failure loops, no feedback cycles, no iterative retrieval in v1. Cheaper to run, easier to evaluate, and we don't yet know where loops would add real signal versus just burning tokens.

03

Thin orchestrator

The orchestrator routes from a manifest and never reads skill or data contents itself. Skills are referenced by file path; data bundles live on disk. Keeps the orchestrator's context under 3k tokens.

04

Analyst bank is extensible

The three analysts are not a fixed set. Adding a Competitive analyst, Churn analyst, or Expansion analyst later is a skill-plus-subagent addition — no change to the orchestration flow.

05

Upstream tagging is the economic foundation

All theme and snippet extraction happens in Notion before Claude Code runs. Agents reason over compact structured metadata, not transcripts. Without this, the system is ten times more expensive and much slower.

06

Both architectural choices are v1

Subagents-not-teams and linear-not-recursive are starting positions. Revisit after we've seen output at real call volume and have concrete quality complaints — not before.

08

Open questions

Things we haven't resolved yet. To be worked through.