db
stxt / internal-tools / dashboard-builder

Dashboard Builder

A multi-agent system that turns a Slack message or Linear task into a shipped Metabase dashboard on BigQuery — with quality gates, post-build audit, and a chat layer for corrections.

Specialized agents
8
Wall clock target
~5min
46 min before optimization
Models in flight
2
Haiku 4.5 + Sonnet 4.6
User input
1
Slack mention
How it works

From one Slack mention to a shipped dashboard

flowchart LR U([User
Slack message
+ optional
Linear URL]) --> Brief subgraph PIPE [Build pipeline] direction LR Brief[mb-brief
Haiku 4.5
Intent → brief.yaml] --> Disc[Discovery
deterministic
backend, no LLM] Disc --> Arch[mb-architect
Sonnet 4.6
Layout, charts] Arch --> Sql[mb-sql
Sonnet 4.6
SQL + previews] Sql --> CrSql[mb-critic SQL
Sonnet 4.6
Quality gate] CrSql --> Build[mb-builder
Sonnet 4.6
Create cards in
Metabase REST] end Build --> Audit[Post-build audit
Sonnet 4.6 vision
Data + viz settings] Audit --> Final([Delivery
Slack thread
+ chat mode]) Final --> Chat[mb-chat
Sonnet 4.6
Q&A and
modifications] classDef agent fill:#faf5ff,stroke:#6b21a8,stroke-width:1px,color:#18181b; classDef code fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#18181b; classDef io fill:#f4f4f5,stroke:#52525b,stroke-width:1px,color:#18181b; class Brief,Arch,Sql,CrSql,Build,Audit,Chat agent; class Disc code; class U,Final io;

Each box is its own Anthropic Managed Agent session. Discovery was an agent until we measured it taking 7–10 min per run; replacing it with a deterministic backend call cut that to ~2 seconds without sacrificing the output the architect needs.

The cast

8 specialized agents, one job each

mb-brief
Haiku 4.5

Turns natural-language requests into structured brief.yaml. Reads Linear tasks directly when a URL is mentioned.

read-only Linear API ~30s
discovery
backend code

Reads Metabase metadata, ranks tables by keyword match against the brief, derives partition columns + field IDs. No LLM — pure deterministic lookup.

cached 6h Metabase /metadata ~2s
mb-architect
Sonnet 4.6

Designs the dashboard layout — chart types, grid placement, filters. Pure design pass; no SQL, no API calls. Outputs design.md.

read-only visualization principles ~3 min
mb-sql
Sonnet 4.6

Writes BigQuery SQL for every card with mandatory partition pruning + {{date_range}} template-tags. Validates all 9 previews in parallel via /api/dataset.

preview-only ≤ 5 GB scan / card ~4–6 min
mb-critic
Sonnet 4.6

Quality gate with three modes: SQL (reviews queries pre-ship), visual (post-build data audit catching axis-clamp, empty-card, wrong-chart-type bugs), audit (existing dashboard review).

read-only verdict: APPROVED / CHANGES / BLOCKED
mb-builder
Sonnet 4.6

The only mutating agent. Creates cards + the dashboard via Metabase REST, in parallel; wires the date_range filter via parameter_mappings; verifies after create.

writes Metabase parallel POST ~3 min
mb-chat
Sonnet 4.6

Long-lived conversational layer after delivery. Three jobs in priority order:

  1. Capture corrections as permanent knowledge — wrong metric definition, wrong table, taste rule → POSTed to /api/admin/append-learning, baked into every future agent on next deploy.
  2. Apply changes — "change lookback to 30 days", "add a card for paid users", "archive card 3" → direct Metabase PUT/POST.
  3. Answer questions about the data, the cards, the metrics.
writes Metabase writes learnings scoped HMAC token long-lived

Anthropic Managed Agents + Vercel + Slack + Metabase

flowchart TB subgraph Frontline [Frontline] SL[Slack thread
events, replies,
reactions] LIN[Linear API
task descriptions] end subgraph Vercel [Vercel Functions] SLACK_API[/api/slack
events handler] ORCH[lib/orchestrator.ts
state machine] WEBHOOK[/api/anthropic-webhook
session-idle handler] DISC[lib/discovery.ts
deterministic discovery] AUDIT[lib/screenshot.ts
per-card data audit] end subgraph KV [Vercel KV / Upstash Redis] STATE[(run:<id>
RunState + phaseTimings)] ART[(run:<id>:artifact:<name>
brief.yaml, design.md, cards/*)] LEARN[(state:learnings.md
baked into next deploy)] META[(metabase:metadata
6h TTL cache)] end subgraph Anthropic [Anthropic Managed Agents] BRIEF((mb-brief)) ARCH((mb-architect)) SQL((mb-sql)) CRIT((mb-critic)) BLD((mb-builder)) CHAT((mb-chat)) end subgraph Data [Data layer] MB[Metabase
REST API
via CF Access] BQ[(BigQuery
via Metabase
as proxy)] end SL <--> SLACK_API SLACK_API --> ORCH ORCH --> KV ORCH --> Anthropic ORCH --> DISC ORCH --> AUDIT WEBHOOK --> ORCH Anthropic --> WEBHOOK Anthropic --> MB Anthropic --> LIN MB --> BQ DISC --> META AUDIT --> MB CHAT --> LEARN classDef vercel fill:#fef3c7,stroke:#92400e,color:#18181b; classDef kv fill:#f0fdf4,stroke:#166534,color:#18181b; classDef agent fill:#faf5ff,stroke:#6b21a8,color:#18181b; classDef ext fill:#eff6ff,stroke:#1d4ed8,color:#18181b; class SLACK_API,ORCH,WEBHOOK,DISC,AUDIT vercel; class STATE,ART,LEARN,META kv; class BRIEF,ARCH,SQL,CRIT,BLD,CHAT agent; class SL,LIN,MB,BQ ext;
Vercel Functions

Stateless functions for the Slack webhook, Anthropic webhook, orchestrator state machine, and admin/cron endpoints. waitUntil() keeps work alive past 200 OK.

Upstash Redis (KV)

Single source of truth for run state, artifacts (the files agents pass between each other), captured learnings, and cached Metabase metadata.

Managed Agents

Each agent is its own Anthropic session with a baked system prompt (preamble + business context + metrics + learnings). They get bootstrapped with run-specific env vars at start.

A typical run, step by step

Sequence of events

sequenceDiagram autonumber participant U as User
(Slack) participant ORCH as Orchestrator participant KV as Vercel KV participant AGENT as Managed Agent participant MB as Metabase API U->>ORCH: @bot build dashboard / Linear URL ORCH->>KV: create run, save brief ORCH->>AGENT: start mb-brief AGENT-->>ORCH: brief.yaml + NEXT marker Note over ORCH: deterministic discovery
(no LLM, ~2s) ORCH->>MB: GET /database/$ID/metadata (cached) ORCH->>KV: save data-model.md ORCH->>AGENT: start mb-architect AGENT-->>ORCH: design.md ORCH->>AGENT: start mb-sql AGENT->>MB: parallel preview validation (9× /dataset) AGENT-->>ORCH: cards/*.{sql,card.json} ORCH->>AGENT: start mb-critic (SQL mode) AGENT-->>ORCH: APPROVED → ship ORCH->>AGENT: start mb-builder AGENT->>MB: parallel POST 9× /card AGENT->>MB: POST /dashboard with parameters AGENT->>MB: PUT /dashboard with parameter_mappings AGENT-->>ORCH: metabase_ids.json, build-summary.md Note over ORCH: audit BEFORE delivery ORCH->>MB: fetch each card's data + viz_settings ORCH->>AGENT: start mb-critic (visual mode) AGENT-->>ORCH: visual-review.md + verdict ORCH->>U: ✅ delivery with URL + audit verdict Note over U,AGENT: chat mode opens — feedback persists U->>ORCH: "wrong metric — net_revenue is X" ORCH->>AGENT: route to mb-chat AGENT->>ORCH: POST /api/admin/append-learning AGENT-->>U: ":brain: saved as permanent knowledge"

From 46 minutes to ~5 minutes — what moved the needle

Phase 0 — instrumentation

Added per-agent and per-transition timing to RunState. /api/admin/run-timings now surfaces a table of where every run actually spends its wall clock.

Stopped guessing about latency.

Deterministic discovery

Replaced the mb-discovery LLM agent with backend code that reads Metabase metadata (cached 6h), ranks tables by keyword match, and writes data-model.md directly.

10 min2 sec

SQL retry loop dropped

SQL critic's CHANGES_REQUESTED used to send the run back to mb-sql for another full pass (~7 min). Now we ship best-effort; the post-build audit catches anything real.

+7 min0 min

Audit before delivery

mb-builder used to post "Done!" to Slack before the visual critic even ran. Now the orchestrator composes the delivery message AFTER the audit, with the verdict inline. User sees issues before they open the dashboard.

UX correctness, not raw speed.

Parallel SQL preview + card create

Both mb-sql (validating 9 SQL previews against BigQuery) and mb-builder (creating 9 cards) now fan out with curl & + wait instead of looping sequentially.

~90s + 25s~15s + 5s

Smart polling fallback

When Anthropic webhooks are flowing, we skip the polling fallback entirely; when they lag, we poll with exponential backoff starting at 1.5s instead of a flat 4s.

13s overhead3s overhead

Beyond just building

Features that make this not just a one-shot generator

🔗

Linear integration

Paste a Linear URL or just say GTM-123 — mb-brief fetches the title + description + comments via GraphQL and uses them as the authoritative spec. No need to re-paste requirements every time.

🧠

Continuous learning

When the user corrects mb-chat ("net_revenue isn't computed that way"), it writes the correction to KV. On the next agent deploy, every agent gets that knowledge baked into its system prompt. Mistakes are corrected once.

⚙️

Audit-first delivery

After the build, we fetch every card's live query data + viz settings, hand it to mb-critic in visual mode, and only THEN compose the delivery message — with the audit verdict inline. The user sees issues before they open the dashboard.

🎚️

Working date filter

Every card's SQL uses a {{date_range}} template-tag bound to its partition column; the dashboard declares the matching parameter; each dashcard's parameter_mappings wires the two together. The filter actually filters.

🗂️

Category collections

mb-brief classifies each dashboard into one of 8 categories (acquisition, retention, onboarding, funnel, revenue, leadership, investors, other). mb-builder places it into the right Metabase folder automatically.

🧹

Auto-expire stale runs

A cron sweeps every 15 minutes and expires runs that have been idle >30 min in chat or waiting-user state. Sessions are archived to free Anthropic resources without losing the transcript.

🩺

Self-testing

Hourly health check pings /api/health; daily synthetic run kicks off a "test dashboard" request and validates the pipeline starts cleanly. Anomalies post to Slack only on state change — no spam.

🔐

Scoped tokens

mb-chat gets an HMAC token narrowly scoped to append-learning, 24h TTL — not the master deploy secret. Compromise of a chat session log doesn't leak admin access.

What's wired together

LLMs
Anthropic Claude
Managed Agents · Haiku 4.5 · Sonnet 4.6
Runtime
Vercel Functions
Next.js 15 App Router · Node.js
State
Upstash Redis (KV)
Run state, artifacts, learnings, cache
Chat surface
Slack
Events API · files.upload · reactions
BI layer
Metabase
REST API · self-hosted · Cloudflare Access
Data
BigQuery
Via Metabase as proxy (no direct creds in agents)
Tasks
Linear
GraphQL — fetched on demand by mb-brief
Scheduling
Vercel Cron
Stale cleanup · health checks · synthetic tests
Code organization
  • apps/web/lib/orchestrator.ts — state machine, agent dispatch, KV persistence
  • apps/web/lib/discovery.ts — deterministic schema discovery
  • apps/web/lib/screenshot.ts — post-build audit
  • apps/web/lib/anthropic.ts — Managed Agents SDK wrapper
  • apps/web/lib/slack.ts — Slack helpers (messages, uploads, reactions)
  • apps/web/_resources/agents/*.md — agent system prompts
Admin & diagnostics
  • /api/admin/deploy-agents — bake context + push 7 agents to Anthropic
  • /api/admin/list-runs — list recent runs with wall clock
  • /api/admin/run-timings — per-agent + transition timing for a run
  • /api/admin/inspect-run — full run state for debugging
  • /api/admin/append-learning — store user correction
  • /api/admin/screenshot — manual audit trigger
  • /api/admin/test-linear — verify Linear API wiring
Roadmap

What's next

Tighten the loop

Get a real-world dashboard build to consistently under 5 min by trimming the SQL agent's output volume and the architect's prose tendencies. Measured via /api/admin/run-timings.

Fix-loop after audit

When the visual critic returns CHANGES_REQUESTED with concrete viz_settings fixes (axis clamp, wrong title), route back to mb-builder ONCE to apply them before delivery — no human in the loop.

Slug-aware run IDs

Today two runs with the same Linear task overwrite each other's runId. Append a short hash so re-runs accumulate in the index for comparison.