Storytelling engine

storytelling engine · community media pipeline with eval-graded quality gates
first shipped 2025 · last touched 2026-04-30

Problem

A community media platform wanted to scale long-form member-success stories without losing voice, fabricating details, or burying its editors under unstructured drafts. The bottleneck was not the writing; it was the grading and the gating.

Approach

A pipeline in which every stage emits both content and a structured eval. Voice fidelity, narrative clarity, fact grounding, and sourcing tier each get a small grader that returns a score plus a one-line rationale. Pieces that score below threshold are rewritten or sent back to editors with the failing criteria pre-highlighted. Critically, the system never publishes on its own: the member sees and explicitly approves the draft before anything goes live. The evals are the product; the model is interchangeable.
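The gating step described above can be sketched roughly as follows. This is an illustrative sketch, not the project's code: the type names, the threshold values, the assumed 1-5 score scale, and the rule that a draft goes to an editor once a rewrite budget is used up are all assumptions made for the example. Note that the decision type has no "publish" action; the best outcome is waiting for the member's approval.

```typescript
// Hypothetical names and values throughout; none of these come from the project.
type Criterion = "voiceFidelity" | "narrativeClarity" | "factGrounding" | "sourcingTier";

interface GradeResult {
  criterion: Criterion;
  score: number;     // assumed 1-5 scale
  rationale: string; // one-line explanation returned by the grader
}

// There is deliberately no "publish" action: a passing draft only
// becomes eligible for the member's explicit approval.
type GateDecision =
  | { action: "awaitMemberApproval" }
  | { action: "rewrite"; failing: GradeResult[] }
  | { action: "sendToEditor"; failing: GradeResult[] };

// Assumed per-criterion minimum scores.
const THRESHOLDS: Record<Criterion, number> = {
  voiceFidelity: 4,
  narrativeClarity: 4,
  factGrounding: 5,
  sourcingTier: 3,
};

// Assumed policy: retry automatically until the rewrite budget is spent,
// then hand the draft to an editor with the failing criteria attached.
function gate(results: GradeResult[], rewritesUsed: number, maxRewrites = 2): GateDecision {
  const failing = results.filter((r) => r.score < THRESHOLDS[r.criterion]);
  if (failing.length === 0) return { action: "awaitMemberApproval" };
  return rewritesUsed < maxRewrites
    ? { action: "rewrite", failing }
    : { action: "sendToEditor", failing };
}

const draftGrades: GradeResult[] = [
  { criterion: "voiceFidelity", score: 5, rationale: "Matches the member's phrasing." },
  { criterion: "factGrounding", score: 3, rationale: "One figure is not in the intake answers." },
];

console.log(gate(draftGrades, 0).action); // prints "rewrite"
```

Carrying the failing `GradeResult` objects in the decision is what would let a dashboard pre-highlight the criteria and show each grader's rationale next to the draft.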

Stack

TypeScript on Vercel, Anthropic models for generation and grading, Supabase for state, intake through a form-builder webhook, output to Google Drive and the editor dashboard. The 8-point pass/fail checklist and the 5-point rubric live in version control next to the prompts.

What shipped

In beta. Eleven member stories have run end-to-end: voice fidelity averaged 4.7/5, all eleven landed concrete opening scenes, and there were zero fabricated metrics across the batch. The editor green-lit the pilot with four pre-launch gates locked in: intake validation, automated quantitative checks, a member-approval workflow, and a press-tier sourcing protocol.

What’s next

Per-rubric calibration — surfacing where graders disagree with humans, and letting the team adjust thresholds without touching code.
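One way the calibration view could be computed is sketched below, under stated assumptions: each story has both a grader score and a human score per criterion on the same scale, and a "disagreement" is any gap larger than a tolerance. The names `ScoredPair`, `CalibrationRow`, and `calibrationReport` are hypothetical, and the sample data is made up for illustration.

```typescript
// Hypothetical shapes; the project's real schema is not public.
interface ScoredPair {
  criterion: string;
  graderScore: number; // score from the automated grader
  humanScore: number;  // score an editor gave the same draft
}

interface CalibrationRow {
  criterion: string;
  samples: number;
  meanAbsDiff: number;      // average |grader - human|
  disagreementRate: number; // share of pairs whose gap exceeds the tolerance
}

// Groups pairs by criterion and ranks criteria by how often the grader
// and the human differ by more than `tolerance` points.
function calibrationReport(pairs: ScoredPair[], tolerance = 1): CalibrationRow[] {
  const groups: Record<string, number[]> = {};
  for (const p of pairs) {
    if (!(p.criterion in groups)) groups[p.criterion] = [];
    groups[p.criterion].push(Math.abs(p.graderScore - p.humanScore));
  }
  return Object.keys(groups)
    .map((criterion) => {
      const diffs = groups[criterion] ?? [];
      const total = diffs.reduce((sum, d) => sum + d, 0);
      const over = diffs.filter((d) => d > tolerance).length;
      return {
        criterion,
        samples: diffs.length,
        meanAbsDiff: total / diffs.length,
        disagreementRate: over / diffs.length,
      };
    })
    .sort((a, b) => b.disagreementRate - a.disagreementRate);
}

// Made-up sample data, for illustration only.
const samplePairs: ScoredPair[] = [
  { criterion: "voiceFidelity", graderScore: 5, humanScore: 5 },
  { criterion: "voiceFidelity", graderScore: 4, humanScore: 4 },
  { criterion: "factGrounding", graderScore: 5, humanScore: 2 },
  { criterion: "factGrounding", graderScore: 4, humanScore: 4 },
];

console.log(calibrationReport(samplePairs));
```

Sorting by disagreement rate puts the least trustworthy grader at the top of the list. Keeping thresholds as data rather than constants (for example a row in the state store, which is an assumption here) is one way the team could adjust them without a code change.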

the stack

  • TypeScript
  • Anthropic
  • eval harness
  • Postgres
  • queue worker

Source is private (client work). The panel can answer questions about this project's stack and outcomes without leaking client identity.