Problem
Judging a hackathon by hand neither scales nor stays consistent: dozens of entries across two tracks, each with a repo, a live demo, and a video, all need the same rubric applied fairly and fast.
Approach
A CLI takes the registration CSV and judges every entry end-to-end. For each entry it selects the matching track rubric, clones the GitHub repo when there is one, drives the live URL in a real browser (screenshots, console and network errors, a few safe clicks), and turns the demo video into text, using host captions first and falling back to audio speech-to-text when there are none. It hands that evidence to Codex, which scores each criterion against fixed band anchors. A broken submission is logged and skipped; it never aborts the run.
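The two control-flow ideas in this approach, caption-first transcription and per-entry failure isolation, can be sketched as below. This is a minimal illustration and not the tool's actual code: the `Entry` and `Outcome` shapes and the `getTranscript` and `judgeAll` helpers are hypothetical names, and the caption fetcher, speech-to-text step, and judge are passed in as plain functions so the sketch stays independent of any particular library.

```typescript
// Hypothetical shapes; the real CLI's types are not shown in this write-up.
interface Entry {
  team: string;
  track: string;
  repoUrl?: string;
  demoUrl?: string;
  videoUrl?: string;
}

interface Outcome<R> {
  scored: { entry: Entry; result: R }[];
  skipped: { entry: Entry; reason: string }[];
}

// Caption-first transcript: try the host's captions, and fall back to
// speech-to-text only when captions are missing or empty.
async function getTranscript(
  videoUrl: string,
  fetchCaptions: (url: string) => Promise<string | null>,
  transcribeAudio: (url: string) => Promise<string>,
): Promise<string> {
  const captions = await fetchCaptions(videoUrl);
  if (captions !== null && captions.trim().length > 0) {
    return captions;
  }
  return transcribeAudio(videoUrl);
}

// Judge entries one at a time. A failure on one entry is recorded with its
// reason and the loop moves on, so a single broken submission cannot abort
// the whole run.
async function judgeAll<R>(
  entries: Entry[],
  judgeOne: (entry: Entry) => Promise<R>,
): Promise<Outcome<R>> {
  const outcome: Outcome<R> = { scored: [], skipped: [] };
  for (const entry of entries) {
    try {
      const result = await judgeOne(entry);
      outcome.scored.push({ entry, result });
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      outcome.skipped.push({ entry, reason });
    }
  }
  return outcome;
}
```

Keeping the skipped entries alongside the scored ones, rather than only logging them, makes it easy to report at the end of a run which submissions need a manual look.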
Stack
A TypeScript/Node CLI on the OpenAI Codex SDK, Playwright for live-demo probing, csv-parse for tolerant CSV ingest, and yt-dlp + ffmpeg + speech-to-text for video transcription.
What shipped
The tool was used to judge the OpenAI × AIC Business Hackathon in Detroit (June 2026). Every registration-CSV entry was scored against a two-track, 100-point rubric anchored on one question: would a real SMB operator pay for this, deploy it this week, and actually use it? Output is a per-submission report plus a ranked leaderboard.
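To make the scoring arithmetic concrete, here is a rough sketch of how per-criterion bands could roll up into a 100-point total and a ranked leaderboard. The write-up does not list the actual criteria, weights, or number of bands, so everything named here is an assumption for illustration: the `Criterion` shape, the 0 to 4 band scale, the linear band-to-points mapping, and the alphabetical tie-break.

```typescript
// Hypothetical rubric shape; the real criteria and weights are not given
// in this write-up.
interface Criterion {
  name: string;
  maxPoints: number;
}

interface Scored {
  team: string;
  total: number;
}

// Assumed band scale: integers 0..4, where 4 earns a criterion's full points.
const TOP_BAND = 4;

// Convert per-criterion bands into a total out of 100.
function totalScore(rubric: Criterion[], bands: Record<string, number>): number {
  const max = rubric.reduce((sum, c) => sum + c.maxPoints, 0);
  if (max !== 100) {
    throw new Error(`rubric must total 100 points, got ${max}`);
  }
  let total = 0;
  for (const c of rubric) {
    const band = bands[c.name] ?? 0; // a criterion with no band scores 0
    if (!Number.isInteger(band) || band < 0 || band > TOP_BAND) {
      throw new Error(`band for ${c.name} must be an integer in 0..${TOP_BAND}`);
    }
    total += (band / TOP_BAND) * c.maxPoints;
  }
  return total;
}

// Rank by total, highest first; ties are broken alphabetically by team so
// the ordering is deterministic. The input array is not mutated.
function leaderboard(scores: Scored[]): (Scored & { rank: number })[] {
  return [...scores]
    .sort((a, b) => b.total - a.total || a.team.localeCompare(b.team))
    .map((s, i) => ({ rank: i + 1, ...s }));
}
```

Under these assumptions, a rubric of 40, 30, and 30 points with bands 4, 2, and 0 totals 40 + 15 + 0 = 55. Fixing the band scale in one place is what keeps scores comparable across entries within a track.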
What’s next
The tool can be reused for any future hackathon by pointing it at a new registration CSV.