ASR intent detection with Claude
Post-ASR LLM intent classifier with structured output — lifted production accuracy from 41.7% to 91.7% without retraining.
ASR transcripts are noisy — homophones, half-words, environment noise. A small LLM running as a post-ASR classifier with constrained JSON output cleanly separates "what was said" from "what was meant." This is the exact pattern that took intent accuracy from 41.7% → 91.7% in my paper.
The classifier
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
const client = new Anthropic();
const Intent = z.object({
intent: z.enum([
"book_appointment",
"cancel_appointment",
"ask_pricing",
"speak_to_human",
"smalltalk",
"unclear",
]),
confidence: z.number().min(0).max(1),
slots: z.record(z.string()).optional(),
rationale: z.string().max(120),
});
const SYSTEM = `You classify user utterances from an ASR system into one of these intents:
- book_appointment, cancel_appointment, ask_pricing, speak_to_human, smalltalk, unclear
Rules:
- Output JSON only. Match the schema.
- "unclear" if confidence < 0.5 — DO NOT guess.
- Slots: extract date, time, name when present.
- Confidence reflects ASR uncertainty too (correct for obvious mishears).`;
export async function classify(transcript: string) {
const res = await client.messages.create({
model: "claude-haiku-4-5-20251001",
max_tokens: 256,
system: [{ type: "text", text: SYSTEM, cache_control: { type: "ephemeral" } }],
messages: [
{ role: "user", content: `Transcript: "${transcript}"\n\nReturn JSON.` },
],
});
const text = res.content
.filter((b): b is Anthropic.TextBlock => b.type === "text")
.map((b) => b.text)
.join("");
return Intent.parse(JSON.parse(text));
}
Why this works
- Haiku, not Opus. This is a routing call — speed matters more than reasoning depth. Median ~180 ms TTFT.
- Cached system prompt. The full taxonomy + rules sit in cache; per-call cost is just the transcript.
unclearas a first-class output. Forcing the model to admit uncertainty is what killed the false-positive rate. A hallucinatedbook_appointmentis worse than asking "could you repeat that?"- Slot extraction in the same call. No need for a separate NER step — Haiku is competent at JSON-schema-conditioned extraction.
Eval harness
const cases: { transcript: string; expected: string }[] = JSON.parse(
await Deno.readTextFile("./eval/golden.json")
);
let correct = 0;
for (const c of cases) {
const out = await classify(c.transcript);
if (out.intent === c.expected) correct++;
}
console.log(`accuracy: ${((correct / cases.length) * 100).toFixed(1)}%`);
Run this every time you touch the system prompt or change the model. Without an eval set, prompt edits are vibes-based.