ASR intent detection with Claude

Post-ASR LLM intent classifier with structured output — lifted production accuracy from 41.7% to 91.7% without retraining.

ASR transcripts are noisy: homophones, half-words, environment noise. A small LLM running as a post-ASR classifier, with its JSON output validated against a schema, cleanly separates "what was said" from "what was meant." This is the exact pattern that took intent accuracy from 41.7% to 91.7% in my paper.

The classifier

import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";

const client = new Anthropic();

const Intent = z.object({
  intent: z.enum([
    "book_appointment",
    "cancel_appointment",
    "ask_pricing",
    "speak_to_human",
    "smalltalk",
    "unclear",
  ]),
  confidence: z.number().min(0).max(1),
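  // Explicit key and value schemas: the single-argument form is not accepted by every Zod major version.
  slots: z.record(z.string(), z.string()).optional(),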
  rationale: z.string().max(120),
});

const SYSTEM = `You classify user utterances from an ASR system into one of these intents:
- book_appointment, cancel_appointment, ask_pricing, speak_to_human, smalltalk, unclear

Rules:
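- Output a single JSON object only: no prose, no code fences.
- Fields: "intent" (one of the intents above), "confidence" (number from 0 to 1), "slots" (optional object with string values), "rationale" (string, at most 120 characters).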
- "unclear" if confidence < 0.5 — DO NOT guess.
- Slots: extract date, time, name when present.
- Confidence reflects ASR uncertainty too (correct for obvious mishears).`;

export async function classify(transcript: string) {
  const res = await client.messages.create({
    model: "claude-haiku-4-5-20251001",
    max_tokens: 256,
    system: [{ type: "text", text: SYSTEM, cache_control: { type: "ephemeral" } }],
    messages: [
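      // JSON.stringify quotes the transcript and escapes any quotes or newlines inside it.
      { role: "user", content: `Transcript: ${JSON.stringify(transcript)}\n\nReturn JSON.` },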
    ],
  });

  const text = res.content
    .filter((b): b is Anthropic.TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("");

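  // The model may wrap the JSON in a code fence or add stray text; keep only the outermost object.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error(`No JSON object in model output: ${text}`);
  }

  // Throws if the JSON is malformed or does not match the Intent schema.
  return Intent.parse(JSON.parse(text.slice(start, end + 1)));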
}
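
The cache_control marker above only pays off if the API actually caches the prefix, and caching only applies once the prefix reaches a model-specific minimum length. One way to check is to look at the usage block on each response. The sketch below assumes the usage fields cache_creation_input_tokens and cache_read_input_tokens as returned by the Messages API; the helper name cacheStatus and the sample numbers are made up for illustration.

```typescript
// Subset of the response usage block that this sketch reads.
interface UsageLike {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

type CacheStatus = "hit" | "write" | "uncached";

// Hypothetical helper: did this call read the prefix from cache,
// write it to cache, or bypass caching entirely?
function cacheStatus(usage: UsageLike): CacheStatus {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return "hit";
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return "write";
  return "uncached";
}

// Made-up usage objects, for illustration only.
console.log(cacheStatus({ input_tokens: 25, cache_read_input_tokens: 2100 })); // "hit"
console.log(cacheStatus({ input_tokens: 110 })); // "uncached"
```

In classify, this would be called as cacheStatus(res.usage). If every call reports "uncached", the system prompt is probably below the minimum cacheable length.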

Why this works

  • Haiku, not Opus. This is a routing call — speed matters more than reasoning depth. Median ~180 ms TTFT.
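  • Cached system prompt. The taxonomy and rules are marked with cache_control, so repeat calls can read that prefix from cache at a reduced rate and pay full input price only for the transcript. Caching applies only once the prefix reaches the model's minimum cacheable length, so a system prompt as short as the one above will not be cached until it grows past that threshold.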
  • unclear as a first-class output. Forcing the model to admit uncertainty is what killed the false-positive rate. A hallucinated book_appointment is worse than asking "could you repeat that?"
  • Slot extraction in the same call. No need for a separate NER step — Haiku is competent at JSON-schema-conditioned extraction.
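
To make the "unclear as a first-class output" point concrete, here is a minimal sketch of what the caller might do with the classifier result. It also enforces the 0.5 threshold in code, so the rule does not depend on the prompt alone. The Action type, the route function, and the reprompt wording are all hypothetical; only the intent names and the threshold come from the classifier above.

```typescript
type IntentName =
  | "book_appointment"
  | "cancel_appointment"
  | "ask_pricing"
  | "speak_to_human"
  | "smalltalk"
  | "unclear";

interface Classified {
  intent: IntentName;
  confidence: number;
  slots?: Record<string, string>;
}

// Hypothetical downstream actions.
type Action =
  | { kind: "reprompt"; say: string }
  | { kind: "handoff" }
  | { kind: "dispatch"; intent: IntentName; slots: Record<string, string> };

const MIN_CONFIDENCE = 0.5; // mirrors the rule in the system prompt

// A null result stands for "the classifier call or parsing failed".
function route(result: Classified | null): Action {
  if (
    result === null ||
    result.intent === "unclear" ||
    result.confidence < MIN_CONFIDENCE
  ) {
    // Asking again is cheaper than acting on a wrong intent.
    return { kind: "reprompt", say: "Sorry, could you repeat that?" };
  }
  if (result.intent === "speak_to_human") return { kind: "handoff" };
  return { kind: "dispatch", intent: result.intent, slots: result.slots ?? {} };
}

console.log(route({ intent: "book_appointment", confidence: 0.3 }).kind); // "reprompt"
```

Treating a failed call, an explicit "unclear", and a low-confidence answer the same way keeps the fallback path in one place.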

Eval harness

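import { classify } from "./classify.ts"; // assumed path to the classifier module above

const cases: { transcript: string; expected: string }[] = JSON.parse(
  await Deno.readTextFile("./eval/golden.json")
);

let correct = 0;
for (const c of cases) {
  try {
    const out = await classify(c.transcript);
    if (out.intent === c.expected) correct++;
  } catch (err) {
    // A malformed or schema-invalid response counts as a miss instead of aborting the run.
    console.error(`failed on "${c.transcript}":`, err);
  }
}
console.log(`accuracy: ${((correct / cases.length) * 100).toFixed(1)}%`);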

Run this every time you touch the system prompt or change the model. Without an eval set, prompt edits are vibes-based.
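
A single accuracy number can hide which intent a prompt edit broke. As a possible extension, the harness could collect (expected, predicted) pairs and report accuracy per expected intent. The sketch below is one way to do that; the EvalRow shape, the perIntentAccuracy name, and the sample rows are made up for illustration and are not results from the golden set.

```typescript
interface EvalRow {
  expected: string;
  predicted: string;
}

interface IntentStats {
  correct: number;
  total: number;
}

// Group rows by expected intent and count how many predictions matched.
function perIntentAccuracy(rows: EvalRow[]): Map<string, IntentStats> {
  const stats = new Map<string, IntentStats>();
  for (const row of rows) {
    const s = stats.get(row.expected) ?? { correct: 0, total: 0 };
    s.total += 1;
    if (row.predicted === row.expected) s.correct += 1;
    stats.set(row.expected, s);
  }
  return stats;
}

// Made-up rows, for illustration only.
const sampleRows: EvalRow[] = [
  { expected: "book_appointment", predicted: "book_appointment" },
  { expected: "book_appointment", predicted: "unclear" },
  { expected: "ask_pricing", predicted: "ask_pricing" },
];

perIntentAccuracy(sampleRows).forEach((s, intent) => {
  console.log(`${intent}: ${s.correct}/${s.total}`);
});
// prints "book_appointment: 1/2" then "ask_pricing: 1/1"
```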