Acme Support · Support Triage · Demo-safe data

Support triage comparison

One prompt tested against 3 models across 5 labeled support tickets.

Winner

Claude Opus 4.7 passed 5/5 cases

It was the only model to classify the revenue-impact outage as urgent while still returning the exact single-word format required by the prompt.

Baseline improved from 80% to 100% (+20 percentage points)
Run status: completed (5/5 cases complete)
Claude Opus 4.7
Anthropic · 822ms avg
Winner
100% pass rate

Passed every support triage case.

GPT-5.4
OpenAI · 694ms avg
Strong
80% pass rate

Misclassified an outage with revenue impact as technical instead of urgent.

Gemini 3 Pro
Google · 512ms avg
Needs tuning
60% pass rate

Missed urgency signal in a production outage and revenue-loss case. Returned prose instead of the required single lowercase category.

Test case results

Rows fill as each case completes. Failed cells show the exact reason.
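A minimal sketch of how a failure reason for a cell could be derived, assuming the grader checks the output format first and the category second. The function name `grade`, the `ALLOWED_CATEGORIES` set (inferred from the five demo cases), the whitespace trimming, and the wording of the mismatch reason are all assumptions for illustration, not the dashboard's actual implementation.

```python
from typing import Optional

# Assumption: the category set is inferred from the five demo cases;
# the real prompt may allow other categories.
ALLOWED_CATEGORIES = {"technical", "billing", "urgent", "general"}


def grade(expected: str, output: str) -> Optional[str]:
    """Return None when the case passes, otherwise a short failure reason."""
    # Assumption: surrounding whitespace is tolerated; everything else is strict.
    answer = output.strip()

    # Format check: the answer must be exactly one lowercase category word.
    if answer not in ALLOWED_CATEGORIES:
        return "Returned prose instead of the required single lowercase category."

    # Label check: the format is valid, but the category is wrong.
    if answer != expected:
        return f"Classified as {answer} instead of {expected}."

    return None
```

Checking the format before the label keeps the two failure modes distinct: a model that answers "The message is about billing." fails on format even though it names the right category.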

demo-support-triage
Expected: technical
I can't login to my account, it says my password is wrong.
  Claude Opus 4.7: technical (842ms)
  GPT-5.4: technical (702ms)
  Gemini 3 Pro: technical (522ms)

Expected: billing
My credit card was charged twice this month.
  Claude Opus 4.7: billing (796ms)
  GPT-5.4: billing (689ms)
  Gemini 3 Pro: billing (501ms)

Expected: urgent
The checkout page is down and we are losing orders every minute.
  Claude Opus 4.7: urgent (911ms)
  GPT-5.4: technical (734ms). Failed: Misclassified an outage with revenue impact as technical instead of urgent.
  Gemini 3 Pro: technical (548ms). Failed: Missed urgency signal in a production outage and revenue-loss case.

Expected: general
How do I change my email notification preferences?
  Claude Opus 4.7: general (756ms)
  GPT-5.4: general (651ms)
  Gemini 3 Pro: general (477ms)

Expected: billing
Can you send me the invoice for last month's subscription?
  Claude Opus 4.7: billing (804ms)
  GPT-5.4: billing (694ms)
  Gemini 3 Pro: "The message is about billing." (512ms). Failed: Returned prose instead of the required single lowercase category.
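
The pass rates and average latencies shown on the model cards can be reproduced from the per-case results above. The sketch below assumes a pass means an exact string match against the expected category; the names `EXPECTED`, `RESULTS`, and `summarize` are hypothetical, and the data is copied from the results table.

```python
from statistics import mean

# Expected category for each of the five test cases, in table order.
EXPECTED = ["technical", "billing", "urgent", "general", "billing"]

# (model output, latency in ms) per case, copied from the results table.
RESULTS = {
    "Claude Opus 4.7": [
        ("technical", 842), ("billing", 796), ("urgent", 911),
        ("general", 756), ("billing", 804),
    ],
    "GPT-5.4": [
        ("technical", 702), ("billing", 689), ("technical", 734),
        ("general", 651), ("billing", 694),
    ],
    "Gemini 3 Pro": [
        ("technical", 522), ("billing", 501), ("technical", 548),
        ("general", 477), ("The message is about billing.", 512),
    ],
}


def summarize(results, expected):
    """Compute pass rate (%) and mean latency (ms) for each model."""
    summary = {}
    for model, rows in results.items():
        # Assumption: a case passes only on an exact match with the expected label.
        passed = sum(output == want for (output, _), want in zip(rows, expected))
        summary[model] = {
            "pass_rate": round(100 * passed / len(expected)),
            "avg_latency_ms": round(mean(latency for _, latency in rows)),
        }
    return summary


if __name__ == "__main__":
    for model, stats in summarize(RESULTS, EXPECTED).items():
        print(f"{model}: {stats['pass_rate']}% pass rate, {stats['avg_latency_ms']}ms avg")
```

Under these assumptions the computed figures match the cards: 100% and 822ms for Claude Opus 4.7, 80% and 694ms for GPT-5.4, and 60% and 512ms for Gemini 3 Pro.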