One prompt tested against 3 models across 5 labeled support tickets.
It was the only model to classify the revenue-impact outage as urgent while still returning the exact single-word format required by the prompt.
Passed every support triage case.
Misclassified an outage with revenue impact as technical instead of urgent.
Missed urgency signal in a production outage and revenue-loss case. Returned prose instead of the required single lowercase category.
Rows fill as each case completes. Failed cells show the exact reason.
| Test case | Claude Opus 4.7 | GPT-5.4 | Gemini 3 Pro |
|---|---|---|---|
Expected: technical I can't login to my account, it says my password is wrong. | technical 842ms | technical 702ms | technical 522ms |
Expected: billing My credit card was charged twice this month. | billing 796ms | billing 689ms | billing 501ms |
Expected: urgent The checkout page is down and we are losing orders every minute. | urgent 911ms | technical Misclassified an outage with revenue impact as technical instead of urgent. 734ms | technical Missed urgency signal in a production outage and revenue-loss case. 548ms |
Expected: general How do I change my email notification preferences? | general 756ms | general 651ms | general 477ms |
Expected: billing Can you send me the invoice for last month's subscription? | billing 804ms | billing 694ms | The message is about billing. Returned prose instead of the required single lowercase category. 512ms |