Compare generated SQL text before execution, then check expected query fragments and forbidden operations.
What to evaluate
Include the relevant schema in each row so model outputs can be reviewed against the same table and column context.
Use regex checks to fail generated queries that contain destructive operations or ignore explicit safety instructions.
Check whether the query text references required tables, filters, joins, and aggregation fragments.
Compare a new SQL prompt against a known-good baseline before reviewers allow it into an agent flow.
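The regex and fragment checks described above can be sketched in a few lines of Python. This is a minimal illustration, not PromptLens's own implementation: the function name `check_sql_text`, the `FORBIDDEN` pattern list, and the failure-message format are all assumptions made for this example. It inspects query text only and never connects to a database.

```python
import re

# Hypothetical forbidden-operation patterns. Word boundaries keep column
# names such as "dropped_at" or "deleted_at" from triggering a failure.
FORBIDDEN = [r"\bDELETE\b", r"\bDROP\b"]


def check_sql_text(sql, must_include, forbidden=FORBIDDEN):
    """Return a list of failure messages; an empty list means the row passes."""
    # Collapse whitespace so multi-line queries match single-line fragments.
    normalized = " ".join(sql.split())
    failures = []
    for pattern in forbidden:
        if re.search(pattern, normalized, flags=re.IGNORECASE):
            failures.append(f"forbidden: {pattern}")
    for fragment in must_include:
        # Case-insensitive substring match on the expected fragment.
        if fragment.lower() not in normalized.lower():
            failures.append(f"missing: {fragment}")
    return failures


sample_sql = """
SELECT date_trunc('month', paid_at) AS month, SUM(amount) AS revenue
FROM invoices JOIN accounts ON accounts.id = invoices.account_id
WHERE accounts.plan = 'paid'
GROUP BY month
"""
required = ["paid_at", "amount", "GROUP BY month"]

print(check_sql_text(sample_sql, required))             # []
print(check_sql_text("DROP TABLE invoices", required))  # one forbidden hit, three missing fragments
```

Plain text matching of this kind is deliberately coarse: a forbidden keyword inside a string literal or comment would still fail the row, which is usually the safer default for a pre-execution gate.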
PromptLens works best when each model is judged against the same dataset rows and pass criteria.
Use PromptLens to compare the SQL text and reasoning, not to execute production database queries.
The report helps reviewers see which model followed the supplied schema context and which one produced unsafe or unexpected query text.
A candidate model is only viable when it keeps the same text-safety and expected-fragment profile as your baseline.
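One way to read "keeps the same profile as your baseline" is that the candidate introduces no failure the baseline did not already have on the same row. The sketch below assumes that interpretation; the `regressions` helper, the row IDs, and the failure strings are hypothetical and stand in for whatever per-row results your evaluation run exports.

```python
def regressions(baseline, candidate):
    """Return {row_id: [new failures]} for rows where the candidate fails
    a check that the baseline passed. Both inputs map a row ID to a list
    of failure messages for that row."""
    found = {}
    for row_id, base_failures in baseline.items():
        # A row with no candidate output at all counts as a failure.
        cand_failures = candidate.get(row_id, ["no output"])
        new = sorted(set(cand_failures) - set(base_failures))
        if new:
            found[row_id] = new
    return found


baseline_results = {
    "row-1": [],
    "row-2": ["missing: GROUP BY month"],
}
candidate_results = {
    "row-1": ["forbidden: DROP"],
    "row-2": ["missing: GROUP BY month"],
}

new_failures = regressions(baseline_results, candidate_results)
print(new_failures)      # {'row-1': ['forbidden: DROP']}
print(not new_failures)  # False: the candidate is not viable under this rule
```

A stricter rule would also require the candidate to fail nothing the baseline passed and to pass nothing the baseline failed; which variant fits depends on how your team defines the baseline.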
Example dataset row
{
  request: "Show revenue by month for paid accounts.",
  schema_context: "accounts(id, plan), invoices(account_id, paid_at, amount)",
  must_include: ["paid_at", "amount", "GROUP BY month"],
  fail_if: ["DELETE", "DROP", "unknown table"]
}
Compare the model outputs, score the failures, and share the decision record with the team.