Compare generated content across models and check format, approved claims, length, and blocked language.
What to evaluate
Use keyword, length, regex, and expected-output checks to confirm that wording fits your product constraints.
Score whether the model follows requested length, structure, headings, and required fields.
Fail answers that use blocked claims, generic filler, or wording outside the supplied brief.
Compare pass rates and failure reasons when several models can produce acceptable content.
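As a rough illustration of the checks listed above, here is a minimal Python sketch. It is not PromptLens's own API: the function name `check_output` and its parameters are hypothetical, chosen only to show how keyword, blocked-phrase, length, and regex checks can each produce a readable failure reason.

```python
import re


def check_output(text, must_include=(), blocked=(), max_words=None, pattern=None):
    """Return a list of failure reasons; an empty list means the output passed.

    All parameter names are illustrative assumptions, not a real PromptLens schema.
    """
    failures = []
    lowered = text.lower()

    # Keyword check: every required phrase must appear (case-insensitive).
    for phrase in must_include:
        if phrase.lower() not in lowered:
            failures.append(f"missing required phrase: {phrase}")

    # Blocked-language check: none of these phrases may appear.
    for phrase in blocked:
        if phrase.lower() in lowered:
            failures.append(f"blocked phrase used: {phrase}")

    # Length check: count whitespace-separated words.
    word_count = len(text.split())
    if max_words is not None and word_count > max_words:
        failures.append(f"too long: {word_count} words > {max_words}")

    # Regex check: the pattern must match somewhere in the text.
    if pattern is not None and re.search(pattern, text) is None:
        failures.append(f"pattern not matched: {pattern}")

    return failures
```

Returning reasons rather than a bare pass/fail flag keeps the failure explanations available when you later compare models.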
PromptLens works best when each model is judged against the same dataset rows and pass criteria.
Use the same brief across every model so style and quality differences are easy to inspect.
Score concrete dimensions: format, required wording, blocked claims, and whether the output is useful without heavy rewriting.
If a candidate model passes the same checks, the report gives you a concrete reason to test it in that content workflow.
Example dataset row
{
  brief: "Write a product update for a regression testing feature.",
  must_include: ["comparison report", "failure reason", "share link"],
  tone: "direct and work-focused",
  fail_if: ["generic AI claims", "unsupported metrics", "too salesy"]
}
Compare the model outputs, score the failures, and share the decision record with the team.
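To make the comparison step concrete, the sketch below scores two model outputs against the `must_include` field of the example row. The model names, the sample outputs, and the `score_models` helper are all hypothetical and are not part of PromptLens. Only the required-phrase check is automated here; the `tone` and `fail_if` fields describe qualitative criteria that would likely need a reviewer or a judge model.

```python
# The example row, written as a Python dict (only the fields used below).
row = {
    "brief": "Write a product update for a regression testing feature.",
    "must_include": ["comparison report", "failure reason", "share link"],
}

# Hypothetical outputs from two hypothetical models, for illustration only.
outputs = {
    "model-a": "The comparison report lists each failure reason and includes a share link.",
    "model-b": "Our new feature is amazing and will change everything.",
}


def score_models(row, outputs):
    """Judge every model against the same row and record why each one failed."""
    results = {}
    for model, text in outputs.items():
        lowered = text.lower()
        missing = [p for p in row["must_include"] if p.lower() not in lowered]
        results[model] = {"passed": not missing, "missing": missing}
    return results


results = score_models(row, outputs)
for model, result in results.items():
    status = "pass" if result["passed"] else "fail"
    print(model, status, result["missing"])
```

Because every model is scored against the same row and the same criteria, the recorded `missing` lists can be compared directly.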