Grounding validation for tool-calling AI agents and RAG pipelines.
Deterministic rules. No LLM judge. Free tier available.
The problem
Your agent calls a database and gets balance: $2,450 but tells the user
$2,540 — that's a grounding failure.
The correct answer was already in the trace.
Tool output:
{
  "name": "Emily Carter",
  "balance": 2450,
  "status": "active",
  "department": "D01"
}
Agent response:
"Emily's balance is $2,540 and her account is active."
How it works
Drop in a few lines of code, get a verdict in milliseconds
Send the agent trace — tool calls, retrieved context, and the final response.
50+ deterministic rules check every claim against the ground truth. No LLM. Zero API calls.
Get a pass/fail verdict with exact failure reasons. Block, warn, or log — you choose.
import { StozerClient, TraceBuilder } from 'stozer-ai';

const stozer = new StozerClient({ apiKey: 'stozer_xxx' });

// Build the trace as your agent runs
const trace = new TraceBuilder()
  .addToolCall('getUser', { userId: 'U-42' })
  .addToolOutput('getUser', {
    name: 'Emily Carter',
    balance: 2450,
    status: 'active'
  })
  .addFinalResponse(
    "Emily's balance is $2,450 and her account is active."
  )
  .build();

// One call returns the verdict
const { report } = await stozer.evaluate(trace);
console.log(report.groundingScore); // 1.0: all claims grounded ✓
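The "block, warn, or log" choice can be wired up as a small policy function. The sketch below is illustrative only. It assumes the report's `groundingScore` is a number between 0 and 1, with 1.0 meaning every claim is grounded, as the example above suggests. The `Action` type, `decideAction`, and the threshold values are hypothetical and not part of the `stozer-ai` package.

```typescript
// Hypothetical policy helper; not part of the stozer-ai package.
type Action = 'pass' | 'log' | 'warn' | 'block';

interface Thresholds {
  warnBelow: number;  // scores under this trigger a warning
  blockBelow: number; // scores under this block the response
}

// Assumes groundingScore is in [0, 1], where 1.0 means every claim is grounded.
function decideAction(groundingScore: number, t: Thresholds): Action {
  if (groundingScore < t.blockBelow) return 'block';
  if (groundingScore < t.warnBelow) return 'warn';
  if (groundingScore < 1.0) return 'log';
  return 'pass';
}

// Example thresholds; pick values that match your own risk tolerance.
const thresholds: Thresholds = { warnBelow: 0.9, blockBelow: 0.5 };

console.log(decideAction(1.0, thresholds)); // fully grounded response
console.log(decideAction(0.4, thresholds)); // badly grounded response
```

Keeping the policy separate from the evaluation call makes it easy to start in log-only mode and tighten the thresholds later.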
Detection
Every claim in the response is extracted, matched to source data, and verified — without an LLM judge.
Prices, dates, quantities, percentages — any number that drifts from the source data.
Wrong name, wrong company, wrong product. Cross-contamination between records.
Statements with no basis in retrieved context. Fabricated policies, invented features.
Order marked "shipped" when it's "processing". Account shown "active" when suspended.
Omitted disclaimers, dropped conditions, ignored caveats from the source material.
Outdated information presented as current. Wrong dates, expired offers, stale data.
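To make the idea of a deterministic check concrete, here is a toy version of the numeric-drift case from the opening example. It is a minimal sketch, not Stozer's actual rule set: the function names are hypothetical, it only handles flat tool outputs, and it treats any number in the response that is absent from the source as drift.

```typescript
// Illustrative only: a toy numeric-drift check, not Stozer's actual rules.

// Pull every number out of free text, dropping thousands separators.
function extractNumbers(text: string): number[] {
  const matches: string[] = text.match(/\d[\d,]*(?:\.\d+)?/g) ?? [];
  return matches.map((m) => Number(m.replace(/,/g, '')));
}

// Collect the numeric values from a flat tool output object.
function sourceNumbers(toolOutput: Record<string, unknown>): Set<number> {
  const values = new Set<number>();
  for (const v of Object.values(toolOutput)) {
    if (typeof v === 'number') values.add(v);
  }
  return values;
}

// Return the numbers stated in the response that do not appear in the source.
function findNumericDrift(
  response: string,
  toolOutput: Record<string, unknown>
): number[] {
  const grounded = sourceNumbers(toolOutput);
  return extractNumbers(response).filter((n) => !grounded.has(n));
}

const emilyRecord = {
  name: 'Emily Carter',
  balance: 2450,
  status: 'active',
  department: 'D01'
};

const drifted = findNumericDrift(
  "Emily's balance is $2,540 and her account is active.",
  emilyRecord
);
console.log(drifted); // contains 2540, the value that drifted from 2450
```

Because the check is plain string and set logic, the same input always yields the same result, which is the property the deterministic approach relies on.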
Why Stozer
On 30–70% of traces (depending on structure), Stozer closes the verdict before any LLM is called. When an LLM is needed, it gets a focused batch call — pre-filtered claims with verified anchors, not an open-ended judgment.
| | LLM-as-a-Judge | Stozer |
|---|---|---|
| Structured data traces | Full LLM call every time | Deterministic — zero LLM cost |
| Ambiguous edge cases | Full LLM call every time | One focused batch call |
| Verdict reliability | Non-deterministic | Deterministic where provable |
| Hallucination risk | Judge can hallucinate | Only where evidence is ambiguous |
| Explainability | Black box score | Exact failure code + evidence |
| Scalability | Rate-limited by provider | 10K+ evals/sec |
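
The two-stage flow described above, deterministic rules first and one batched LLM call only for what remains, can be sketched as a simple triage step. This is an assumption-laden illustration: `RuleVerdict`, `Claim`, `triage`, and `llmCallsNeeded` are hypothetical names, and the sketch says nothing about how Stozer actually extracts or scores claims.

```typescript
// Illustrative triage sketch; all types and names are hypothetical.
type RuleVerdict = 'grounded' | 'ungrounded' | 'ambiguous';

interface Claim {
  text: string;
  verdict: RuleVerdict; // outcome of the deterministic rules
}

interface Triage {
  closed: Claim[];   // decided deterministically, no LLM needed
  needsLlm: Claim[]; // ambiguous claims, to be sent together in one batch
}

// Split claims into those already decided and those needing a second look.
function triage(claims: Claim[]): Triage {
  return {
    closed: claims.filter((c) => c.verdict !== 'ambiguous'),
    needsLlm: claims.filter((c) => c.verdict === 'ambiguous')
  };
}

// At most one batched call per trace, and none when every claim is closed.
function llmCallsNeeded(t: Triage): number {
  return t.needsLlm.length > 0 ? 1 : 0;
}

const result = triage([
  { text: 'Balance is $2,450', verdict: 'grounded' },
  { text: 'Account is active', verdict: 'grounded' },
  { text: 'She is a valued customer', verdict: 'ambiguous' }
]);
console.log(result.closed.length, result.needsLlm.length, llmCallsNeeded(result));
```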
Adoption
Five modes let you adopt incrementally.
Explore historical traces
Fail builds on regressions
Silent production monitor
Alert teams on failures
Stop bad responses
Pre-configured rules for domain-specific grounding — medical records, financial transactions, legal entities, and more.
11 languages — EN, SR, ES, FR, PT, DE, IT, RU, HI, AR, BN
Benchmarks
Reproducible results on public and production datasets.
Read the full benchmark report.
16,662 question-answer pairs. Near-perfect detection of fabricated answers.
750 expert-annotated LLM summaries. Harder task — free-form text.
Real customer-support agent traces. 50 rules, 4 failure categories.
All benchmarks reproducible. npm package available.
Stozer is in early access. The npm package is live. The hosted platform is coming soon.
Or start now: npm install stozer-ai