Every day, your AI generates thousands of outputs. Reports, summaries, analyses — each one trusted by your team.

Benchmark Results

Validated on public benchmarks

Tested on 4 hallucination detection benchmarks across medical, financial, and general domains.

AUROC on held-out test sets

[ROC curves: True Positive Rate vs. False Positive Rate, both axes 0.0–1.0]

Token-Level Scoring

Every token. Every output. Scored.

PraetorUQ doesn't just flag entire outputs. It scores every token, so you see exactly where confidence drops.

AI-Generated Output — Token-Level Confidence

0.8+ Confident · 0.4–0.8 Uncertain · <0.4 Likely hallucinated

Global semiconductor revenues are projected to exceed $600 billion by 2025, driven by sustained demand in data-centre and automotive segments. (0.95)

TSMC’s advanced 3nm process accounts for an increasing share of foundry revenue, with utilisation rates above 90% through Q3. (0.93)

Intel’s disaggregated chiplet roadmap is expected to restore competitive parity in server CPUs by late 2025, with initial yields reportedly tracking ahead of plan. (0.59)

The U.S. CHIPS Act has allocated roughly $52 billion in subsidies, of which $8.3 billion has been conditionally approved for facilities in Arizona and Ohio. (0.44)
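Span-level scores make it easy to route only the doubtful sentences for review. A minimal sketch, assuming spans carry (text, score) pairs as in the SDK examples further down; the `Span` class and `flag_uncertain` helper here are illustrative, not part of the SDK:

```python
from dataclasses import dataclass

@dataclass
class Span:
    text: str
    score: float

def flag_uncertain(spans, threshold=0.8):
    """Return spans whose confidence falls below the given threshold."""
    return [s for s in spans if s.score < threshold]

spans = [
    Span("Global semiconductor revenues are projected to exceed $600 billion by 2025.", 0.95),
    Span("Intel's chiplet roadmap is expected to restore competitive parity by late 2025.", 0.59),
    Span("The U.S. CHIPS Act has conditionally approved $8.3 billion for Arizona and Ohio.", 0.44),
]

for s in flag_uncertain(spans):
    print(f"{s.score:.2f}  {s.text}")  # only the 0.59 and 0.44 spans
```

With a stricter threshold (say 0.5), only the likely-hallucinated span would surface.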

Compatibility

Works with any LLM

No vendor lock-in. PraetorUQ is model-agnostic — it scores outputs from any language model provider.

OpenAI · Anthropic · Google · Mistral · Meta Llama · Cohere · Fine-tuned models · Any OpenAI-compatible API

Integration

Score any LLM output

Wrap your client. Scores appear on every response.

from praetoruq import PraetorUQ
from openai import OpenAI

# Wrap your client — one-line change
client = PraetorUQ.wrap(OpenAI(), api_key="...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract key metrics..."}],
)

# Scores live directly on the response — fully typed
response.choices[0].message.score       # 0.87
response.choices[0].message.spans
# [Span(text="Revenue grew 8%", score=0.93), ...]

Prefer not to wrap?

Score any response on demand — no client wrapping required.

from praetoruq import PraetorUQ

praetor = PraetorUQ(api_key="...")

# Score any response — no client wrapping needed
scored = await praetor.score(response)

scored.score          # 0.87
scored.spans
# [Span(text="Revenue grew 8%", score=0.93), ...]
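In a pipeline, on-demand scoring can act as a gate before outputs reach downstream consumers. A minimal sketch using the confidence bands from the legend above; `score_response` is a stand-in stub for `await praetor.score(...)`, not the real SDK call:

```python
import asyncio

CONFIDENT, UNCERTAIN = 0.8, 0.4  # bands: 0.8+ confident, 0.4–0.8 uncertain, <0.4 likely hallucinated

def triage(score: float) -> str:
    """Map an overall confidence score to a handling decision."""
    if score >= CONFIDENT:
        return "pass"    # deliver as-is
    if score >= UNCERTAIN:
        return "review"  # route to a human
    return "block"       # likely hallucinated

async def score_response(response: str) -> float:
    # Stand-in scorer; in practice this would be `await praetor.score(response)`.
    return 0.87

async def main():
    score = await score_response("Revenue grew 8% year over year.")
    print(triage(score))  # prints "pass"

asyncio.run(main())
```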

Python & TypeScript SDKs, plus a REST API for any language. Coming soon — request early access.

Ready to filter the noise?