Open research preview

Proof,
by council.

One model is an opinion. Four frontier models, forced to disagree on the record, is evidence. Pramana is the deliberation engine for serious research.

The council
OpenAI · GPT-5.2 · Reasoning
Anthropic · Opus 4.5 · Long-context
Google · Gemini 3.1 · Multimodal
xAI · Grok 4 · Realtime
The thesis · 01

A single model is a confident stranger.
Four models, forced to disagree on the record, is the closest thing software has to peer review.

4
Frontier models per question
3
Critique rounds, until consensus or escalation
100%
Citations or it didn't happen
The mechanism · 02

Four phases. One verdict. No hand-waving.

Every question runs the same loop: route, draft, critique, audit — with up to three revision rounds. The entire trail is preserved, so a reviewer can replay how the council got from your prompt to its answer.

  1. Route

    A fast classifier scores the prompt across six axes — factual, mathematical, code, long-context, multimodal, opinion — and picks the lead by axis-weighted skill. It then seats three critics from competing labs so no two voices share weights or training data.

    → lead_id, critic_seats[3], axis_scores
  2. Draft

    The lead writes the first answer with mandatory inline citations — no source, no claim. On revision rounds it must address every severity-3 critique by direct quote, not paraphrase, so reviewers can see what changed and why.

    → draft.md, citations[], revision_diff
  3. Critique

    All four models read the draft in parallel and return findings tagged factual, missing-assumption, logic-gap, or citation-weak — each with severity 1–3. Any severity-3 finding forces another revision round, up to three.

    → findings[], severity, revise?
  4. Audit

    An independent second pass: each critic, blind to the others' verdicts, returns APPROVE, REVISE, or REJECT with a confidence score. The verdict is the council vote — never the lead's own opinion of its own work.

    → verdict, confidence, dissent[]
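The four phases above can be sketched as a small control loop. Everything in this sketch (the function names, the dict shapes, the toy critics) is an illustrative assumption, not Pramana's actual implementation; only the three-round cap and the severity-3 trigger come from the description above.

```python
# Illustrative sketch of the critique/revise loop; names and data shapes
# are assumptions. Only the three-round cap and the severity-3 trigger
# come from the copy above.
MAX_ROUNDS = 3    # up to three revision rounds
BLOCKING = 3      # any severity-3 finding forces another round

def run_rounds(critics, revise, draft):
    """Run critique rounds until no blocking finding remains or the cap hits."""
    for round_no in range(1, MAX_ROUNDS + 1):
        # All models read the draft in parallel and return tagged findings.
        findings = [f for critique in critics for f in critique(draft)]
        blocking = [f for f in findings if f["severity"] >= BLOCKING]
        if not blocking:
            return draft, round_no, True        # ready for the audit pass
        draft = revise(draft, blocking)         # lead addresses each quote
    return draft, MAX_ROUNDS, False             # no consensus: escalate

# Toy council: one critic flags a missing citation until "[1]" appears.
def needs_cite(d):
    return [] if "[1]" in d else [
        {"tag": "citation-weak", "severity": 3, "quote": d}]

def quiet(d):
    return []

def add_cite(d, findings):
    return d + " [1]"

final, rounds, settled = run_rounds([needs_cite, quiet, quiet, quiet],
                                    add_cite, "Claim.")
# final == "Claim. [1]", settled after 2 rounds
```

The loop terminates either by satisfying every blocking critique (the draft then moves to the blind audit) or by exhausting the round cap, which maps to the escalation outcome described below.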
What you actually get back
Consensus

All three critics approve. Verdict ships with confidence ≥ 0.85 and the full audit log attached.

Split

Majority approves, minority dissents. Answer ships flagged, with the dissenting reasoning preserved in the trail.

Escalate

After three rounds without consensus, the question returns INCONCLUSIVE. The disagreement is the answer.
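The three outcomes above can be sketched as one resolution function. The 0.85 confidence floor and the three-round cap come from the copy; treating the floor as a per-critic minimum, and the exact tie-breaking order, are assumptions for illustration.

```python
# Hedged sketch of verdict resolution; the 0.85 confidence floor and the
# three-round cap come from the description above, the rest is assumed.
def resolve(votes, rounds_used, max_rounds=3, floor=0.85):
    """votes: (verdict, confidence) pairs from the blind critic audits."""
    approvals = [conf for verdict, conf in votes if verdict == "APPROVE"]
    if votes and len(approvals) == len(votes) and min(approvals) >= floor:
        return "CONSENSUS"          # ships with the full audit log
    if len(approvals) > len(votes) / 2:
        return "SPLIT"              # ships flagged, dissent preserved
    if rounds_used >= max_rounds:
        return "INCONCLUSIVE"       # the disagreement is the answer
    return "REVISE"                 # minority approval: another round

resolve([("APPROVE", 0.94), ("APPROVE", 0.91), ("APPROVE", 0.88)], 1)
# → "CONSENSUS"
resolve([("APPROVE", 0.94), ("APPROVE", 0.91), ("REVISE", 0.78)], 3)
# → "SPLIT"
```

The second call mirrors the sample verdict card later on this page: two approvals and one revise vote ship as a flagged split rather than a clean consensus.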

Most tools optimize for sounding sure. Pramana is built to refuse to answer when the council can't agree — because in research, an honest "we don't know yet" is worth more than a confident lie.

The seats · 03

Vendor diversity is not a feature. It's the design.

Every round seats models from competing labs. They don't share weights, training data, or incentives. They cannot collude their way to a wrong answer.

OpenAI · GPT-5.2 (Lead · Critic)
Anthropic · Claude Opus 4.5 (Lead · Critic)
Google · Gemini 3.1 Pro (Lead · Critic)
xAI · Grok 4 (Critic)
Built for · 04

The questions worth arguing about.

Pramana is overkill for trivia. It's the right amount for the questions you'd otherwise pay an analyst to answer.

Literature review

“Synthesise the last 18 months of papers on RLHF reward hacking.”
Cited markdown · footnoted PDF

Document due diligence

“Read this 80-page S-1. What did underwriters bury in section 4?”
Risk memo · DOCX export

Contested claims

“Is the GLP-1 cardiovascular benefit independent of weight loss?”
Verdict card · 3 audit signatures
Receipts · 05

Every answer arrives with its paper trail.

No black box. You see the draft, every critique, the lead's revisions, and three independent audit verdicts before the answer is signed.

Verdict · sample
✓ Approved · 0.92
Question
Did the 2024 GLP-1 trials show cardiovascular benefit independent of weight loss?
Final answer · excerpt

The SELECT trial[1] showed a 20% reduction in major adverse cardiovascular events with semaglutide, and a pre-specified mediation analysis[2] attributed only ~30% of that benefit to weight change — implying a substantial weight-independent effect. The council flags one minority view: STEP-HFpEF[3] outcomes were dominated by symptom scores, not hard endpoints…

GPT-5.2
Approve · conf 0.94
Claude Opus 4.5
Approve · conf 0.91
Gemini 3.1 Pro
Revise · conf 0.78
[1] NEJM 2023;389:2221 · [2] Circulation 2024;149:e1 · [3] NEJM 2023;389:1069
Access · 06

Free during the
research preview.

We're calibrating the council with the analysts and researchers who'll use it most. No card, no quota games — just a soft cap while we tune.

No setup

Sign in, ask, get a verdict. No keys, no model picker, no prompt engineering.

Your data, your call

We don't train on your inputs. Documents are processed in-session and discarded.

Export everything

Markdown, DOCX, and a forensic PDF report with every critique and verdict.

Open the council · Free during preview · soft cap on heavy days
Questions · 07

What you'll want to know.

Which models actually sit on the council?

Currently GPT-5.2, Claude Opus 4.5, Gemini 3.1 Pro, and Grok 4 — one per major lab. The router picks the strongest lead per question; the rest critique. The roster updates as new frontier models ship.

Does this actually reduce hallucinations?

That is the design goal. Vendor-diverse critics catch errors that any single family of models would confidently repeat. A hallucination that survives three rounds of critique from four independent models is rare, and when one slips through, the audit verdicts flag it with low confidence rather than hiding it.

Do you train on my questions or documents?

No. Inputs are processed in-session and not retained for training by Pramana or our model providers (we use enterprise endpoints with training opt-out). Documents are dropped after the session.

What can I export?

Markdown for fast paste, DOCX for editing, and a forensic PDF report containing the question, every draft and critique, all three audit verdicts with confidence scores, and citations.

Why is it slower than a single model?

Because four models are working. A typical verdict takes 30–90 seconds depending on rounds. If you need an answer in two seconds, you don't need a council — you need a chatbot.

Why Sanskrit?

Pramāṇa (प्रमाण) is the classical Indian epistemology term for a valid means of knowledge — perception, inference, testimony, comparison. A council of independent witnesses is older than the internet by a few millennia.