Proof,
by council.
One model is an opinion. Four frontier models, forced to disagree on the record, is evidence. Pramana is the deliberation engine for serious research.
A single model is a confident stranger.
Four models, arguing on the record, are the closest thing software has to peer review.
Four phases. One verdict. No hand-waving.
Every question runs the same loop: route, draft, critique, audit — with up to three revision rounds. The entire trail is preserved, so a reviewer can replay how the council got from your prompt to its answer.
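The loop is simple enough to sketch. A minimal Python illustration, where `route`, `draft`, `critique`, and `audit` are hypothetical stand-ins rather than Pramana's actual API:

```python
MAX_ROUNDS = 3  # up to three revision rounds, as described above

def deliberate(prompt, route, draft, critique, audit):
    """Run route -> draft -> critique -> audit, revising while any
    severity-3 finding remains, and preserve the whole trail."""
    lead, critics = route(prompt)                 # lead + three critic seats
    answer = draft(lead, prompt, findings=[])
    trail = []                                    # replayable deliberation record
    for round_no in range(MAX_ROUNDS):
        findings = critique(critics + [lead], answer)  # all four models read the draft
        trail.append({"round": round_no, "answer": answer, "findings": findings})
        if not any(f["severity"] == 3 for f in findings):
            break                                 # no blocking critique: proceed to audit
        answer = draft(lead, prompt, findings)    # must address every severity-3 finding
    verdicts = audit(critics, answer)             # blind, independent second pass
    return answer, verdicts, trail
```

The functions are passed in as parameters only to keep the sketch self-contained; the point is the control flow, not the signatures.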
- 01
Route
A fast classifier scores the prompt across six axes — factual, mathematical, code, long-context, multimodal, opinion — and picks the lead by axis-weighted skill. It then seats three critics from competing labs so no two voices share weights or training data.
→ lead_id, critic_seats[3], axis_scores
- 02
Draft
The lead writes the first answer with mandatory inline citations — no source, no claim. On revision rounds it must address every severity-3 critique by direct quote, not paraphrase, so reviewers can see what changed and why.
→ draft.md, citations[], revision_diff
- 03
Critique
All four models read the draft in parallel and return findings tagged factual, missing-assumption, logic-gap, or citation-weak — each with severity 1–3. Any severity-3 finding forces another revision round, up to three.
→ findings[], severity, revise?
- 04
Audit
An independent second pass: each critic, blind to the others' verdicts, returns APPROVE, REVISE, or REJECT with a confidence score. The verdict is the council vote — never the lead's own opinion of its own work.
→ verdict, confidence, dissent[]
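The routing step in phase 01 can be made concrete. Everything below (the model names, the skill table, the scoring rule) is an illustrative assumption, not Pramana's internals:

```python
AXES = ["factual", "mathematical", "code", "long_context", "multimodal", "opinion"]

# Hypothetical per-axis skill ratings, one row per lab's model.
SKILLS = {
    "model_a": {"factual": 0.9, "mathematical": 0.7, "code": 0.8, "long_context": 0.6, "multimodal": 0.5, "opinion": 0.7},
    "model_b": {"factual": 0.8, "mathematical": 0.9, "code": 0.7, "long_context": 0.8, "multimodal": 0.6, "opinion": 0.6},
    "model_c": {"factual": 0.7, "mathematical": 0.6, "code": 0.9, "long_context": 0.7, "multimodal": 0.8, "opinion": 0.5},
    "model_d": {"factual": 0.6, "mathematical": 0.8, "code": 0.6, "long_context": 0.9, "multimodal": 0.7, "opinion": 0.8},
}

def route(axis_scores):
    """axis_scores: the classifier's weight per axis for this prompt.
    Returns (lead_id, critic_seats): the axis-weighted strongest model
    leads, and the other three labs are seated as critics."""
    def weighted_skill(model):
        return sum(axis_scores[a] * SKILLS[model][a] for a in AXES)
    lead = max(SKILLS, key=weighted_skill)
    critics = [m for m in SKILLS if m != lead]  # no two seats share weights
    return lead, critics
```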
All three critics approve. Verdict ships with confidence ≥ 0.85 and the full audit log attached.
Majority approves, minority dissents. Answer ships flagged, with the dissenting reasoning preserved in the trail.
After three rounds without consensus, the question returns INCONCLUSIVE. The disagreement is the answer.
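The three outcomes above reduce to a small decision rule over the audit votes. A hypothetical sketch; the 0.85 threshold comes from the unanimous case described above, and the field names and the use of mean confidence are assumptions:

```python
from statistics import mean

APPROVAL_THRESHOLD = 0.85  # minimum confidence for a clean, unflagged verdict

def tally(votes):
    """votes: list of (verdict, confidence) pairs from the blind critic audits.
    Maps them to the three outcomes: clean ship, flagged ship, or no consensus."""
    approvals = [v for v, _ in votes if v == "APPROVE"]
    dissent = [v for v, _ in votes if v != "APPROVE"]
    confidence = mean(c for _, c in votes)
    if len(approvals) == len(votes) and confidence >= APPROVAL_THRESHOLD:
        return {"status": "SHIPPED", "confidence": confidence, "dissent": []}
    if len(approvals) > len(votes) / 2:
        # Majority approves: ship flagged, preserving the dissenting votes.
        return {"status": "SHIPPED_FLAGGED", "confidence": confidence, "dissent": dissent}
    # No majority: another revision round, or INCONCLUSIVE after round three.
    return {"status": "NO_CONSENSUS", "confidence": confidence, "dissent": dissent}
```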
Most tools optimize for sounding sure. Pramana is built to refuse to answer when the council can't agree — because in research, an honest "we don't know yet" is worth more than a confident lie.
Vendor diversity is not a feature. It's the design.
Every round seats models from competing labs. They don't share weights, training data, or incentives. They cannot collude their way to a wrong answer.
The questions worth arguing about.
Pramana is overkill for trivia. It's the right amount for the questions you'd otherwise pay an analyst to answer.
Literature review
“Synthesise the last 18 months of papers on RLHF reward hacking.”
Document due diligence
“Read this 80-page S-1. What did underwriters bury in section 4?”
Contested claims
“Is the GLP-1 cardiovascular benefit independent of weight loss?”
Every answer arrives with its paper trail.
No black box. You see the draft, every critique, the lead's revisions, and three independent audit verdicts before the answer is signed.
The SELECT trial[1] showed a 20% reduction in major adverse cardiovascular events with semaglutide, and a pre-specified mediation analysis[2] attributed only ~30% of that benefit to weight change — implying a substantial weight-independent effect. The council flags one minority view: STEP-HFpEF[3] outcomes were dominated by symptom scores, not hard endpoints…
Free during the research preview.
No setup
Sign in, ask, get a verdict. No keys, no model picker, no prompt engineering.
Your data, your call
We don't train on your inputs. Documents are processed in-session and discarded.
Export everything
Markdown, DOCX, and a forensic PDF report with every critique and verdict.
What you'll want to know.
Which models actually sit on the council?
Currently GPT-5.2, Claude Opus 4.5, Gemini 3.1 Pro, and Grok 4 — one per major lab. The router picks the strongest lead per question; the rest critique. The roster updates as new frontier models ship.
Does this actually reduce hallucinations?
Yes. Vendor-diverse critics catch errors that any single family of models would confidently repeat. Hallucinations that survive three rounds of adversarial critique are rare, and when they do occur, the audit verdicts flag them with low confidence rather than hiding them.
Do you train on my questions or documents?
No. Inputs are processed in-session and not retained for training by Pramana or our model providers (we use enterprise endpoints with training opt-out). Documents are dropped after the session.
What can I export?
Markdown for fast paste, DOCX for editing, and a forensic PDF report containing the question, every draft and critique, all three audit verdicts with confidence scores, and citations.
Why is it slower than a single model?
Because four models are working. A typical verdict takes 30–90 seconds depending on rounds. If you need an answer in two seconds, you don't need a council — you need a chatbot.
Why Sanskrit?
Pramāṇa (प्रमाण) is the classical Indian epistemology term for a valid means of knowledge — perception, inference, testimony, comparison. A council of independent witnesses is older than the internet by a few millennia.