Proof,
by council.
One model is an opinion. Four frontier models, forced to disagree on the record, is evidence. Pramana is the deliberation engine for serious research.
A single model is a confident stranger.
Four models, arguing on the record, are the closest thing software has to peer review.
Four phases. One verdict. No hand-waving.
Every question runs the same loop: route, draft, critique, audit — with up to three revision rounds. The entire trail is preserved, so a reviewer can replay how the council got from your prompt to its answer.
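The loop is simple enough to sketch. A minimal Python illustration, where `route`, `draft`, `critique`, and `audit` are hypothetical stand-ins rather than Pramana's actual API:

```python
MAX_ROUNDS = 3  # up to three revision rounds, as described above

def deliberate(prompt, route, draft, critique, audit):
    """Run route -> draft -> critique -> audit, revising while any
    severity-3 finding remains, and preserve the whole trail."""
    lead, critics = route(prompt)                 # lead + three critic seats
    answer = draft(lead, prompt, findings=[])
    trail = []                                    # replayable deliberation record
    for round_no in range(MAX_ROUNDS):
        findings = critique(critics + [lead], answer)  # all four models read the draft
        trail.append({"round": round_no, "answer": answer, "findings": findings})
        if not any(f["severity"] == 3 for f in findings):
            break                                 # no blocking critique: proceed to audit
        answer = draft(lead, prompt, findings)    # must address every severity-3 finding
    verdicts = audit(critics, answer)             # blind, independent second pass
    return answer, verdicts, trail
```

The functions are passed in as parameters only to keep the sketch self-contained; the point is the control flow, not the signatures.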
- 01
Route
A fast classifier scores the prompt across six axes — factual, mathematical, code, long-context, multimodal, opinion — and picks the lead by axis-weighted skill. It then seats three critics from competing labs so no two voices share weights or training data.
→ lead_id, critic_seats[3], axis_scores
- 02
Draft
The lead writes the first answer with mandatory inline citations — no source, no claim. On revision rounds it must address every severity-3 critique by direct quote, not paraphrase, so reviewers can see what changed and why.
→ draft.md, citations[], revision_diff
- 03
Critique
All four models read the draft in parallel and return findings tagged factual, missing-assumption, logic-gap, or citation-weak — each with severity 1–3. Any severity-3 finding forces another revision round, up to three.
→ findings[], severity, revise?
- 04
Audit
An independent second pass: each critic, blind to the others' verdicts, returns APPROVE, REVISE, or REJECT with a confidence score. The verdict is the council vote — never the lead's own opinion of its own work.
→ verdict, confidence, dissent[]
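The routing step in phase 01 can be made concrete. Everything below (the model names, the skill table, the scoring rule) is an illustrative assumption, not Pramana's internals:

```python
AXES = ["factual", "mathematical", "code", "long_context", "multimodal", "opinion"]

# Hypothetical per-axis skill ratings, one row per lab's model.
SKILLS = {
    "model_a": {"factual": 0.9, "mathematical": 0.7, "code": 0.8, "long_context": 0.6, "multimodal": 0.5, "opinion": 0.7},
    "model_b": {"factual": 0.8, "mathematical": 0.9, "code": 0.7, "long_context": 0.8, "multimodal": 0.6, "opinion": 0.6},
    "model_c": {"factual": 0.7, "mathematical": 0.6, "code": 0.9, "long_context": 0.7, "multimodal": 0.8, "opinion": 0.5},
    "model_d": {"factual": 0.6, "mathematical": 0.8, "code": 0.6, "long_context": 0.9, "multimodal": 0.7, "opinion": 0.8},
}

def route(axis_scores):
    """axis_scores: the classifier's weight per axis for this prompt.
    Returns (lead_id, critic_seats): the axis-weighted strongest model
    leads, and the other three labs are seated as critics."""
    def weighted_skill(model):
        return sum(axis_scores[a] * SKILLS[model][a] for a in AXES)
    lead = max(SKILLS, key=weighted_skill)
    critics = [m for m in SKILLS if m != lead]  # no two seats share weights
    return lead, critics
```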
All three critics approve. Verdict ships with confidence ≥ 0.85 and the full audit log attached.
Majority approves, minority dissents. Answer ships flagged, with the dissenting reasoning preserved in the trail.
After three rounds without consensus, the question returns INCONCLUSIVE. The disagreement is the answer.
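The three outcomes above reduce to a small decision rule over the audit votes. A hypothetical sketch; the 0.85 threshold comes from the unanimous case described above, and the field names and the use of mean confidence are assumptions:

```python
from statistics import mean

APPROVAL_THRESHOLD = 0.85  # minimum confidence for a clean, unflagged verdict

def tally(votes):
    """votes: list of (verdict, confidence) pairs from the blind critic audits.
    Maps them to the three outcomes: clean ship, flagged ship, or no consensus."""
    approvals = [v for v, _ in votes if v == "APPROVE"]
    dissent = [v for v, _ in votes if v != "APPROVE"]
    confidence = mean(c for _, c in votes)
    if len(approvals) == len(votes) and confidence >= APPROVAL_THRESHOLD:
        return {"status": "SHIPPED", "confidence": confidence, "dissent": []}
    if len(approvals) > len(votes) / 2:
        # Majority approves: ship flagged, preserving the dissenting votes.
        return {"status": "SHIPPED_FLAGGED", "confidence": confidence, "dissent": dissent}
    # No majority: another revision round, or INCONCLUSIVE after round three.
    return {"status": "NO_CONSENSUS", "confidence": confidence, "dissent": dissent}
```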
Most tools optimize for sounding sure. Pramana is built to refuse to answer when the council can't agree — because in research, an honest "we don't know yet" is worth more than a confident lie.
Vendor diversity is not a feature. It's the design.
Every round seats models from competing labs. They don't share weights, training data, or incentives. They cannot collude their way to a wrong answer.
The questions worth arguing about.
Pramana is overkill for trivia. It's the right amount for the questions you'd otherwise pay an analyst to answer.
Literature review
“Synthesise the last 18 months of papers on RLHF reward hacking.”
Document due diligence
“Read this 80-page S-1. What did underwriters bury in section 4?”
Contested claims
“Is the GLP-1 cardiovascular benefit independent of weight loss?”
Every answer arrives with its paper trail.
No black box. You see the draft, every critique, the lead's revisions, and three independent audit verdicts before the answer is signed.
The SELECT trial[1] showed a 20% reduction in major adverse cardiovascular events with semaglutide, and a pre-specified mediation analysis[2] attributed only ~30% of that benefit to weight change — implying a substantial weight-independent effect. The council flags one minority view: STEP-HFpEF[3] outcomes were dominated by symptom scores, not hard endpoints…
Free during the research preview.
No setup
Sign in, ask, get a verdict. No keys, no model picker, no prompt engineering.
Your data, your call
We don't train on your inputs. Documents are processed in-session and discarded.
Export everything
Markdown, DOCX, and a forensic PDF report with every critique and verdict.
What you'll want to know.
Which models actually sit on the council?
Currently GPT-5.2, Claude Opus 4.5, Gemini 3.1 Pro, and Grok 4 — one per major lab. The router picks the strongest lead per question; the rest critique. The roster updates as new frontier models ship.
Does this actually reduce hallucinations?
Yes. Vendor-diverse critics catch errors that any single family of models would confidently repeat. Hallucinations that survive three rounds of adversarial critique are rare, and when they do occur, the audit verdicts flag them with low confidence rather than hiding them.
Do you train on my questions or documents?
No. Inputs are processed in-session and not retained for training by Pramana or our model providers (we use enterprise endpoints with training opt-out). Documents are dropped after the session.
What can I export?
Markdown for fast paste, DOCX for editing, and a forensic PDF report containing the question, every draft and critique, all three audit verdicts with confidence scores, and citations.
Why is it slower than a single model?
Because four models are working. A typical verdict takes 30–90 seconds depending on rounds. If you need an answer in two seconds, you don't need a council — you need a chatbot.
Why Sanskrit?
Pramāṇa (प्रमाण) is the classical Indian epistemology term for a valid means of knowledge — perception, inference, testimony, comparison. A council of independent witnesses is older than the internet by a few millennia.