
Compare LLMs.

Four LLMs side by side on consistent axes. Each per-axis value carries lastCheckedAt, confidence, and source.
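
A minimal sketch of what one per-axis value might look like as a record, assuming Python. The class name, field names, and the provenance_line helper are illustrative, not the site's actual schema; the source string is a placeholder because this page does not show it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AxisValue:
    """One cell of the comparison: a value plus its provenance."""
    value: str
    last_checked_at: date  # mirrors the page's lastCheckedAt
    confidence: str        # "low", "medium", or "high"
    source: str            # placeholder; the page does not display sources

    def provenance_line(self) -> str:
        # Renders the same footer format the page uses under each value.
        return f"checked {self.last_checked_at.isoformat()} · {self.confidence} confidence"


context_window = AxisValue(
    value="200k",
    last_checked_at=date(2026, 5, 7),
    confidence="high",
    source="(not shown on this page)",
)
print(context_window.provenance_line())  # checked 2026-05-07 · high confidence
```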

4 items tracked
// AXIS
Claude Sonnet 4.5: Default reasoning + tool-calling model
GPT-5: Default reasoning + tool-calling on the OpenAI side

Pricing
Claude Sonnet 4.5: $3 / $15 per M tokens (input / output)

Input cache discount lowers effective cost on repeat calls.

checked 2026-05-07 · medium confidence
GPT-5: $2.50 / $10 per M tokens (input / output)

Reasoning effort settings can multiply token use 2-3x.

checked 2026-05-07 · medium confidence

Context Window
Claude Sonnet 4.5: 200k tokens
checked 2026-05-07 · high confidence
GPT-5: 256k tokens
checked 2026-05-07 · high confidence

MCP Support
Claude Sonnet 4.5: first-class

Claude was the canonical MCP client; tool-calling is native.

GPT-5: via Apps SDK

OpenAI's Apps SDK builds on MCP-style servers; native tool calling also works directly.

Leads At
Claude Sonnet 4.5:
  • Plain-English reasoning
  • Following multi-step instructions
  • Honest uncertainty
GPT-5:
  • Massive consumer reach via ChatGPT
  • Apps SDK distribution
  • Image + audio capabilities

Lags At
Claude Sonnet 4.5:
  • Latency vs Haiku tier
  • Image generation (none; text-only)
GPT-5:
  • Reasoning settings can be opaque
  • Fewer cross-vendor host integrations than Claude

Verdict
Claude Sonnet 4.5: Default for non-trivial agentic work that lives outside ChatGPT.
GPT-5: Default if your audience lives in ChatGPT, or if you need image/audio out of the box.
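
The Pricing row above can be turned into a rough per-call estimate. The sketch below is illustrative only: estimate_cost is a hypothetical helper, the token counts are made up, and it assumes the 2-3x reasoning multiplier applies to output tokens. The cache discount is not modeled because the page gives no rate for it.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
    output_multiplier: float = 1.0,
) -> float:
    """Return the estimated USD cost of one call.

    Prices are in USD per million tokens. output_multiplier is an
    assumption used to model extra tokens from higher reasoning effort.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens * output_multiplier / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Illustrative call: 10,000 input tokens and 2,000 output tokens.
claude_cost = estimate_cost(10_000, 2_000, 3.00, 15.00)
gpt5_cost = estimate_cost(10_000, 2_000, 2.50, 10.00)
gpt5_cost_3x = estimate_cost(10_000, 2_000, 2.50, 10.00, output_multiplier=3.0)

print(f"Claude Sonnet 4.5: ${claude_cost:.3f}")        # $0.060
print(f"GPT-5:             ${gpt5_cost:.3f}")          # $0.045
print(f"GPT-5 at 3x output: ${gpt5_cost_3x:.3f}")      # $0.085
```

Under these assumed token counts, the listed price gap reverses once output tokens triple, which is why the reasoning-effort note sits next to the price.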
DISCLOSURES

Editorial policy

  • Microsoft pages disclose Sush's employer.
  • No vendor pays for placement.
  • Verdicts are scoped to a use case ("default for X" / "use when Y"), not framed as winner / loser.
  • Every axis carries lastCheckedAt and confidence. Pricing in particular drifts, so we date it explicitly.
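
Because dated values drift, a reader consuming this data might want to flag old entries. A minimal sketch, assuming Python: is_stale is a hypothetical helper, and the 90-day threshold is an arbitrary assumption, not part of the editorial policy.

```python
from datetime import date


def is_stale(last_checked_at: date, today: date, max_age_days: int = 90) -> bool:
    """Return True when a value was last checked more than max_age_days ago."""
    return (today - last_checked_at).days > max_age_days


checked = date(2026, 5, 7)  # the lastCheckedAt date shown on this page
print(is_stale(checked, today=date(2026, 6, 1)))  # False (25 days old)
print(is_stale(checked, today=date(2026, 9, 1)))  # True (117 days old)
```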