Your LLMs are burning money. You just can't see it.

Real-time visibility into every LLM call, and an answer to every "why", without touching a single line of your existing code.

2 min
to full observability
0 changes
required in most cases
<50ms
added latency

Integration

Pick your path. Working in 2 minutes.

Four ways in — from two lines of code to zero file changes. Every call captured automatically from the first request.

1
Install the SDK
pip install whyllm
2
Add two lines
whyllm.init(api_key=...)
3
Ship it
Dashboard is live instantly
Python SDK

Add two lines to your existing code. Everything else stays the same.
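Concretely, the two-line integration from the steps above looks like this. Only the import and `init()` call are whyllm-specific; the OpenAI call is your existing code, unchanged. Any `init()` parameters beyond `api_key` are assumptions, not documented API.

```python
import whyllm
whyllm.init(api_key="YOUR_WHYLLM_KEY")  # lines 1 and 2: done

# Existing application code, captured automatically from here on.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
)
```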

Compatible with every major LLM provider

OpenAI · Anthropic · Azure OpenAI · AWS Bedrock · Google Vertex AI · Mistral · Cohere · Groq · Together AI · Perplexity · Ollama · Fireworks AI

The problem

Right now, you're flying blind

Every day without observability is money you can't recover and quality issues you can't explain.

Without whyllm

Invoice arrives. You had no idea the bill would be this high.

With whyllm
  • Proxy-layer budget cap — HTTP 429 before the call fires
  • GPT-5.4 surcharge alert: rate doubles $2.50→$5.00/M past 272K context
  • Auto-routes to gpt-5.4-mini when a project threshold trips
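A minimal sketch of how a proxy-layer budget cap can reject a call with HTTP 429 before it ever reaches the provider. Class and method names are illustrative, not the whyllm API.

```python
# Hypothetical budget gate: a call is checked against projected spend
# *before* the provider request fires; over-budget calls get a 429.
class BudgetCap:
    def __init__(self, monthly_budget_usd: float):
        self.budget = monthly_budget_usd
        self.spent = 0.0

    def check(self, estimated_cost_usd: float) -> int:
        """Return 200 if the call may proceed, 429 if it would bust the budget."""
        if self.spent + estimated_cost_usd > self.budget:
            return 429  # blocked; the provider is never called
        self.spent += estimated_cost_usd
        return 200

cap = BudgetCap(monthly_budget_usd=1.00)
print(cap.check(0.60))  # 200: within budget
print(cap.check(0.60))  # 429: would exceed $1.00
```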
Without whyllm

A user screenshots a hallucinated response. You find out on Twitter.

With whyllm
  • <1ms heuristic scorer: hedge ratio, factual anchoring, refusal patterns
  • Flags outputs below confidence threshold pre-response, not post
  • LLM judge fires only on flagged spans — cost stays near zero
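To make the "<1ms heuristic" concrete, here is a toy version of one of the signals named above, the hedge ratio: the share of hedging words among all tokens. A real scorer combines more signals; the word list and threshold here are made up for illustration.

```python
# Toy hedge-ratio scorer: flag low-confidence responses cheaply, so the
# expensive LLM judge only runs on flagged spans.
HEDGES = {"might", "maybe", "possibly", "perhaps", "likely", "unsure"}

def hedge_ratio(text: str) -> float:
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,") in HEDGES for w in words) / len(words)

def flag(text: str, threshold: float = 0.15) -> bool:
    """True if the response should be escalated to the LLM judge."""
    return hedge_ratio(text) > threshold

print(flag("Paris is the capital of France."))          # False
print(flag("It might possibly be Paris, maybe Lyon."))  # True
```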
Without whyllm

You tried three observability tools. Each took days to set up, and half your prompts still weren't captured.

With whyllm
  • `whyllm run app.py` — monkey-patches openai/anthropic at import time
  • Zero app code changes, zero proxy in the critical path
  • 100% capture rate from request #1
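The mechanism behind zero-code capture is ordinary monkey-patching: wrap a client method once, at import time, so every call is recorded without touching application code. This sketch patches a stand-in class to show the idea; it is not whyllm's internals, which patch the real openai/anthropic SDKs.

```python
# Illustrative import-time instrumentation: replace a method with a
# wrapper that records latency and arguments, then delegates.
import functools
import time

captured = []

def instrument(cls, method_name):
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        captured.append({
            "method": method_name,
            "latency_s": time.perf_counter() - start,
            "kwargs": kwargs,
        })
        return result

    setattr(cls, method_name, wrapper)

class FakeClient:                   # stand-in for a provider SDK client
    def complete(self, prompt):
        return f"echo: {prompt}"

instrument(FakeClient, "complete")  # done once, at startup
print(FakeClient().complete(prompt="hi"))  # app code is unchanged
print(len(captured))                       # 1 span captured
```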
Without whyllm

Your app feels slow. You blamed the database for a week. It was a 4-second LLM call.

With whyllm
  • TTFT, generation time, total latency tracked per model × route × user_id
  • P95 spike on any endpoint? Drill to exact calls in 2 clicks
  • Prompt length, model version, and timestamp all indexed
Without whyllm

Which feature is burning $3k/month? You have spreadsheets, guesses, and an angry CFO.

With whyllm
  • Tag calls with feature, user_id, session via SDK context headers
  • Filter spend by any dimension in the dashboard
  • /summarize = $0.0034/call × 8,200/day — know it before the CFO asks
Without whyllm

You shipped a new prompt. Engagement dropped. You can't tell if the prompt caused it.

With whyllm
  • SHA-256 content hash stored per system prompt version
  • Quality score delta auto-computed across versions
  • Regression surfaced same deploy — not same week
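Content-hash versioning is simple to picture: a SHA-256 over the system prompt text identifies each version, so quality scores can be grouped per version and deltas computed across deploys. The 12-character truncation below is an illustrative choice, not whyllm's storage format.

```python
# Sketch: any edit to the system prompt yields a new version id, with
# no manual version bookkeeping required.
import hashlib

def prompt_version(system_prompt: str) -> str:
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:12]

v1 = prompt_version("You are a helpful assistant.")
v2 = prompt_version("You are a terse, helpful assistant.")
print(v1 != v2)  # True: a one-word edit is a new version
print(len(v1))   # 12
```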
Without whyllm

Some calls send 50k tokens of context. Most only need 500. You're paying 100× too much.

With whyllm
  • Histogram: prompt_tokens vs completion_tokens per endpoint
  • P99 context size + estimated monthly waste in dollars
  • Alerts when GPT-5.4 crosses 272K surcharge boundary (rate doubles)
Without whyllm

Legal asks for every prompt that touched customer PII last quarter. Your answer: silence.

With whyllm
  • Append-only immutable span log — full prompt and response bodies
  • Filter by date, model, user_id, or regex pattern
  • CSV export or REST API — satisfies SOC2 and GDPR
Without whyllm

You're hitting rate limits in prod. You find out when users see 500 errors at 2am.

With whyllm
  • RPM tracked vs your tier limit in real time
  • PagerDuty/Slack alert fires at 80% utilisation
  • Auto-fallback to secondary API key or queued retry — zero user impact
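Rate-limit headroom tracking reduces to counting requests in a sliding one-minute window and firing an alert at 80% of the tier limit. In this sketch the PagerDuty/Slack hook is stubbed as a boolean; class names are illustrative.

```python
# Hypothetical RPM tracker: timestamps older than 60s fall out of the
# window; the alert fires once the window reaches 80% of the limit.
import collections

class RpmTracker:
    def __init__(self, tier_limit: int, alert_at: float = 0.8):
        self.limit = tier_limit
        self.alert_at = alert_at
        self.window = collections.deque()  # request timestamps (seconds)

    def record(self, now: float) -> bool:
        """Record a request; return True if the alert should fire."""
        self.window.append(now)
        while self.window and self.window[0] <= now - 60:
            self.window.popleft()
        return len(self.window) >= self.limit * self.alert_at

tracker = RpmTracker(tier_limit=10)
fired = [tracker.record(t) for t in range(10)]
print(fired.index(True))  # 7: alert fires on the 8th request in the window
```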
Without whyllm

You're running GPT-5.4 and Claude side by side but have no data on which performs better.

With whyllm
  • Traffic-split at proxy: GPT-5.4 ($2.50/M) vs Claude Sonnet 4.6 ($3/M) vs Gemini 2.5 Pro ($1.25/M)
  • Compare cost_per_call, p95_latency, hallucination_rate with statistical significance
  • Switch winner with one config line
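"Statistical significance" for a metric like hallucination rate typically means a two-proportion z-test over flagged counts from each arm of the traffic split. The test below is standard statistics, not whyllm-specific, and the counts are invented for illustration.

```python
# Two-proportion z-test: is model A's flag rate significantly different
# from model B's? |z| > 1.96 corresponds to p < 0.05 (two-sided).
import math

def two_proportion_z(flagged_a, n_a, flagged_b, n_b):
    p_a, p_b = flagged_a / n_a, flagged_b / n_b
    p = (flagged_a + flagged_b) / (n_a + n_b)       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Invented example: 2.3% vs 1.15% hallucination rate over 2,000 calls each.
z = two_proportion_z(flagged_a=46, n_a=2000, flagged_b=23, n_b=2000)
print(abs(z) > 1.96)  # True: the difference is significant at the 95% level
```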

What you get

Three things no other tool
does well together

01

Full-stack tracing

Every LLM call captured — prompt, response, model, token count, latency. Filter by user, feature, or environment. Search your entire history in milliseconds.

OpenAI GPT-5.4 · Anthropic Claude Sonnet 4.6 · Gemini 2.5 Pro · AWS Bedrock · Azure OpenAI · any OpenAI-compatible endpoint
02

Cost control

Not just dashboards — actual enforcement. Set budgets per project, user, or API key. Auto-route to a cheaper model when a threshold hits. Kill switches included.

Real-time spend alerts · Auto-routing · Hard limits · Per-user budgets
03

Hallucination detection

Fast heuristics score every response for confidence, factual consistency, and refusal patterns. LLM-as-judge only fires on flagged spans — keeps cost near zero.

Sub-1ms heuristic pass · Sampled LLM judge · Confidence scores · Trend view
vs the competition
Tool
2-min setup
Cost control
Hallucination detection
Open source
Helicone
LangSmith
Langfuse
Arize
whyllm (you)

The dashboard

Everything in one place

app.whyllm.io/dashboard
my-app
Overview
Traces
Cost
Quality
Alerts
Playground
Monthly budget
$127 / $300
↓ 12% vs last month

Overview

Last 30 days • Updated just now

Live
Total spend
$127.40
↓ 12%
API calls
48,291
↑ 8%
Avg latency
892ms
↓ 34ms
Hallucination rate
2.3%
↓ 0.8%
Daily API spend · 30d
whyllm enabled
Model               Tokens   Cost     Latency   Score
gpt-5.4             1,847    $0.005   1.2s      ✓ 98%
claude-sonnet-4-6   2,103    $0.009   0.9s      ⚠ 72%
gpt-5.4-mini        934      $0.001   0.4s      ✓ 95%

Pricing

Simple. Usage-based.
No per-seat nonsense.

Pay for what you trace. A 10-person team shouldn't cost 10×.

Hobby
$0 forever

For solo devs and side projects

  • 50k spans / month
  • 7-day retention
  • Cost dashboard
  • 1 project
  • Community support
Start free
Most popular
Pro
$0.10 per 10k spans

For teams shipping LLMs in production

  • Unlimited spans
  • 90-day retention
  • Budget enforcement
  • Hallucination scoring
  • Alerts & webhooks
  • Unlimited projects
  • Email support
Start Pro →
Enterprise
Custom

For orgs with scale and compliance needs

  • Everything in Pro
  • SSO / SAML
  • Custom retention
  • Self-hosted option
  • SLA guarantee
  • Dedicated support
Talk to us
Free forever — no credit card required

Start monitoring in
2 minutes.

The engineers who wait find out about problems from their users.
The ones who ship win.

Get started free →
pip install whyllm