April 5, 2026
Token Meter — Finally See What Your LLM APIs Actually Cost
You use multiple LLM providers. Each bills differently. Token Meter gives you one dashboard for all of them.
The problem
If you build anything with LLMs, you probably use three to five providers. Anthropic for complex reasoning. OpenAI for general-purpose tasks. Google for long-context workloads. Groq or DeepSeek when you need it cheap and fast.
Each of these providers bills differently. Some charge per-token with separate input and output rates. Some price cached tokens differently from fresh tokens. Some bill per-request on certain tiers. Some price batch requests differently from real-time.
The result: you have no unified view of what you are actually spending. You check three to five billing dashboards. You do mental math to compare cost-per-token across providers. You find out you overspent when the invoice lands, not when the spend happens.
This is the problem Token Meter solves.
What Token Meter does
Token Meter tracks every LLM API call you make and computes the exact cost in real time. It works across all your providers, normalizes the billing differences, and shows you a single dashboard with daily, weekly, and monthly breakdowns.
You can see which models cost the most, which projects burn through budget fastest, and where you can switch to a cheaper model without sacrificing quality. Pair it with ctxlint to trim unnecessary context before it reaches the model — fewer tokens in means lower costs out. Everything updates as requests flow through; no waiting for end-of-month invoices.
We track 10 providers today: Anthropic, OpenAI, Google, Groq, DeepSeek, Mistral, Cohere, Ollama, Azure OpenAI, and AWS Bedrock. Our documentation covers 40+ models with accurate per-token pricing, including cache read/write rates, batch pricing, and multimodal inputs.
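The normalization Token Meter does per request can be sketched in a few lines: take a provider's per-million-token rates for input, output, and cache reads, and compute a single USD figure. The rates below are illustrative placeholders, not real provider pricing, and the table shape is an assumption for the sketch.

```python
# USD per million tokens, keyed by (provider, model).
# These numbers are made up for illustration only.
RATES = {
    ("anthropic", "example-model"): {
        "input": 3.00, "output": 15.00, "cache_read": 0.30,
    },
}

def request_cost(provider, model, input_tokens, output_tokens, cached_tokens=0):
    """Compute the USD cost of one request from per-million-token rates,
    billing cached input tokens at the cheaper cache-read rate."""
    r = RATES[(provider, model)]
    fresh_in = input_tokens - cached_tokens
    cost = (
        fresh_in * r["input"]
        + output_tokens * r["output"]
        + cached_tokens * r["cache_read"]
    ) / 1_000_000
    return round(cost, 6)

# 12k input tokens (10k served from cache) + 800 output tokens
cost = request_cost("anthropic", "example-model", 12_000, 800, cached_tokens=10_000)
print(f"${cost}")  # → $0.021
```

Doing this per call, per provider, is exactly the bookkeeping you stop maintaining by hand.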
Smart routing
Beyond tracking, Token Meter can route your requests. The Gateway sits between your application and LLM providers. Point your SDK at gateway.tokenmeter.sh instead of the provider directly, and Token Meter handles the rest.
Three routing strategies:
- Failover. Define a chain of providers. If the first returns an error or hits a rate limit, Token Meter tries the next one automatically. No code changes in your app.
- Load balancing. Distribute requests across multiple providers to stay under rate limits and reduce latency.
- Cost-based routing. Always send requests to the cheapest available provider for a given model class. If Groq is serving Llama at half the price, your requests go there first.
All strategies include automatic retry on transient failures. You configure the behavior once; Token Meter handles the execution.
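The failover strategy boils down to a simple loop: walk the chain, swallow rate-limit and transport errors, and return the first success. This is a minimal sketch of that behavior; the provider functions and exception names here are hypothetical stand-ins, not Token Meter's internals.

```python
class RateLimited(Exception):
    """Stand-in for a provider 429 response."""

def call_with_failover(chain, prompt):
    """Try each provider in order; return the first successful response."""
    errors = []
    for provider in chain:
        try:
            return provider(prompt)
        except (RateLimited, ConnectionError) as exc:
            errors.append((provider.__name__, exc))  # fall through to next provider
    raise RuntimeError(f"all providers failed: {errors}")

def flaky_primary(prompt):
    raise RateLimited("429 from primary")

def stable_backup(prompt):
    return f"backup answered: {prompt}"

print(call_with_failover([flaky_primary, stable_backup], "hello"))
# → backup answered: hello
```

The point of the Gateway is that this loop lives at the proxy layer, so your application code never contains it.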
Budget alerts
You can set soft warnings and hard caps. A soft warning emails you when spend crosses a threshold, say $50/day or $500/month. A hard cap stops requests entirely when a budget is exhausted, preventing runaway costs.
Token Meter also includes automatic anomaly detection. If your daily spend exceeds a multiple (3x or 5x) of the rolling average, you get flagged immediately. No threshold configuration is needed; it works out of the box on the Pro plan and above.
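The rolling-average check above can be sketched in a few lines. The 3x multiplier and seven-day window here are illustrative choices, not Token Meter's actual parameters.

```python
def is_anomalous(daily_spend, window=7, multiplier=3.0):
    """Return True if the latest day's spend exceeds `multiplier` times
    the rolling average of the preceding `window` days."""
    if len(daily_spend) <= window:
        return False  # not enough history to establish a baseline
    baseline = sum(daily_spend[-window - 1:-1]) / window
    return daily_spend[-1] > multiplier * baseline

history = [10.0] * 7 + [42.0]  # steady $10/day, then a $42 spike
print(is_anomalous(history))   # → True (42 > 3 x 10)
```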
Three ways to use it
MCP server. Token Meter runs as a remote MCP server. Add our URL to your client config and your AI assistant can check spend, compare model prices, and manage budgets through natural language. Works with Claude Code, Cursor, and any MCP-compatible client. Token Meter is also available on mcp.hosting for one-click setup with auto-provisioned API keys and session proxy support for reliable stateful connections. You can verify your server configuration meets protocol standards with mcp-compliance.
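As a sketch, a remote-server entry in an MCP client config might look like the fragment below. The key names and the endpoint URL are assumptions — the exact schema varies by client, so check the Token Meter docs for the canonical snippet for yours.

```json
{
  "mcpServers": {
    "token-meter": {
      "type": "http",
      "url": "https://mcp.tokenmeter.sh"
    }
  }
}
```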
If you use multiple AI providers from your terminal, Yaw Terminal makes it easy to switch between them — and Token Meter shows you what each session costs. For teams running MCP servers on private infrastructure, tailscale-mcp provides secure access over your tailnet without exposing endpoints to the public internet.
API Gateway. Swap your provider base URL to gateway.tokenmeter.sh and get cost tracking, smart routing, failover, and rate limit management without changing your application code. Requests are translated to OpenAI-compatible format regardless of the downstream provider.
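Concretely, "OpenAI-compatible" means the same JSON request body works regardless of the downstream provider, and the only client-side change is the base URL. A minimal sketch, assuming a `/v1` path suffix and a placeholder model name (both assumptions, not confirmed by the docs):

```python
import json

# The one-line change: point at the gateway instead of the provider.
GATEWAY_BASE = "https://gateway.tokenmeter.sh/v1"  # /v1 suffix is an assumption

def build_chat_request(model, user_message):
    """Return (url, body) for an OpenAI-compatible chat completion call."""
    url = f"{GATEWAY_BASE}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })
    return url, body

url, body = build_chat_request("example-model", "Hello")
print(url)  # → https://gateway.tokenmeter.sh/v1/chat/completions
```

With most provider SDKs, the equivalent change is setting the client's base URL option to the gateway address; no other application code changes.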
Dashboard. The web UI at tokenmeter.sh gives you analytics, cost trends, model comparisons, and budget management. No terminal required.
10 providers, one dashboard
Each provider has full model coverage with accurate pricing pulled from official rate cards. We update pricing within 24 hours of provider announcements.
Pricing
| Tier | Price | Retention | Key features |
|---|---|---|---|
| Free | $0 | 7 days | Spend summary, session cost, model pricing |
| Pro | $19/mo | 90 days | Analytics, budget alerts, cost trends, anomaly detection |
| Gateway | $49/mo | 90 days | Smart routing, failover, rate limits, latency reporting |
| Team | $99/mo/seat | 365 days | Multi-user dashboards, per-member tracking, org budgets, SSO |
The free tier is fully functional for basic spend tracking. No credit card required, no trial expiration.
Get started
Sign up at tokenmeter.sh and start tracking in under a minute. Add the MCP server to your client config or point your SDK at the gateway. Your first week of data will tell you more about your LLM costs than the last six months of invoices combined.
Part of the Yaw Labs ecosystem
Token Meter is one piece of a broader toolkit for AI developers. Yaw Terminal gives you multi-provider AI access from one terminal. mcp.hosting lets you deploy and manage MCP servers with one click. ctxlint catches bloated context before it burns tokens. tailscale-mcp secures MCP servers over your private tailnet. And mcp-compliance validates that your servers meet protocol standards.
For weekly coverage of AI tooling and developer workflows, subscribe to Token Limit News.