The Economics of API Calls

Every line of AI code costs money. Not in infrastructure — in tokens. Here's how to think about it.

The Numbers

| Provider/Model | Input ($/M) | Output ($/M) | Context Window |
|----------------|-------------|--------------|----------------|
| MiniMax M2.5 | $0.30 | $1.20 | 1M tokens |
| MiniMax M2.5-highspeed | $0.60 | $2.40 | 1M tokens |
| GPT-4o mini | Cheap | Cheap | 128K |
| Claude 3.5 | Premium | Premium | 200K |
| Mistral (Anyscale) | $0.15 | $0.15 | 128K |

Output tokens are always more expensive than input. That's unlikely to change: input is processed in a single parallel pass, while output is generated one token at a time.

The Math

A typical agentic turn — system prompt, tool definitions, conversation history, tool output, response — easily burns 3,000-10,000 tokens.

Say 5,000 tokens per turn — roughly 4,000 input and 1,000 output — at MiniMax M2.5 rates:

4,000 × $0.30/M + 1,000 × $1.20/M = $0.0012 + $0.0012 = $0.0024 per turn. Sounds tiny. But a 50-turn conversation? $0.12. A thousand conversations a day? $120/day — about $3,600/month.
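The arithmetic is worth wiring into a helper so you can sanity-check scenarios before they hit your bill. A minimal sketch, assuming the MiniMax M2.5 rates above and a 4K-in / 1K-out split per turn:

```python
# Per-token rates derived from the $/million prices above.
RATE_IN = 0.30 / 1_000_000
RATE_OUT = 1.20 / 1_000_000

def turn_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single agentic turn."""
    return input_tokens * RATE_IN + output_tokens * RATE_OUT

per_turn = turn_cost(4_000, 1_000)       # $0.0024
per_day = per_turn * 50 * 1_000          # 50 turns x 1,000 conversations
print(f"turn: ${per_turn:.4f}  day: ${per_day:.2f}  month: ${per_day * 30:,.0f}")
```

Swap in your own rates and token split; the shape of the calculation is the same for any provider.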

Scale is the killer.

The Leverage Points

1. Model Routing

Not every task needs GPT-4o. Simple classification, extraction, formatting — small models handle these for 10-20x less.

Route by task complexity:
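A router can be as simple as a lookup on task type. The model names and task categories below are illustrative assumptions, not any provider's API:

```python
# Hypothetical model identifiers -- substitute your actual providers.
CHEAP_MODEL = "small-model"
PREMIUM_MODEL = "big-model"

# Tasks that small models handle reliably at 10-20x lower cost.
SIMPLE_TASKS = {"classify", "extract", "format"}

def pick_model(task_type: str) -> str:
    """Route mechanical tasks to the cheap model; default to the big one."""
    return CHEAP_MODEL if task_type in SIMPLE_TASKS else PREMIUM_MODEL
```

The point isn't the dispatch logic — it's that the routing decision happens before the expensive call, not after.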

2. Context Window Management

Every token in context costs money. Strategies:

  1. Truncate: drop the oldest turns once history exceeds a budget.
  2. Summarize: compress earlier turns into a short digest.
  3. Prune tool output: keep the conclusion, drop the raw dump.
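Truncation is the easiest win. A minimal sketch, using character count as a crude stand-in for a real token counter:

```python
def trim_history(messages: list[dict], max_chars: int = 8_000) -> list[dict]:
    """Keep the system prompt plus the newest turns that fit the budget.

    Character length is a rough proxy for tokens; production code would
    use the provider's tokenizer instead.
    """
    system, rest = messages[0], messages[1:]
    kept: list[dict] = []
    total = 0
    for msg in reversed(rest):           # walk newest -> oldest
        total += len(msg["content"])
        if total > max_chars:
            break
        kept.append(msg)
    return [system] + list(reversed(kept))
```

The system prompt is pinned because dropping it breaks behavior; everything else is fair game for eviction.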

3. Caching

If you're calling the same prompts repeatedly, cache the results. Redis, SQLite, in-memory — doesn't matter. Hit the API less, save more.
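For an in-memory version, keying on a hash of the prompt is enough. A sketch, where `call_api` stands in for whatever client function you actually use:

```python
import hashlib

_cache: dict[str, str] = {}

def cached_call(prompt: str, call_api) -> str:
    """Return a cached response for an identical prompt; hit the API otherwise."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(prompt)
    return _cache[key]
```

Same idea works with Redis or SQLite as the backing store — only the dict changes.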

4. Go Local

llama.cpp runs 7B-14B models on consumer hardware. For simple tasks, local inference is free after hardware cost.

Tradeoff: latency. Local is slower, but for background jobs it doesn't matter.

What This Means for Agents

The agent pattern — loop of thinking, acting, observing — is token-intensive by design. Every iteration adds context, and context = cost.

Three ways to survive:

  1. Keep loops short. Max 3-5 iterations. Give up if it hasn't worked.
  2. Fail fast. If the model can't solve it in 2 tries, escalate or return partial results.
  3. Route ruthlessly. Small tasks to small models. Save the big model for when reasoning actually matters.
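The three rules above compose into one bounded loop. A sketch, where `step` is an assumed callable returning `(done, result)` — the shape is illustrative, not any framework's API:

```python
def run_agent(task, step, max_iters: int = 5):
    """Bounded agent loop: hard cap on iterations, fail fast on no progress."""
    failures = 0
    result = None
    for _ in range(max_iters):
        done, result = step(task, result)
        if done:
            return result
        failures += 1
        if failures >= 2 and result is None:
            break                        # two tries, zero progress: escalate
    return result                        # partial result (or None) to caller
```

Returning the partial result instead of looping forever is what keeps worst-case cost proportional to `max_iters` rather than unbounded.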

The Bottom Line

The real cost of AI isn't the model — it's the context you build up around it. Every conversation, every tool definition, every retry. Be intentional about what stays in the prompt and what gets dropped.

The difference between a $500/month agent and a $5,000/month agent is often just 3x fewer tokens, not a better model.