Your AI Tool Is Burning More Than Just Tokens

AI coding tools have gone from novelty to daily driver. And the productivity gains are real — I’m not here to argue otherwise. But there’s a side of this nobody talks about: every bloated prompt, every aimless 80-turn session, every time you paste a 400-line stack trace when 10 lines would’ve done it — that’s compute. And compute is energy.

Datacenters running these models are not free, and they’re not green. So beyond the billing, there’s a real argument for being deliberate about how you use these tools. The good news: the habits that cut your token bill are the same ones that make you faster.


Tokens are not free (and neither is the electricity)

Every prompt and response burns tokens. Most providers bill per million tokens, which sounds cheap until you’re running 20 sessions a day across a team. Add up a month of sloppy workflows and it’s real money — and real kilowatt-hours.
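A quick back-of-envelope sketch makes that concrete. Every number below is a hypothetical placeholder, not real pricing — plug in your provider’s actual rates and your own usage:

```python
# Rough monthly cost estimate for a team's AI tool usage.
# All figures are illustrative assumptions, not real provider pricing.
PRICE_PER_MTOK_INPUT = 3.00    # dollars per million input tokens (assumed)
PRICE_PER_MTOK_OUTPUT = 15.00  # dollars per million output tokens (assumed)

sessions_per_day = 20
workdays_per_month = 22
avg_input_tokens_per_session = 150_000   # context re-sent over many turns
avg_output_tokens_per_session = 20_000

monthly_input = sessions_per_day * workdays_per_month * avg_input_tokens_per_session
monthly_output = sessions_per_day * workdays_per_month * avg_output_tokens_per_session

cost = (monthly_input / 1e6) * PRICE_PER_MTOK_INPUT \
     + (monthly_output / 1e6) * PRICE_PER_MTOK_OUTPUT
print(f"~${cost:,.0f}/month")  # tiny per-token prices, non-trivial total
```

Notice that input tokens dominate the bill even at a far lower per-token price, because the whole context is re-sent on every turn.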

The thing that helps most is understanding what actually drives cost: context. Every file the tool reads, every message in your history, every tool call result that comes back — all of it sits in the context window, and all of it gets re-sent with every new message.

Input tokens
├── System prompt (set by the tool, fixed cost per session)
├── Conversation history (grows every turn)
├── Files you include manually or the tool auto-pulls
└── Tool call results (search results, file reads, test output)

Output tokens
├── Model reasoning (if streamed)
└── Generated code / explanations

The conversation history is the silent killer. A session that starts cheap quietly becomes expensive 30 turns in — because the entire history ships with every single message.
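A toy model shows why. The token counts below are assumptions picked for illustration, but the shape of the curve — roughly quadratic in the number of turns — is what matters:

```python
# Toy model: each turn re-sends the system prompt plus the entire history.
# Token counts are illustrative assumptions, not measurements.
SYSTEM_PROMPT = 2_000  # tokens, fixed per session
PER_TURN = 1_500       # tokens added per turn (your message + the reply)

def cumulative_input_tokens(turns: int) -> int:
    """Total input tokens billed across a session of `turns` messages."""
    total = 0
    history = 0
    for _ in range(turns):
        total += SYSTEM_PROMPT + history  # full history re-sent each turn
        history += PER_TURN
    return total

print(cumulative_input_tokens(5))   # early in the session: cheap
print(cumulative_input_tokens(30))  # same session at turn 30: history dominates
```

Under these assumptions, turn 30 has cost roughly 28× what turn 5 did — same tool, same task, same user, just an older session.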

Stop leaking tokens

Start fresh sessions often. When you switch tasks, open a new chat. A session about authentication has no business knowing about your CI pipeline. That old context is dead weight you’re paying to send.

Scope your prompts tightly. “Look at my codebase and fix the login bug” is a token disaster. “Here’s src/auth/login.ts — the token validation on line 42 returns 401 incorrectly” is the same task, a fraction of the cost.

Trim before you paste. Full stack traces are for humans who can’t search. The model needs the relevant part. Ten lines beat 200 every time.
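You can even make the trimming automatic. Here’s a minimal sketch — the last-N-lines heuristic and the 10-line default are assumptions, since for most traces the lines nearest the failure are the useful ones:

```python
def trim_traceback(trace: str, keep_last: int = 10) -> str:
    """Keep only the last `keep_last` lines of a stack trace -- the part
    nearest the actual failure. A heuristic, not a real trace parser."""
    lines = trace.strip().splitlines()
    if len(lines) <= keep_last:
        return trace.strip()
    trimmed = len(lines) - keep_last
    return "\n".join([f"... ({trimmed} lines trimmed)"] + lines[-keep_last:])
```

Pipe your paste through something like this and the model gets the failure, not the framework’s call stack.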

Kill the explanations. “Just the code” is a complete sentence. Explanations burn tokens you probably won’t read anyway.


Context windows: bigger isn’t better

Every model has a limit on how much it can see at once. Modern tools advertise massive windows — 200k, 1M, even 2M tokens — but that number is not an invitation to fill it up.

graph TD
    A[You open a session] --> B[System prompt loaded]
    B --> C[You add a file or paste context]
    C --> D[You send a message]
    D --> E[Model generates response]
    E --> F{Still in window?}
    F -- yes --> C
    F -- no --> G[Oldest context dropped or session errors]

When context fills up, something has to give. Tools either silently drop the oldest messages — meaning the model forgets the instructions you gave at the start — or they error out entirely. Either way, long sessions degrade. The model you’re talking to at turn 40 isn’t the same one you started with at turn 1.
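The “drop the oldest” strategy can be sketched in a few lines. This is a simplification — real tools use smarter summarization and eviction — but it shows why your earliest instructions are the first casualty:

```python
def trim_to_window(messages, max_tokens, count_tokens):
    """Drop the oldest turns until the conversation fits the window.
    `messages[0]` is treated as the system prompt and is always kept.
    A simplified sketch -- real tools vary in how they evict context."""
    system, history = messages[0], list(messages[1:])

    def total():
        return count_tokens(system) + sum(count_tokens(m) for m in history)

    while history and total() > max_tokens:
        history.pop(0)  # oldest turn -- your early instructions -- goes first
    return [system] + history
```

The system prompt survives, but whatever you told the model in turn 2 is gone by turn 40 — which is exactly the degradation long sessions exhibit.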

Treat your context window like working memory. Keep it relevant. The model does its best work when everything in the window actually matters to the current task.


Use the right tool for the job

Rather than picking one tool and forcing everything through it, the better frame is: what does this task actually need?

flowchart TD
    Start([What kind of task?]) --> A{Writing code<br>actively in the editor?}
    A -- yes --> B[Cursor or Copilot]
    A -- no --> C{Multi-step task that needs<br>shell access or spans files?}
    C -- yes --> D[Claude Code or Codex]
    C -- no --> E{Thinking through design<br>or explaining something?}
    E -- yes --> F[Claude.ai or ChatGPT]
    E -- no --> G[Any — just keep context tight]

My default stack: Claude Code for anything autonomous and multi-step, Cursor for hands-on editing in the editor, Claude.ai for design discussions. They don’t conflict — they cover different phases of the same workflow.


The short version

Five habits that cut your footprint without slowing you down:

  1. One session per task. New task, new chat. Stop paying to carry yesterday’s context.
  2. Paste the minimum. If 10 lines demonstrate the problem, don’t paste 100.
  3. Ask for code, not commentary. You can read code. The explanation is often just noise.
  4. Review with git diff, not chat. Don’t ask the model to re-explain its own edits — just look at the diff.
  5. Break large refactors into steps. Each step in its own session keeps context small, lets you verify, and makes the whole thing easier to reason about.

Every token you don’t send is compute that didn’t run, energy that wasn’t spent. That’s a good thing — and it also makes you faster.
