Where Coding Agents Burn Tokens

The invoice is downstream of architecture. Here are the hidden places coding harnesses waste money.

If you want to understand why AI coding tools get expensive, do not start with model pricing.

Start with waste.

A coding harness can take a cheap model and make it expensive by feeding it a messy world. It can take a strong model and make it behave timidly by surrounding it with stale evidence. It can make a simple frontend edit feel like a courtroom proceeding because every step requires the model to re-establish reality.

The invoice is downstream of the architecture.

Here are the places tokens burn.

1. Re-Sending The Whole Conversation

The easiest harness to build is an append-only chat.

User says something. Assistant replies. Tool calls happen. Next time, send the whole thing back.

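This feels honest because nothing is hidden. It is also wildly inefficient: every call re-sends every earlier turn as input, so total input tokens grow roughly with the square of the number of turns. A coding agent does not need every word of every prior turn. It needs the parts that define the current state of the task.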

The model is stateless. A giant transcript asks it to become an archaeologist. It must decide which old facts still matter, which failures are solved, which snippets are stale, and whether the latest edit actually landed.

That is unpaid project management work performed by an expensive language model.

Lucena Coder treats conversation projection as a product surface. Prior work is reconstructed into a current, usable view. The model gets continuity without drowning in old ceremony.
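To put rough numbers on the append-only cost, here is a minimal sketch. It assumes no prompt caching, and it assumes the projected view has a fixed size; the function names and the token counts are hypothetical and are not taken from Lucena Coder.

```python
def append_only_input_tokens(turn_sizes):
    """Total input tokens when every call re-sends all turns so far."""
    total = 0
    running = 0
    for size in turn_sizes:
        running += size   # the transcript grows by this turn
        total += running  # and the whole transcript is sent again
    return total


def projected_input_tokens(turn_sizes, state_view_size):
    """Total input tokens when each call sends a fixed-size view of the
    current state plus only the newest turn (an assumed design)."""
    return sum(state_view_size + size for size in turn_sizes)


# Hypothetical session: 20 turns of 2,000 tokens each.
turns = [2000] * 20
print(append_only_input_tokens(turns))      # 420000
print(projected_input_tokens(turns, 6000))  # 160000
```

Under these assumptions the append-only session sends 420,000 input tokens against 160,000 for the projected one, and the gap widens with every additional turn.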

2. Carrying Old Failures Forward

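A common failure mode: an error from an earlier step has already been fixed, but the failed attempt is still sitting in the prompt, so the model keeps treating it as live.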

That is not "caution." That is corrupted state.

Failures are useful only while they are live. Once resolved, they should not remain in the model's current world as an active problem. They can survive as compact history if needed, but the current solved state must dominate.

Lucena Coder reconciles tool history into current reality. If a failure has been solved, the model should wake up to the solved state.

This is a small-sounding detail with large consequences. It prevents the miserable loop where the model keeps reopening files because the context still smells like smoke.
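One way to picture this reconciliation is a sketch in which each tool event is keyed by what it acted on, and only the latest event per key counts as current. The `ToolEvent` shape and the `target` key are assumptions made for illustration, not the actual Lucena Coder data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEvent:
    step: int     # position in the session
    target: str   # hypothetical key: a file path, a command, a check name
    ok: bool
    detail: str


def live_failures(events):
    """Return only failures that no later event on the same target has
    superseded. Resolved failures drop out of the current view."""
    latest = {}
    for event in sorted(events, key=lambda e: e.step):
        latest[event.target] = event  # later steps overwrite earlier ones
    return [e for e in latest.values() if not e.ok]


events = [
    ToolEvent(1, "npm run build", False, "missing import"),
    ToolEvent(3, "npm run build", True, "build passed"),
    ToolEvent(4, "lint", False, "unused variable"),
]
print([e.target for e in live_failures(events)])  # ['lint']
```

The build failure from step 1 still exists in the event log, but it no longer appears as an active problem because step 3 resolved it.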

3. Duplicate File Bodies

If a model reads Collection.jsx, edits it, and then reads it again, many systems now have multiple versions of Collection.jsx inside the prompt.

Sometimes the old copy is above the new copy. Sometimes the new copy is buried below a tool log. Sometimes both are labeled in a way that makes sense to the harness but not to a stateless model trying to act.

The result is predictable: hesitation, re-checking, accidental duplicate imports, or edits based on the wrong version.

Lucena Coder treats current code as current code. If a snippet appears again, the latest live copy wins. Older duplicate bodies do not keep consuming context as if they are equally authoritative.

This is not lossy. The action history still exists. The current code exists once.

That is the part the agent can act on.
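A minimal sketch of the "latest live copy wins" rule, assuming the harness records each file body it has shown the model as a `(step, path, body)` tuple. That record shape and the function name are hypothetical.

```python
def project_files(snapshots):
    """Collapse repeated file bodies into one current copy per path,
    while keeping a compact, body-free history of what happened."""
    current = {}
    history = []
    for step, path, body in sorted(snapshots, key=lambda s: s[0]):
        current[path] = body  # a newer body replaces the older one
        history.append(f"step {step}: saw {path} ({len(body)} chars)")
    return current, history


snapshots = [
    (2, "Collection.jsx", "export default function Collection() {}"),
    (3, "Collection.css", ".grid { display: grid; }"),
    (5, "Collection.jsx", "export default function Collection() { return null; }"),
]
current, history = project_files(snapshots)
print(len(current))  # 2 paths, each with exactly one body
print(len(history))  # 3 history lines, none carrying a file body
```

The history keeps one short line per event, so the record of what happened survives while only one body per file takes up context.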

4. Thin Tool Results

Tool results are not receipts if they merely say "success."

A write tool that returns only ok: true forces the next model call to ask the obvious question: "What is in the file now?"

So it reads the file.

Then maybe it reads the CSS.

Then maybe it runs a build.

Then maybe it summarizes.

Each step feels reasonable in isolation. Together, they add up to a tax created by a bad receipt.

Lucena Coder makes mutation results carry proof. An edit result should include the changed file's current state and, when the environment supports it, verification output attached to the same proof chain.

The model should not need to audit its own last tool call unless something genuinely failed.
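As an illustration of a receipt that carries proof, here is a sketch of a write tool that reads the file back after writing it and attaches optional verification output. The receipt fields and the `verify` callback are assumptions; the source does not specify what Lucena Coder's receipts contain beyond current state and verification output.

```python
import hashlib
from pathlib import Path


def write_with_receipt(path, new_text, verify=None):
    """Write a file, then return a receipt built from what is actually
    on disk rather than from what the caller intended to write."""
    path = Path(path)
    path.write_text(new_text, encoding="utf-8")
    on_disk = path.read_text(encoding="utf-8")  # read back as proof
    receipt = {
        "ok": on_disk == new_text,
        "path": str(path),
        "sha256": hashlib.sha256(on_disk.encode("utf-8")).hexdigest(),
        "content": on_disk,    # the file's current state
        "verification": None,  # filled in only when a check is supplied
    }
    if verify is not None:
        # `verify` is a hypothetical hook, e.g. a build or lint runner
        # that returns its output for the written path.
        receipt["verification"] = verify(path)
    return receipt
```

With a receipt like this, the next model call already holds the file's current content and any check output, so the follow-up read becomes unnecessary in the success case.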

5. Always-On Tool Menus

Every tool you show the model is another path it can consider.

That does not mean tools are bad. It means tool surfaces should be staged.

At the start of a task, the agent usually needs discovery. After discovery, if file changes are expected, it needs a plan. After the plan, it needs mutation tools constrained to the known edit targets. After implementation, it may need terminal access or verification tools. If the user asks a new question, the surface should reset around that new request.

Most harnesses flatten this into "here are all the tools, good luck."

Lucena Coder exposes tools as mise en place. The model sees what belongs to the current step.

That reduces mistakes and saves tokens because the agent stops spending output budget talking itself away from the wrong tools.
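The staging described above can be sketched as a small phase table. The phase names follow the sequence in the text; the tool names and the `allowed_paths` field are hypothetical placeholders rather than a real tool API.

```python
from enum import Enum


class Phase(Enum):
    DISCOVERY = "discovery"
    PLAN = "plan"
    MUTATE = "mutate"
    VERIFY = "verify"


# Hypothetical tool names, grouped by the step they belong to.
TOOLS_BY_PHASE = {
    Phase.DISCOVERY: ["list_files", "read_file", "search"],
    Phase.PLAN: ["submit_plan"],
    Phase.MUTATE: ["edit_file"],
    Phase.VERIFY: ["run_command", "read_file"],
}


def visible_tools(phase, edit_targets=()):
    """Return only the tools for the current phase. Mutation tools are
    constrained to the edit targets named in the plan."""
    tools = [{"name": name} for name in TOOLS_BY_PHASE[phase]]
    if phase is Phase.MUTATE:
        for tool in tools:
            tool["allowed_paths"] = sorted(edit_targets)
    return tools


def next_phase(phase, new_user_request=False):
    """Advance one step, or reset to discovery on a new user request."""
    if new_user_request:
        return Phase.DISCOVERY
    order = list(Phase)
    index = order.index(phase)
    return order[min(index + 1, len(order) - 1)]


print(visible_tools(Phase.MUTATE, {"src/Collection.jsx"}))
# [{'name': 'edit_file', 'allowed_paths': ['src/Collection.jsx']}]
```

The design choice is that the harness, not the model, decides which tools are relevant at each step, so the model never has to reason about tools that do not apply.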

6. Treating Terminal As A Text Blob

Terminal commands are lifecycle events, not just strings.

A command can be running. It can emit partial output. It can finish. It can fail. It can be a background server. A deploy can take longer than a short request timeout and still succeed.

If the harness collapses all of that into a flat text result, the model can wake up to nonsense: a "timeout" that later succeeded, a deploy retried twice, a preview link mistaken for a product preview, or a command marked complete before the underlying process actually finished.

Lucena Coder treats terminal receipts as final-state proof. The agent should not be asked to interpret transport half-states as shell reality.

If the command completed, the receipt says what completed. If the command is still running, the UI should show that without inventing a model-visible result.
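A minimal sketch of that rule, assuming a command is tracked as an object with an explicit lifecycle state. The class and field names are illustrative only; the point is that a model-visible receipt exists only once the process has reached a final state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CommandState(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class CommandRun:
    command: str
    state: CommandState = CommandState.RUNNING
    exit_code: Optional[int] = None
    output: str = ""

    def append_output(self, chunk):
        """Partial output accumulates without changing the state."""
        self.output += chunk

    def finish(self, exit_code):
        """Only the process exit decides success or failure."""
        self.exit_code = exit_code
        self.state = (
            CommandState.SUCCEEDED if exit_code == 0 else CommandState.FAILED
        )


def model_receipt(run):
    """Return a receipt for the model only in a final state. While the
    command is running, return None so the UI can show progress without
    inventing a result for the model."""
    if run.state is CommandState.RUNNING:
        return None
    return {
        "command": run.command,
        "state": run.state.value,
        "exit_code": run.exit_code,
        "output": run.output,
    }
```

In this sketch a transport timeout is not a state at all: a slow deploy stays `RUNNING` until the process exits, so the model cannot see a "timeout" that later turns into a success.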

7. Verification As A Scavenger Hunt

Verification is important.

Endless verification is expensive anxiety.

A good harness should know when a build is available, when a direct file check is the right fallback, and when a side-check should review only the touched files instead of reopening the entire workspace.

Lucena Coder attaches verification to mutation receipts where possible. For deeper follow-up, it can run a focused secondary check over the actual touched state. The point is not to create another agent wandering the repo. The point is to catch concrete unfinished work without turning completion into a loop.

Verification should answer: "Does the current touched state satisfy the user's desired outcome?"

Not: "Can we find more files to read?"
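The choice between a build, a direct file check, and no check at all can be sketched as a small decision function. The return shape and the `build_command` parameter are assumptions for illustration; the only idea taken from the text is that verification is scoped to the touched files.

```python
def plan_verification(touched_files, build_command=None):
    """Pick a verification step scoped to what the task touched."""
    targets = sorted(touched_files)
    if not targets:
        # Nothing was changed, so there is nothing to verify.
        return {"kind": "none", "targets": []}
    if build_command:
        # A build is available: run it and report against touched files.
        return {"kind": "build", "command": build_command, "targets": targets}
    # Fallback: check the touched files directly, not the whole workspace.
    return {"kind": "file_check", "targets": targets}


print(plan_verification({"src/Collection.jsx"}))
# {'kind': 'file_check', 'targets': ['src/Collection.jsx']}
```

Whatever check is chosen, its target list never grows beyond the files the task actually changed.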

The Pattern Behind The Waste

All of these token leaks share the same shape:

The harness makes the model infer current reality from messy historical artifacts.

So the model spends tokens doing state reconstruction.

Lucena Coder tries to move that work out of the model and into the architecture. The system already knows which snippets it returned. It knows which files were edited. It knows which command just ran. It knows whether a mutation receipt contains current proof. It knows whether a task brief is complete.

If the architecture knows, the model should not have to rediscover it.

That is the core trick.

Not a prompt trick. Not a cheaper model. Not a magic compression scheme.

Just fewer impossible states.

The Goal

The goal is not to make every run tiny.

Some tasks are large. Some tasks need broad research. Some tasks genuinely need a lot of context.

The goal is to stop paying for confusion.

Tokens should buy reasoning over the problem, not repeated orientation. They should buy design judgment, code synthesis, and careful final checks. They should not buy the model rereading a file because the edit result failed to prove the edit.

That is why Lucena Coder is obsessed with current state.

The cheapest token is the one you did not need because the architecture already knew the truth.