Agents Need Computers. Humans Need Monitors.

Persistent agent work needs restore points a human can trust.

[Figure: Sprite Agent Workbench showing fleet state and cost exposure]

I once told an agent to move a directory. It deleted the original first, then tried to copy it. Too late. The code was gone.

That was not really a model hallucination. The agent had tools. It had a filesystem. It made a change, and the change was wrong. What I needed was not a better answer.

I needed a way back.

Most agent demos skip this part because nothing survives them. The agent gets a prompt, calls a tool, returns a result, and the environment disappears before anyone has to ask what changed. No filesystem history. No long-running process. No stale service. No secret sitting in the wrong place. No restore decision. That is why the demo works. It is also why the demo lies.

Most agent tooling today optimizes for capability: more tools, more files, more commands, more systems to reach. Capability matters. Agents need reach to do useful work. But the more an agent can touch, the more a human needs to understand what happened after the tool call, and the industry is underinvesting in that side. More capability without better recovery just gives the agent more places to make a mess.

Agent Demos Throw Away The Consequences

The usual agent demo has a clean shape:

prompt -> tool call -> result

That loop is useful to learn. It also leaves out the part that matters once agents work on real software. Real software has files, services, background jobs, database migrations, environment variables, logs, caches, secrets, running processes, and weird local setup decisions nobody remembers making. A disposable sandbox can pretend none of that exists. A persistent computer cannot.

Once the environment survives the request, you inherit every question the demo avoided. What changed? What was running? What did the agent install? What did it delete? If I restore, what am I about to overwrite? That is where the hard part of agent execution starts.

A Checkpoint Without Context Is Not Enough

Imagine this is your checkpoint list:

checkpoint_9ac31f
checkpoint_b71e02
checkpoint_0f12dd

An agent has been working for 30 minutes. It installed packages, edited files, started a service, changed a config. Something is broken and you want to go back.

Which checkpoint would you restore?

That pause is the problem. The issue is not that there is no restore button. The issue is that you do not trust it. The checkpoint exists, but it has no story around it. You do not know whether the app was healthy when it was created, whether verification passed, whether the bad change had already happened, or what you are about to overwrite. Checkpointing is the primitive. Restore confidence is the developer experience.

At minimum, checkpoint context should answer five questions:

task
files_changed
verification_status
app_health
restore_will_overwrite

What was the agent trying to do? What files changed? Did verification pass? Was the app healthy? What do I lose if I go back? Verification does not have to mean a full test suite. It can mean a smoke test, a health endpoint, or whatever check that project uses to know the app was alive.
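As a minimal sketch, those five fields could travel with each checkpoint as a small record. The class name, the field types, the status strings, and the `summary` helper are all assumptions made for illustration. This is not the Workbench's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CheckpointContext:
    """Hypothetical record of the five answers a checkpoint should carry."""

    task: str                  # what the agent was trying to do
    files_changed: List[str]   # paths touched since the previous checkpoint
    verification_status: str   # assumed values: "passing", "failing", "unknown"
    app_health: str            # assumed values: "healthy", "unhealthy", "unknown"
    restore_will_overwrite: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """Build one line a human can scan before deciding to restore."""
        return (
            f"{self.task} | verification {self.verification_status} | "
            f"app {self.app_health} | "
            f"files changed: {len(self.files_changed)} | "
            f"restore overwrites: {len(self.restore_will_overwrite)}"
        )


# Example values are invented; the path is a placeholder, not a real file.
ctx = CheckpointContext(
    task="Before auth refactor",
    files_changed=[],
    verification_status="passing",
    app_health="healthy",
    restore_will_overwrite=["src/auth.py"],
)
print(ctx.summary())
```

The point of the one-line summary is that it sits next to the checkpoint ID, so the ID never has to be read on its own.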

The same list with context looks like this:

10:02 AM - Clean repo cloned. Verification passing.
10:11 AM - Before auth refactor.
10:18 AM - After package install. Verification failing.
10:27 AM - Working version. Preview healthy.

Now restore is a different action. I am not gambling on a hash. I am choosing a known state. Context turns checkpointing from a machine feature into a human decision.
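With that context recorded, "choose a known state" can even be partly mechanical. Here is a rough sketch, assuming each checkpoint stores a timestamp, a label, and a verification status. The `Checkpoint` type and `last_known_good` function are hypothetical names, and marking the 10:27 entry as passing is my assumption based on its "Working version" label.

```python
from typing import NamedTuple, Optional, Sequence


class Checkpoint(NamedTuple):
    """Hypothetical checkpoint entry; field names are illustrative."""

    created_at: str     # "HH:MM" in 24-hour time, so strings sort chronologically
    label: str
    verification: str   # assumed values: "passing", "failing", "unknown"


def last_known_good(checkpoints: Sequence[Checkpoint]) -> Optional[Checkpoint]:
    """Return the most recent checkpoint whose verification passed, if any."""
    good = [c for c in checkpoints if c.verification == "passing"]
    return max(good, key=lambda c: c.created_at, default=None)


history = [
    Checkpoint("10:02", "Clean repo cloned", "passing"),
    Checkpoint("10:11", "Before auth refactor", "unknown"),
    Checkpoint("10:18", "After package install", "failing"),
    # Assumption: "Working version" implies verification passed.
    Checkpoint("10:27", "Working version, preview healthy", "passing"),
]

choice = last_known_good(history)
print(choice.label if choice else "no known-good checkpoint")
```

A human should still make the final call. The function only narrows the list to states that were verified when they were saved.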

Agents Need Bounded Computers

When I say agents need computers, I do not mean random cloud VMs with no boundaries. I mean a bounded persistent environment with a filesystem, packages, running services, a URL, lifecycle, state, checkpoints, and recovery. A place where work can continue beyond one request.

A sandbox is mostly about isolation. Run something, contain it, throw it away. That is useful, but a lot of agent work is not shaped like one throwaway execution. An agent may need to clone a repo, install dependencies, run tests, start a dev server, inspect logs, make changes, try again, sleep, and continue tomorrow from where it left off. That shape is closer to a computer. A controlled computer with permissions, limits, and a blast radius, but still a computer.

This post is not about Sprites specifically. It is about the shape. A Sprite gives an agent a persistent Linux environment that can write files, install packages, expose URLs, sleep, wake, checkpoint, and restore. Any environment with that shape inherits the problem this post is about.

A stateless sandbox can disappear and take its mistakes with it. A persistent computer keeps the work. That is the point. It also keeps the consequences. So the product question becomes: how do we make persistent agent work inspectable, and restore something a human can actually trust?

Humans Need Monitors

This is why I built Sprite Agent Workbench.

Not because the CLI is bad. If I know exactly what command I need, the terminal is fast. But when an agent has been working inside a persistent computer, I do not only need control. I need to understand what happened. A monitor for persistent agent work should answer boring questions clearly. Which environment am I looking at? Is it running, warm, or cold? What URL maps to it? What checkpoint am I looking at? What changed since the last one? Did verification pass? What will I lose if I restore? Those are not fancy questions. They are trust questions.

Here is the boring version of the demo, which is the point.

RecallMEM, my local-first AI memory app, runs inside a Sprite. Before letting an agent work on it, I created a checkpoint from the Workbench with context attached: the task I was about to hand the agent, verification passing, the app responding at its URL.

[Figure: Sprite Agent Workbench showing fleet state and warm/cold Sprite status]

Then the agent worked, and one of its changes broke verification. The Workbench showed the change and the failing check sitting next to the checkpoint that predated both.

I restored. RecallMEM came back, verification went green, and the bad change was gone. The decision took seconds because I was not gambling on a hash. I was choosing a known state.

The CLI is for control. The monitor is for trust.

The Agent Did Not Hesitate

The directory that agent deleted at the start of this post was from RecallMEM. I had an agent doing real work inside a real environment. I gave it a normal instruction. One wrong operation later, the work was gone. No checkpoint before the change, no record of what the agent did leading up to it, just me staring at a terminal reconstructing from memory what existed five minutes ago.

The failure was not exotic. The agent did not go rogue or hallucinate an API. It made a normal-looking change in the wrong order, the kind of mistake a tired human makes too. The difference is that when a human is about to do something destructive, they usually hesitate. They check. An agent does not hesitate unless the system around it creates the hesitation.

A checkpoint with context is manufactured hesitation. It is the pause the agent will never take, moved into the system where it can actually exist.
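One way to picture manufactured hesitation is a wrapper that refuses to run an action until a labeled checkpoint exists. This is only a sketch under assumptions: `with_checkpoint` is a hypothetical helper, and the checkpoint call is a stand-in function that writes to a list, not a real Sprite or Workbench API.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_checkpoint(
    create_checkpoint: Callable[[str], str],
    task: str,
    action: Callable[[], T],
) -> T:
    """Create a labeled checkpoint first, then run the action.

    If checkpoint creation raises or returns an empty ID, the action never runs.
    """
    checkpoint_id = create_checkpoint(task)
    if not checkpoint_id:
        raise RuntimeError("no checkpoint created; refusing to run action")
    return action()


# Stand-ins for a real environment: a log instead of a checkpoint service.
log: List[str] = []


def fake_create_checkpoint(task: str) -> str:
    log.append(f"checkpoint: {task}")
    return f"checkpoint-{len(log)}"


def fake_move() -> str:
    log.append("action: move directory")
    return "moved"


result = with_checkpoint(fake_create_checkpoint, "Before moving directory", fake_move)
print(log)
```

The agent still does not hesitate. The wrapper does it on the agent's behalf, and the task label becomes the context a human reads later.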

MCP Solves Reach. Not Consequences.

MCP gives agents a standard way to reach tools and context. That is genuinely useful. It makes agents more capable and integrations easier to reason about. But reach is not safety. If MCP makes it easier for agents to touch files, services, databases, repos, internal APIs, and private data, then the execution environment matters more, not less. A tool call is not the end of the story. It is often the beginning of the mess.

That does not mean MCP is bad. It means MCP makes this problem worth solving.

Why Not Just Git?

A fair question: why not just use git?

You should use git. Git is great for source code. But an agent environment includes installed packages, generated files, local database state, running services, caches, environment files, and runtime behavior that commits do not capture. Git tells you what changed in tracked files. It does not tell you whether the service was healthy, which package got installed outside the repo, or what restore will do to the whole working environment.

The answer is not git or checkpoints. It is both, at different layers. Git is code history. Checkpointing is environment history. For agents, environment history is the layer we have been missing.
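To make the gap concrete, here is a small sketch of one piece of environment history that commits do not record: which packages are installed. It assumes a Python environment and uses the standard library's `importlib.metadata`. The function names and the sample snapshots are invented for illustration, and a real checkpoint would capture far more than this.

```python
from importlib import metadata
from typing import Dict, List


def installed_packages() -> Dict[str, str]:
    """Map installed distribution names to versions for this interpreter."""
    packages: Dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return packages


def diff_packages(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two snapshots: packages added, removed, or changed in version."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(name for name in shared if before[name] != after[name]),
    }


# Illustrative snapshots; these package names and versions are made up.
before = {"example-lib": "1.0.0", "example-web": "2.0.0"}
after = {"example-lib": "1.1.0", "example-web": "2.0.0", "example-new": "0.1.0"}
print(diff_packages(before, after))
```

Taking a snapshot with `installed_packages()` before and after an agent run would show a package install that `git status` never mentions.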

The Better Model

The old model was:

agent = model + tools

That was never going to be enough for real work. The better model is:

agent system = model + tools + computer + state + checkpoints + history + recovery

Agents need computers because real work leaves state behind. Once you give agents computers, you inherit a new responsibility: the agent should not leave state humans cannot inspect, or consequences humans cannot recover from.

At the start, I said I did not need a better answer. I needed a way back. The next phase of agent infrastructure is not just smarter agents. It is giving humans a trustworthy way back.

So yes, agents need computers.

But humans need monitors.


Chris Dabatos - DevRel Engineer

DevRel Engineer, Builder, and Technical Storyteller based in Las Vegas. He builds things with AI and writes about what breaks.
