A Checkpoint Is Not a Safety Net
We have git for code history. For everything else an agent leaves behind, we have a hash and a shrug.
Lately I've been building a lot of agents, and most of them run on Sprites, which give an agent a real, persistent Linux computer. Sprites can checkpoint and restore the whole filesystem on command. That part works. This post is about the thing no one hands you along with the restore button: a way to know, thirty minutes and a dozen agent actions later, which checkpoint was the one worth going back to.
RecallMEM is a small local-first memory app I maintain. It runs inside a Sprite, and before I let an agent loose on it, I make a checkpoint. Good instinct. Then the agent works: clones, installs, starts a dev server, edits a config, runs something, edits it again. Something breaks. I go to roll back.
I list the checkpoints. I get this:
checkpoint_9ac31f
checkpoint_b71e02
checkpoint_0f12dd
Restore was right there. The primitive worked. I just had no damn idea which of those three to land on. Was 0f12dd before the bad config or after it? Was the app even healthy when b71e02 was taken? If I pick wrong, what am I about to overwrite?
The naive idea of safety is "the platform can restore, so I'm covered." I believed it until the first time it mattered. A restore button is not a safety net. The net is knowing which state was good, and a bare hash tells you nothing about that. A snapshot that captures everything and explains nothing isn't a way back. It's a shrug with a UUID.
A snapshot is sacred. A hash has no story.
Here's the split it took me a while to say cleanly. The snapshot itself is the heavy, valuable part: it captures the entire filesystem, every file, every installed package, the state the running services left on disk. On Sprites, sprite checkpoint create does 100% of that. I add nothing to the capture.
And the capture is fast, which is the part that surprised me. A good filesystem snapshot doesn't copy the whole disk. It records what changed since the last state, the same copy-on-write idea behind ZFS and btrfs snapshots, so a checkpoint lands in ~XXX ms and restore is about as quick. Fast enough that you can take them constantly and never feel it. The mechanism is solved. Which is exactly why it stings that the unsolved part is just knowing what you captured.
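To see why capture cost tracks the change rather than the disk, here is a toy model in Python. It is a delta chain, not how ZFS or btrfs actually lay out data (they share unchanged blocks through a tree of pointers, so neither capture nor restore has to replay anything), and ToyCowStore is a hypothetical class. It only demonstrates the property that matters here: a checkpoint records just the paths written or deleted since the previous one.

```python
class ToyCowStore:
    """Toy snapshot store: each checkpoint keeps only what changed since its parent."""

    def __init__(self):
        self.snapshots = {}  # id -> (parent id, delta); delta maps path -> bytes, or None if deleted
        self.head = None     # id of the most recent checkpoint
        self.dirty = {}      # changes since head, recorded as writes happen

    def write(self, path, data):
        self.dirty[path] = data

    def delete(self, path):
        self.dirty[path] = None

    def checkpoint(self, snap_id):
        # Cost is proportional to the number of changed paths, not the size of the filesystem.
        self.snapshots[snap_id] = (self.head, self.dirty)
        self.head, self.dirty = snap_id, {}
        return snap_id

    def state_at(self, snap_id):
        # Walk back to the root, then apply deltas oldest-first to rebuild the full view.
        chain = []
        while snap_id is not None:
            parent, delta = self.snapshots[snap_id]
            chain.append(delta)
            snap_id = parent
        state = {}
        for delta in reversed(chain):
            for path, data in delta.items():
                if data is None:
                    state.pop(path, None)
                else:
                    state[path] = data
        return state

    def restore(self, snap_id):
        # Drop uncheckpointed changes and make snap_id the parent of the next checkpoint.
        self.head, self.dirty = snap_id, {}
        return self.state_at(snap_id)
```

Note what the store never knows: why any of those deltas happened. The structure is built to capture, not to explain.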
What's missing has nothing to do with the snapshot. It's the label. You're handed a complete, honest capture of a moment, stamped with an ID, and then it's on you to remember what that moment was. A dozen agent actions later, you don't. The unit you're holding, a hash, is not the unit you think in: what was happening, did it work, what changes am I sitting on.
So the problem was never "add restore." Restore exists. The problem is that the moment got captured and the meaning didn't.
We solved this for code. We didn't solve it for the computer.
We already fixed a version of this, decades ago, for one specific kind of state: source code. Git is environment-independent history for the files you track. A commit has a message, a diff, a parent. You can stand in front of a hundred commits and know roughly where you are, because every one of them carries its own story. No one freezes in front of git log the way I froze in front of that checkpoint list.
But an agent working on a real computer changes far more than the files git tracks. It installs packages. It writes generated files git never sees. It leaves a database in some state. It starts services that are running, or were. Commits don't capture any of that. Git is code history, and it's excellent at it. What an agent leaves behind is environment history: packages, services, runtime state, the whole machine as it actually is at a moment in time.
We have a beautiful, mature tool for code history and almost nothing equivalent for environment history. For most of the last twenty years that was fine, because the environment was something you set up once, by hand, and then mostly stopped touching. You didn't need a timeline of your machine because your machine barely moved. Hand that same machine to an agent that installs, writes, and breaks things on its own initiative, and the environment stops being a fixed backdrop. It becomes the thing that changes most, and the thing you most need to be able to walk backward through. Right now it's also the thing we're worst at recording.
A checkpoint is the raw material for environment history. It's just missing the part that made git usable: a story attached to each point.
Three cheap signals
So I tried to attach the story. I'll be honest about how unclever the fix is, because if your first reaction is that this is trivial, three strings and a shell-out to git, you're right. That's the point. None of this is hard. The fact that I had to bolt it on at all, that it isn't simply part of how a machine's state gets recorded, is the actual story.
I wrapped the checkpoint call in a small CLI. It does not inspect the snapshot. It never diffs the filesystem image. It can't look at a capture and tell you "this installed Postgres." What it does is grab three things that happen to be lying around the second you run it, and bolt them onto the checkpoint:
- Intent: the comment you pass. If you don't pass one, it falls back to your last git commit subject. That's the whole heuristic.
- Changed files: git diff --name-status HEAD, your working tree against HEAD. Your uncommitted changes, roughly.
- Verification: only if you hand it a command. --verify "npm test" runs it and records the exit code as pass or fail. It does not guess what to check.
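Each signal is one cheap read. Here is a Python sketch of the same three reads. The real wrapper ships through npx, so this is a stand-in, not its source. It assumes git is on PATH and that it runs inside the repository, and every function name is hypothetical.

```python
import subprocess
from typing import List, Optional, Tuple


def git(*args: str) -> str:
    """Run git in the current directory and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(["git", *args], check=True, capture_output=True, text=True).stdout


def pick_intent(comment: Optional[str], last_subject: str) -> str:
    # Intent: the comment you pass, else the subject of the last commit.
    return comment if comment else last_subject


def parse_name_status(text: str) -> List[Tuple[str, str]]:
    # Changed files: each line is "<status>\t<path>"; renames and copies carry
    # "<status>\t<old>\t<new>", so the last field is the path as it is now.
    changes = []
    for line in text.splitlines():
        if line.strip():
            fields = line.split("\t")
            changes.append((fields[0], fields[-1]))
    return changes


def run_verify(command: Optional[str]) -> Optional[str]:
    # Verification: only when a command is given; the exit code becomes pass or fail.
    if command is None:
        return None
    return "pass" if subprocess.run(command, shell=True).returncode == 0 else "fail"


def gather_context(comment: Optional[str] = None, verify: Optional[str] = None) -> dict:
    last_subject = git("log", "-1", "--format=%s").strip()
    changed = parse_name_status(git("diff", "--name-status", "HEAD"))
    return {"intent": pick_intent(comment, last_subject),
            "changed_files": changed,
            "verify": run_verify(verify)}
```

A health check slots into run_verify the same way a test suite does, for example run_verify("curl -fsS http://localhost:3000/health"), where the port and path are placeholders for whatever the app exposes. curl's -f flag turns an HTTP error status into a nonzero exit, which is what makes it usable as a pass or fail.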
One line, in practice:
npx workbench checkpoint "before auth refactor" --verify "npm test"
Verification doesn't have to be a test suite. A smoke test, a curl against a health endpoint, whatever that project already uses to know it's alive. The point is recording a yes or no next to the moment, not proving correctness.
Each piece gets stamped with the checkpoint id, so the hash finally has a card attached. The same list stops reading like lottery numbers:
10:02 clean clone, verify passing
10:11 before auth refactor
10:18 after package install, verify failing
10:27 working version, app healthy
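The card doesn't need to be clever. A minimal sketch, assuming the context is stored as one JSON sidecar file per checkpoint id in a directory the wrapper controls (the post doesn't say where the real tool keeps it); Card, save_card, load_cards, and timeline_line are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from pathlib import Path
from typing import List, Optional


@dataclass
class Card:
    checkpoint_id: str              # the id the platform returned, stored unchanged
    created_at: str                 # ISO 8601 timestamp
    intent: str
    changed_files: list = field(default_factory=list)
    verify: Optional[str] = None    # "pass", "fail", or None when no command was given


def save_card(card: Card, directory: str) -> Path:
    # One JSON file per checkpoint, named after its id, so any hash leads straight to its card.
    path = Path(directory) / f"{card.checkpoint_id}.json"
    path.write_text(json.dumps(asdict(card)))
    return path


def load_cards(directory: str) -> List[Card]:
    cards = [Card(**json.loads(p.read_text())) for p in Path(directory).glob("*.json")]
    return sorted(cards, key=lambda c: datetime.fromisoformat(c.created_at))


def timeline_line(card: Card) -> str:
    # Render the card the way a human reads it: time, intent, and the verify result if any.
    clock = datetime.fromisoformat(card.created_at).strftime("%H:%M")
    suffix = {"pass": ", verify passing", "fail": ", verify failing"}.get(card.verify, "")
    return f"{clock} {card.intent}{suffix}"
```

Keying the file by checkpoint id keeps the lookup one-directional and simple: from a hash in the platform's list to its story, with no index to keep in sync.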
Now restore isn't a gamble. I pick 10:27 because I can see it was the good one. The decision took seconds, because the question stopped being "which hash" and became "which moment."
One rule I'd defend: the snapshot is sacred, the context is best-effort. If the git read fails, or the write fails, the tool never touches the snapshot. It warns and exits clean. The capture is the thing you can't afford to lose, so nothing in the convenience layer is allowed to risk it.
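That rule is mostly about ordering and about which errors are allowed to escape. A sketch with the three steps passed in as functions, so it stays independent of any real CLI: in a wrapper, create_snapshot would shell out to sprite checkpoint create and read back the id, but that command's output format isn't covered here, so it's left abstract. All names are hypothetical.

```python
import sys
from typing import Any, Callable, Dict


def checkpoint_with_context(
    create_snapshot: Callable[[], str],
    gather_context: Callable[[], Dict[str, Any]],
    save_card: Callable[[str, Dict[str, Any]], None],
) -> str:
    """Take the snapshot first; nothing after it is allowed to undo or block it."""
    # If the capture itself fails there is nothing to label, so that error propagates.
    checkpoint_id = create_snapshot()
    try:
        context = gather_context()
    except Exception as err:  # not a repo, no commits yet, git missing, ...
        print(f"warning: {checkpoint_id} saved without context ({err})", file=sys.stderr)
        return checkpoint_id
    try:
        save_card(checkpoint_id, context)
    except OSError as err:  # full disk, read-only directory, bad permissions, ...
        print(f"warning: {checkpoint_id} saved, but its card was not written ({err})", file=sys.stderr)
    return checkpoint_id
```

Taking the snapshot first is one ordering that satisfies the rule: by the time anything optional runs, the capture already exists, and the only failure that propagates is a failure of the capture itself. Exiting clean then just means the entry point returns 0 after this call, whichever branch ran.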
Where the label lies
Here's the part I'd want you to poke at, because I had to.
The context comes from git. The snapshot comes from the whole filesystem. They don't always agree, and when they disagree, the label lies a little. Two ways, both real.
A file changes on the machine that git doesn't track: build output, a written .env.local, runtime data. It's in the snapshot. It's absent from the changed-files list. The capture has it; the story never mentions it.
Or the other one, which got me first. I'd already committed my work, so git diff HEAD came back empty and the checkpoint got no file context at all, even though the snapshot was full of changes. The most complete capture, the emptiest label, for the dumbest reason.
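The label can't see what git can't see, but it can at least admit it. A partial sketch, assuming the wrapper also runs git status --porcelain --ignored, whose two-character codes mark untracked (??) and ignored (!!) paths, and attaches warnings to the card instead of letting the file list look complete. It only reaches paths inside the working tree: installed packages, running services, and anything outside the repo stay invisible. It also ignores the quoting porcelain applies to unusual paths when -z isn't used. Function names are hypothetical.

```python
from typing import List, Tuple


def parse_porcelain(text: str) -> Tuple[List[str], List[str]]:
    """Split `git status --porcelain --ignored` output into untracked and ignored paths."""
    untracked, ignored = [], []
    for line in text.splitlines():
        code, path = line[:2], line[3:]  # "XY PATH": two status characters, a space, the path
        if code == "??":
            untracked.append(path)
        elif code == "!!":
            ignored.append(path)
    return untracked, ignored


def blind_spot_notes(name_status_text: str, porcelain_text: str) -> List[str]:
    # Warnings stored next to the changed-files list, so the card says what it missed.
    notes = []
    if not name_status_text.strip():
        notes.append("git diff HEAD is empty; committed or untracked changes may still be in the snapshot")
    untracked, ignored = parse_porcelain(porcelain_text)
    if untracked:
        notes.append(f"{len(untracked)} untracked path(s) missing from changed files")
    if ignored:
        notes.append(f"{len(ignored)} ignored path(s) missing from changed files")
    return notes
```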
So the context is not derived truth about the snapshot. It's a best-effort note, written by whoever ran the command, using whatever git happened to know at that second. It's still a large upgrade over a bare hash. But it's a label, not an X-ray, and the gap between the two is exactly the gap I started with: git can't see the environment, which is the whole reason I wanted environment history in the first place.
The version that reads the snapshot
The honest fix is to stop leaning on git and read the snapshots themselves. Take checkpoint N and N-1, diff the filesystem images, and now you can say what actually changed on the machine, including everything git is blind to. That's a real project, not an afternoon of shelling out. But it's the version that closes the gap, because it derives the story from the same thing the capture came from.
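A minimal sketch of that diff, assuming the two checkpoints can be mounted or extracted read-only as ordinary directory trees (the post doesn't claim Sprites exposes them that way). It hashes every regular file and compares the two maps, which costs a full read of both trees; a real version would ask the copy-on-write layer which blocks changed instead. tree_digest and diff_trees are hypothetical names.

```python
import hashlib
import os
from typing import Dict, List


def tree_digest(root: str) -> Dict[str, str]:
    """Map each regular file's path, relative to root, to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):  # does not descend into symlinked dirs
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks, sockets, fifos; a real tool would record them too
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def diff_trees(old_root: str, new_root: str) -> Dict[str, List[str]]:
    old, new = tree_digest(old_root), tree_digest(new_root)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Run against checkpoints N-1 and N, a new package under node_modules or a rewritten .env.local shows up as added or modified whether or not git has ever heard of it.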
That's the whole point, bigger than my little command: environment history deserves the tooling we already gave code history. A timeline you can read. A diff between two points. A reason to trust the thing you're about to restore. We built all of that for source files and almost none of it for the computer the source runs on, and agents are about to make that absence very loud.
If you want to feel the exact moment I'm talking about, it takes ten minutes. Spin up a Sprite, make a checkpoint, break something on purpose, and restore it. Restoring is the easy part. The pause comes a second later, when you look at the checkpoint list and can't tell which hash was the good one. The wrapper and the code behind those three signals are on GitHub if you want to take the pattern.
The capture was never the hard part. That problem is solved, and solved well. The hard part is making a captured moment legible enough that a tired human, at 10:27, can land on the right one without guessing. Code got that treatment a long time ago. The computer underneath it is still waiting.