Cloud Pricing Is A UX Problem
The painful part is not just the number. It is finding out what the number meant after the app is already real.
Cloud pricing is usually discussed as if it were math.
Two recent discussions on X made it feel more like a product problem.
Zach Wilson posted that DataExpert is leaving Heroku after nine years. He said the app had been deployed more than 2,500 times there, and that the business was moving from roughly $1,200/month to about $175/month after migrating to Fly.io.
The interesting part was not just cheaper hosting. It was that the old platform limits had started to become product limits. Multiple wildcard SSL certificates, separate apps, awkward pricing jumps, database choices. Eventually those are not just billing details. They shape what the product can afford to be.
Then Jacob Paris pushed back on a different comparison between Vercel's sandbox pricing and a small always-available machine. His point was fair: if a workload only uses active CPU a tiny fraction of the time, pricing it like a fully busy VM all month is misleading.
Those are both real arguments.
And I got pulled into this because RecallMEM is exactly the kind of app that makes cloud pricing annoying. It has a web app, Postgres, pgvector, background memory extraction, voice, file uploads, embeddings, and long stretches where nothing is happening until suddenly a lot is happening.
That shape does not fit neatly into one pricing story. CPU matters. Memory matters. State matters. Idle time matters. Background work matters. So does whether the platform makes you split the app in weird ways just to get around its limits.
The hard part is not finding the cheapest number. It is knowing what the number is measuring before the bill shows up.
CPU Is Not The Whole Computer
Active CPU pricing can be great. If your workload wakes up, does a little work, and goes back to sleep, paying only for the active CPU part can make a lot of sense. A lot of web apps really do spend most of their lives waiting around.
That is the clean version of the argument.
The messier version is that CPU is not the whole computer. Sometimes the thing you care about is memory staying provisioned. Or files staying where they were. Or a database living nearby. Or an open socket. Or a runtime with packages installed because an agent just spent ten minutes setting up its little workbench and it would be annoying if all of that vanished.
This is where AI apps make pricing weird. Many of them are bursty in CPU but stateful everywhere else.
The agent may only think hard for a few seconds. The rest of the time it is waiting on a user, holding a session, keeping context, writing logs, storing files, or sitting on state it will need later. If the pricing model only makes the CPU part obvious, the developer still has to reverse-engineer the rest.
That is not automatically bad. It is just not simple.
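A rough way to see why is to put the CPU part and the provisioned part into the same estimate. The sketch below is only an illustration: the rates, the `MeteredRates` class, and the `metered_monthly_cost` function are all made up for this example and do not describe any real provider's pricing.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average number of hours in a month


@dataclass
class MeteredRates:
    """Hypothetical per-unit rates; not taken from any real price list."""
    cpu_per_active_hour: float  # dollars per vCPU-hour of active CPU
    memory_per_gb_hour: float   # dollars per GB-hour of provisioned memory


def metered_monthly_cost(rates, cpu_active_fraction, provisioned_hours, memory_gb):
    """Estimate one month of metered usage.

    CPU is billed only for the active fraction of the provisioned time.
    Memory is billed for every provisioned hour, busy or idle.
    """
    cpu_cost = rates.cpu_per_active_hour * provisioned_hours * cpu_active_fraction
    memory_cost = rates.memory_per_gb_hour * provisioned_hours * memory_gb
    return cpu_cost + memory_cost


if __name__ == "__main__":
    rates = MeteredRates(cpu_per_active_hour=0.10, memory_per_gb_hour=0.01)

    # Same 2% CPU activity, but state is held for very different durations.
    always_held = metered_monthly_cost(rates, 0.02, HOURS_PER_MONTH, memory_gb=1)
    briefly_held = metered_monthly_cost(rates, 0.02, 100, memory_gb=1)

    print("state held all month:", round(always_held, 2))
    print("state held 100 hours:", round(briefly_held, 2))
```

With these invented rates, the CPU line stays small in both cases. What moves the total is how long memory stays provisioned, which is the part a CPU-only headline does not show.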
The Demo Is Usually A Request
A demo usually looks like a request. A product is usually a small system wearing a request costume.
A demo chatbot can be one clean round trip: send prompt, receive answer, render text. That is why AI demos are so easy to make. The first version barely has to remember anything. It just has to look alive for thirty seconds.
RecallMEM stopped being that almost immediately.
After a response, the app may need to extract facts, reject unsupported memories, update a profile, embed transcript chunks, store usage data, and make sure the next conversation can actually see what the previous one learned. If that work happens in the request path, the user waits. If it happens later, the app might not remember fast enough. If it never happens, the product is lying.
Voice makes this even more annoying. A voice agent is not one API call. It is a live session with audio coming in, audio going out, memory lookups in the middle, interruptions, retries, and the constant threat of awkward silence. If a memory search takes too long, the user does not think, "ah, a background tool call is running." They think the thing broke.
That is the difference between a demo and a product.
The pricing model has to survive the product.
Surprise Is The Bad UX
Most developers do not compare cloud platforms by list price. They compare them by surprise.
The Heroku surprise is: why did this become so expensive?
The Vercel surprise is: why are there all these little metered things on this bill?
The traditional VPS surprise is: wait, I own all of this now?
The Fly surprise is different: okay, I guess I need to understand machines, memory, regions, volumes, and usage.
That last one is real. Explicit primitives are not free. If a platform gives you machines, it is asking you to think about machines. Sometimes you do not want that. Sometimes you just want to ship the thing and go be happy for once.
But this is the part I like about Fly's model: the complexity is visible early. You can build a mental model before the bill shows up.
You can look at the machine size. You can look at the memory. You can look at the volume. You can understand what stays running, what sleeps, where the database is, and what happens when the app sits there doing almost nothing.
That does not automatically make it cheaper.
It makes it legible.
Platform Constraints Become Product Constraints
This is the part that matters more than the price screenshot.
Platform constraints eventually become product constraints.
If background work is awkward, your product quietly avoids background work. If sockets are fragile or expensive, your voice feature gets worse. If persistent storage feels bolted on, you avoid features that need files. If the bill is impossible to predict, you design from fear instead of user experience.
I felt this when thinking about hosted RecallMEM. Localhost was hiding everything. My machine already had Postgres, pgvector, Ollama, model files, environment variables, and months of accidental setup. The app felt simple because my computer was doing a lot of unpaid emotional labor.
Then hosted mode asked the rude questions.
Where does Postgres live? How do migrations run? What happens to Ollama? What happens to PDFs? What happens when a voice session needs memory? What happens when the app cold starts and the user is already in the middle of a chat?
That is not just deployment. That is product design with an invoice attached.
Annoying, but useful.
The Bill Should Be Boring
The hard part with infrastructure pricing is not finding the cheapest possible screenshot. The hard part is knowing what you are actually paying for before the app becomes real.
If active CPU pricing fits the workload, use it. If machines fit the workload, use them. If a managed platform saves enough engineering time to justify the price, that can be the right answer too.
The bad outcome is not that you pay money. Apps cost money. The bad outcome is not knowing which behavior caused the bill. That is when pricing becomes bad UX. The platform had a secret, and the invoice was the reveal.
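One way to picture "knowing which behavior caused the bill" is to tag usage by feature and roll it up. The sketch below is purely illustrative: the record format, the meter names, the prices, and the `cost_by_feature` function are all assumptions made up for the example, not any platform's real billing export.

```python
from collections import defaultdict

# Hypothetical usage records: (feature, meter, quantity).
USAGE = [
    ("chat", "cpu_seconds", 1200),
    ("memory_extraction", "cpu_seconds", 5400),
    ("voice", "provisioned_gb_hours", 40),
    ("file_uploads", "storage_gb_months", 12),
]

# Hypothetical unit prices per meter, in dollars.
PRICES = {
    "cpu_seconds": 0.00003,
    "provisioned_gb_hours": 0.01,
    "storage_gb_months": 0.15,
}


def cost_by_feature(usage, prices):
    """Sum cost per feature and return the features most expensive first."""
    totals = defaultdict(float)
    for feature, meter, quantity in usage:
        totals[feature] += quantity * prices[meter]
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    for feature, dollars in cost_by_feature(USAGE, PRICES).items():
        print(f"{feature}: ${dollars:.3f}")
```

A breakdown like this only works if the platform exposes meters that map back to something the app does. When it does, the invoice stops being a reveal and becomes a report.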
I do not think developers need every platform to be the cheapest. They need the tradeoffs to be understandable. Tell me what is metered. Tell me what stays warm. Tell me what sleeps. Tell me what happens to memory and files. Tell me what happens when my agent sits there for twenty minutes doing almost no CPU work and then suddenly does something expensive.
That is the kind of pricing I trust. Not because it always gives me the lowest number, but because when the bill shows up, I want it to feel boring.