Railway took staging and prod down for ~6 hours — weighing a provider switch
2026-05-20
Root cause, per Railway's own post-mortem: Google Cloud's automated enforcement suspended Railway's GCP account around 22:20 UTC on May 19. Railway runs its own hardware across 8 sites in 4 regions, but kept its API and database on GCP because it considered that workload low-risk. When GCP pulled the plug, the API went with it. Because the API is the central routing dependency for edge locations, routing tables couldn't refresh, and every workload (metal, GCP, AWS, enterprise, non-enterprise) became unreachable. The metal hardware wasn't broken; it just couldn't be reached.
Railway is now redistributing routing tables across regions so a single cloud-provider action can't take everything down again. That's the right fix, but it's a future fix.
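To make the failure mode concrete, here is a toy model of an edge that refreshes its routing table from one or more control-plane sources. Everything in it is hypothetical (the class names, the `gcp-api` and `metal-replica` labels, the single example route) and it is not Railway's actual design. It only illustrates why one unreachable source takes routing down when it is the only source, and why a second source in another location does not.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical routing-table source; `up` is False once its host is suspended."""
    name: str
    up: bool = True

    def fetch_routes(self) -> dict[str, str]:
        if not self.up:
            raise ConnectionError(f"{self.name} unreachable")
        # Illustrative route only: hostname -> backend node.
        return {"app.example": "metal-node-1"}


@dataclass
class Edge:
    """Hypothetical edge location that refreshes routes from its sources in order."""
    sources: list[ControlPlane]
    routes: dict[str, str] = field(default_factory=dict)

    def refresh(self) -> bool:
        for source in self.sources:
            try:
                self.routes = source.fetch_routes()
                return True
            except ConnectionError:
                continue
        # Assumption: with no reachable source, the table expires and empties.
        self.routes = {}
        return False

    def resolve(self, host: str) -> str | None:
        return self.routes.get(host)


gcp = ControlPlane("gcp-api")
replica = ControlPlane("metal-replica")

single = Edge([gcp])              # one control plane, one point of failure
redundant = Edge([gcp, replica])  # routing data also available elsewhere

gcp.up = False  # the provider suspends the account

single_ok = single.refresh()
redundant_ok = redundant.refresh()
print(single_ok, single.resolve("app.example"))
print(redundant_ok, redundant.resolve("app.example"))
```

In the single-source case the backend node is still healthy; the edge simply has no route to it, which matches the "not broken, just unreachable" description above.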
From here, the provider question isn't really "is Railway reliable". It's that any host's control plane is a single point of failure if it lives on infrastructure outside the host's direct control. Fly.io, Render, and a small self-hosted box are all on the table. The next few days are about reading the full post-mortem, pricing a migration honestly, and figuring out what uptime October's Steam Next Fest window can actually tolerate.
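As a rough way to frame the uptime question, the sketch below turns an outage length into an availability figure, and an availability target into a downtime budget. The 7-day window and the 99.9% target are assumptions for illustration, not numbers from the post-mortem; the 6-hour figure is the approximate outage length from this incident.

```python
def availability(downtime_hours: float, window_hours: float) -> float:
    """Fraction of the window during which the service was reachable."""
    return 1.0 - downtime_hours / window_hours


def downtime_budget_hours(target: float, window_hours: float) -> float:
    """Hours of downtime a given availability target allows in the window."""
    return (1.0 - target) * window_hours


OUTAGE_HOURS = 6.0          # approximate length of this outage
MONTH_HOURS = 30 * 24       # 720 hours
FEST_WINDOW_HOURS = 7 * 24  # assumption: a 7-day event window (168 hours)

month_avail = availability(OUTAGE_HOURS, MONTH_HOURS)
fest_avail = availability(OUTAGE_HOURS, FEST_WINDOW_HOURS)
fest_budget = downtime_budget_hours(0.999, FEST_WINDOW_HOURS)

print(f"30-day availability with one 6h outage: {month_avail:.2%}")
print(f"7-day availability with one 6h outage:  {fest_avail:.2%}")
print(f"99.9% target over 7 days allows about {fest_budget * 60:.0f} minutes down")
```

Under those assumptions, one 6-hour outage is about 99.17% availability over a month but only about 96.43% over a single week, and a 99.9% target for the week leaves roughly 10 minutes of slack. The same outage hurts far more when it lands inside a short, high-traffic window.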