The Railway CDN Incident Is a Wake-Up Call About Trust Limits in PaaS

Yesterday, Railway published one of the most uncomfortable incident reports I’ve read in a long time. Between 10:42 and 11:34 UTC on March 30, a configuration update to their CDN accidentally enabled caching for approximately 0.05% of domains that had explicitly opted not to use Railway’s CDN feature. For 52 minutes, HTTP GET responses — including authenticated ones — were stored at the edge and potentially served to users other than the original requester. Some users saw pages intended for other people. Railway purged the cache globally and published a complete postmortem the same day.

The root cause was technically specific: a Railway engineer turned on “Surrogate Keys” at their CDN provider to support cache invalidation by domain. Unknown to anyone at the time, that feature completely bypassed the rule “if CDN is disabled, pass to origin.” The Set-Cookie headers were not cached, but the response bodies were. And the body is where the damage is.

That distinction deserves its own paragraph.


The Cookie Trap Is Not the Only Trap

There is a very widespread mental model among developers that says: “if I’m not caching cookies, my sessions are safe.” The Railway incident cleanly breaks that model. Your session cookie can be perfectly isolated, but the authenticated HTTP response — the one containing your username, billing status, workspace identifiers, CSRF tokens embedded in forms, or the JSON that initializes the frontend — that response body can still be cached and served to the wrong person. You can protect the cookie and still leak the page.
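
To make that failure mode concrete, here is a toy simulation. It is not Railway's CDN or any real provider's logic, just a minimal sketch of an edge cache that keys on path alone and never stores Set-Cookie. The names origin, ToyEdgeCache, and the /dashboard route are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    body: str
    headers: dict = field(default_factory=dict)


def origin(path, session_user):
    """Hypothetical origin: renders a page for whoever owns the session."""
    return Response(
        body=f"Dashboard for {session_user}",
        headers={"Set-Cookie": f"session={session_user}"},
    )


class ToyEdgeCache:
    """Caches response bodies keyed by path only; never stores Set-Cookie."""

    def __init__(self):
        self._store = {}

    def get(self, path, session_user):
        if path in self._store:
            # Cache hit: the stored body is replayed with no cookie attached.
            return Response(body=self._store[path])
        resp = origin(path, session_user)
        cache_control = resp.headers.get("Cache-Control", "")
        # The origin sent no opt-out, so this toy cache stores the body.
        if "no-store" not in cache_control and "private" not in cache_control:
            self._store[path] = resp.body
        return resp


cache = ToyEdgeCache()
first = cache.get("/dashboard", "alice")
second = cache.get("/dashboard", "bob")
print(second.body)  # Dashboard for alice
```

Bob never receives Alice's cookie, and he still gets Alice's page. The cookie was isolated the whole time; the body was not.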

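The standard protection is an explicit Cache-Control header on any response containing user-specific content: Cache-Control: no-store, no-cache, private. Of the three directives, no-store does the heavy lifting, because it tells every cache not to store the response at all; private additionally tells shared caches such as CDNs to keep out, and no-cache forces revalidation before any reuse. If your application doesn’t emit these headers on authenticated routes, you’re implicitly trusting your infrastructure provider’s default behavior, and that default can change, be misconfigured, or fail in exactly this way.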
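
One way to stop relying on defaults is to set the header in middleware rather than route by route. The sketch below is a plain WSGI wrapper and rests on two assumptions of mine: that a request carrying a Cookie or Authorization header counts as "authenticated" (your app may define that differently), and that the edge in front of you honors origin Cache-Control. The function name no_store_for_authenticated and the demo app are hypothetical.

```python
def no_store_for_authenticated(app):
    """WSGI middleware: force Cache-Control: no-store on authenticated requests."""

    def middleware(environ, start_response):
        # Assumption: a Cookie or Authorization header marks the request
        # as authenticated. Adjust this check to match your own auth scheme.
        authenticated = (
            "HTTP_COOKIE" in environ or "HTTP_AUTHORIZATION" in environ
        )

        def patched_start_response(status, headers, exc_info=None):
            if authenticated:
                # Drop any Cache-Control the app set, then add no-store.
                headers = [
                    (k, v) for k, v in headers if k.lower() != "cache-control"
                ]
                headers.append(("Cache-Control", "no-store"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware


def demo_app(environ, start_response):
    """Hypothetical app that forgets to set any cache headers."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["headers"] = dict(headers)


wrapped = no_store_for_authenticated(demo_app)
wrapped({"HTTP_COOKIE": "session=abc"}, fake_start_response)
print(captured["headers"]["Cache-Control"])  # no-store
```

Note the design choice: the wrapper overrides whatever the app set for authenticated requests. That is deliberately blunt. If some authenticated responses really are safe to cache, you would want an allowlist instead.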

This is not a hypothetical scenario. It just happened.


The PaaS Convenience Layer Has Its Own Attack Surface

I want to be precise here: this is not a condemnation of Railway. They published a transparent postmortem, notified affected users, purged the cache quickly, and outlined concrete preventive measures — additional cache behavior testing before production changes, and staggered CDN rollouts over hours instead of minutes. That’s the right posture in the face of an incident.

The broader point is architectural. PaaS platforms abstract enormous complexity: DNS, TLS, routing, edge delivery, DDoS mitigation. That abstraction has genuine value. But it also means that a configuration change within a provider layer you don’t control — one you might not even know existed — can alter your application’s security properties without your application changing at all.

Railway had a guard (CDN disabled → pass to origin). A legitimate engineering change bypassed it. Applications deployed on those affected domains had no visibility into this. No code changed on the application side. The behavior changed at the infrastructure layer.


What This Means for Your Stack

Three practical lessons from this incident:

1. Set explicit cache headers on every authenticated route. Don’t rely on platform defaults. Cache-Control: no-store on any response containing session-scoped content is not optional — it’s the bare minimum. If your framework or middleware doesn’t set it for authenticated responses, add it now.

2. Treat your PaaS provider’s CDN layer as a security boundary you need to audit. Know which features are opt-in versus opt-out. Know what the default cache behavior is when there are no explicit headers. Know what an accidental configuration change would look like and whether you’d detect it.

3. Subscribe to your provider’s status page and postmortems. Not to panic — but because this is the layer of your stack where changes happen outside your git history. Railway’s postmortem was clear and detailed. Others’ may not be.
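
For the "would you detect it" part of lesson 2, a rough sketch of a check you could run against your own authenticated endpoints. It only inspects response headers. Age is a standard HTTP header; X-Cache and CF-Cache-Status are conventions used by some CDNs, and I am assuming, not asserting, that your provider exposes something similar. The function name cache_findings is hypothetical.

```python
def cache_findings(headers):
    """Return a list of signs that a response was, or could be, cached."""
    h = {k.lower(): v for k, v in headers.items()}
    findings = []

    directives = {
        d.strip().lower() for d in h.get("cache-control", "").split(",")
    }
    if "no-store" not in directives:
        findings.append("missing Cache-Control: no-store")

    # An Age header suggests the response came out of a cache
    # instead of being generated by the origin for this request.
    if "age" in h:
        findings.append(f"Age header present ({h['age']})")

    # Provider-specific hit indicators; the names vary by CDN.
    for name in ("x-cache", "cf-cache-status"):
        if "hit" in h.get(name, "").lower():
            findings.append(f"{name} reports a cache hit")

    return findings


print(cache_findings({"Cache-Control": "no-store"}))  # []
print(len(cache_findings({"Age": "12", "X-Cache": "HIT"})))  # 3
```

Run something like this on a schedule against a logged-in test account and alert on any non-empty result. It won't catch every misconfiguration, but it turns "the infrastructure changed underneath me" into something that shows up in your own monitoring.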


The Hacker News thread on this incident contains pointed comments, including users reporting that medical data was exposed in their applications during the window. Whether those reports are verified or not, the direction of the risk is clear: authenticated content cached at the edge is a personal data problem, a compliance problem, and in some verticals, a regulatory problem.

PaaS platforms are not magical security layers. They’re infrastructure. Treat them as such.


Source: Railway Incident Report — March 30, 2026