Why the AWS outage took 16 hours, not 1, Density Labs

October 20, 2025. AWS’s US-EAST-1 region went dark for 16 hours, and much of the internet went with it.

A DNS resolution issue. The kind of problem AWS engineers have seen and fixed in under an hour, dozens of times across the years. Snapchat down. Fortnite offline. Banking apps unreachable. Smart doorbells dumb again. Even ChatGPT silent. Across 2,000 businesses, billions in lost productivity.

The official explanation is a DNS issue. The unofficial one is more interesting.

Months earlier, Amazon laid off hundreds of AWS engineers as part of a broader restructuring to “do more with AI.” The cloud computing unit, the team responsible for the systems the rest of the internet runs on, lost a meaningful slice of its experienced staff.

Corey Quinn, a cloud computing analyst, put it cleanly:

“You can hire brilliant people who understand DNS at a technical level. What you can’t easily replace is the person who remembers that when DNS starts acting weird, you need to check that seemingly unrelated system in the corner, because it’s caused problems before.”

That is tribal knowledge. The institutional memory that comes from being in the room when things broke, repeatedly, for years.

You cannot buy it. You cannot transfer it through documentation. You cannot synthesize it with an LLM trained on Stack Overflow. It is the layered context an engineer accumulates by being in the system long enough to remember the weird thing that happened in 2019 that led to the workaround that is still load-bearing in 2025.

Density compounds. So does its absence.

The four densities, and what Amazon lost

We use an internal framework for what makes engineering partnerships actually work. We call it operational density. Four dimensions that accumulate over time and cannot be shortcut.

Context density. How much of the system’s domain, codebase, and culture lives inside the engineer’s working knowledge. This is what Quinn was describing. The AWS engineers who left were not interchangeable with their replacements. The replacements may well have been brilliant, but they had not yet seen DNS behave the way it behaves at AWS scale, when one specific service interacts with one specific load balancer config.

Trust density. How many small bets the engineer and the team have run together that paid off. When something goes wrong, who do you call first? In an outage, the right answer is the person the team trusts, not whoever the rotation happens to have on call. Lay off a tenured engineer and you do not just lose their skills. You lose every team’s first phone call.

Cadence density. How tight the feedback loops between teams have become. Senior engineers do not escalate in formal channels. They DM, they grab a Zoom, they get the right people in the room in two minutes. New engineers escalate by ticket and wait. In a 16-hour outage, those minutes compound.

Stake density. How much each side has invested beyond the contract. Engineers who feel ownership of the system show up differently than engineers who feel like cost centers. After a layoff round, the remaining engineers know which side of that line their employer thinks they sit on.

Why mid-market AI buyers should pay attention

This was Amazon. Unlimited budget, unlimited talent, nearly two decades of tooling. They still lost 16 hours because the four densities had been hollowed out.

If your team is building anything in production, the same dynamics apply. Your seniors have context that did not get written down. Your incidents get resolved fast because someone remembers. Your AI initiative will land or stall not on the model you pick, but on whether the people running it have been in the system long enough to know the unwritten conventions.

The mistake most companies are making in 2026 is treating AI as a substitute for tenure. It is not. AI is a multiplier on tenure. Engineers with deep context plus AI ship faster. Engineers without context plus AI ship features that break in production in week 8.

What we do differently

This is the entire reason we structure engagements around tenure, not utilization.

When we place an engineer, we are not selling you hours. We are selling you the four densities, accumulating over years. Our longest engagement, at Ooma, is now in its tenth year. The engineer who started in 2016 is not 4x more productive than the one who started in 2024. They are 10x or 50x more productive, because all four densities have compounded.

The 16-hour AWS outage is the case study for what happens when you optimize for cost over density. We optimize for density. That is the entire difference.

If you are building AI that has to ship to production this quarter, you do not want bodies. You want engineers who have been in your codebase long enough to know which DNS quirk is going to bite you in week 8.
