
The prolonged outage offered a reminder of the fragility of global internet connectivity and Amazon’s role underpinning much of online infrastructure.
By Robert McMillan, Belle Lin, and Sean McLain
A glitch with an obscure Amazon database disrupted life for millions of people across the U.S. as core internet services failed to function for an array of companies.
Alexa devices couldn’t hear. Corporate Slack messages wouldn’t post. Students couldn’t turn in assignments or access materials from courses. Financial trades were impossible on certain platforms. Users of Zoom, Venmo, Instacart and a host of other services faced prolonged outages that rippled through homes and businesses.
The trouble started a few hours after midnight on the East Coast.
A minor update to what’s called the Domain Name System—the kind of software tweak that happens millions of times a day on the internet—sent the well-oiled machine that underpins the modern web careening toward a crash.
DNS acts as a kind of telephone directory for the internet, instructing machines on how to find each other. The faulty update gave the wrong information for DynamoDB, an Amazon Web Services product that has become one of the world’s most important databases.
Suddenly, machines on the East Coast that tried to process trillions of requests were getting the internet’s equivalent of a wrong number.
Amazon services were some of the first to feel the effects.
At around 2 a.m. on Monday morning, the systems that help Amazon sort packages onto trucks and guide drivers on the road went down, according to an internal message viewed by The Wall Street Journal.
By 3 a.m., the outage’s blast radius had spread far beyond Amazon, cascading across the internet, delaying more than 4,000 flights, knocking out news websites such as The Wall Street Journal, affecting financial transactions and extending into everyday life.
The episode, which turned into one of the most prolonged daylong outages for Amazon Web Services, offered a reminder of the fragility of global connectivity, which has gone dark a number of times in recent years after seemingly minor software updates. By late afternoon Monday, Amazon said it had restored much of the service that had been knocked offline.
“Even if just briefly, major providers like AWS going down represent vulnerabilities in what have become critical infrastructure for organizations and, in some cases, governments globally,” said Jacob Bourne, an analyst at research firm eMarketer. “As cloud reliance and workloads expand, these outages could hit industries harder.”
Another outage for AWS in 2021 similarly disrupted business and personal activities across the economy for most of a day. Amazon controls about a third of the public cloud-computing market, the core infrastructure of the modern internet.
An hour into the outage, Amazon seemed optimistic. “We continue to observe recovery across most of the affected AWS Services,” the company wrote. A half-hour later, at around 3:30 a.m., it acknowledged that customers weren’t out of the woods. Customers were having problems starting up new cloud-computing servers—the backbone of AWS’s operations.
The problems with DynamoDB were cascading. Soon, they would render AWS virtually unusable for customers dependent on Amazon’s East Coast data centers. By the end of the day, 142 AWS products would be affected, including the software the company uses to spread computing demand over its network of data centers, and the content-delivery software it uses to display websites to internet browsers.
So many East Coast services were affected that customers who wanted to simply pick up and move their computing operations to another part of Amazon’s cloud found themselves unable to migrate.
At 4 a.m. Monday, the cryptocurrency exchange Coinbase told customers that it was aware that many of them couldn’t get into their accounts because of the Amazon outage and was working to get them access.
Some customers said that the outage left them unable to close their trades. “A crypto exchange brought down by a centralized cloud. You can’t preach decentralization and still depend on AWS to stay online,” wrote one.
Around the same time, the trading platform Robinhood also said the AWS outage had knocked it offline. On X, the company said issues persisted through the day on Monday but were ultimately resolved.
While it is too soon to have a complete understanding of the cost of the outage, other companies have faced multimillion-dollar claims and lawsuits when their services were at the root of global disruptions.
Last year, a software patch from CrowdStrike knocked nearly 10 million computers offline, affecting hospitals, restaurants, media companies and beyond. Although the acute impact from that event was related to laptops using Microsoft software that were rendered unusable, the cloud also played a role.
Digital-disruption insurance provider Parametrix said the CrowdStrike event caused $5.4 billion in losses for the Fortune 500, excluding Microsoft.
When Will Mauldin tried to print shipping labels for the weekend orders at his woodworking company in Rockville, Md., on Monday, he couldn’t log in to the fulfillment system run by a third-party software provider. Later in the day, Mauldin discovered that he couldn’t send coupons to new customers, because of downtime at another vendor.
“We’re not getting our orders today and we’re not getting anything out,” he said. “I had no idea that the loss of one web cloud service would chip away at my small business and give me a Monday morning from hell.”
Carlos Naudon, chief executive of Ponce Bank, said he couldn’t charge his electric vehicle on Monday morning, as the charging network uses AWS. Ponce, a New York-based bank, doesn’t rely on the cloud for core banking services, but the outage delayed transactions and cost an estimated $50,000 to $100,000, he said.
“Honestly, it’s more unsettling than anything else,” Naudon said. “You can’t assume that something like that is going to be foolproof and nothing can happen.”
The Amazon Web Services outage is poised to accelerate a push among many executives to diversify their cloud services, analysts said. Doing so can make it easier to respond and adjust to disruptions, but software executives cautioned that problems can occur on other cloud platforms as well.
Some companies that have diversified managed to avoid the chaos others faced throughout the day.
Shaun Hunt, chief information officer of McKenney’s, said the Atlanta-based construction services firm didn’t go down completely because the firm uses several cloud platforms and runs on its own data centers. “We have a strategy to diversify our risk,” he said.
Over the past decade, DynamoDB has come to not only power Amazon’s retail operation, but also, increasingly, store the data used by AWS computing services themselves, according to Adrian Cockcroft, a former AWS executive who is now a technology adviser.
“But if you can’t reach it because the DNS is down, then all sorts of things are not working,” he said. “It’s like you’re trying to visit a website and it’s not there.”
DNS-related outages have taken companies and the internet offline in recent years, including Facebook in 2021 and a large chunk of the internet in 2016.
Write to Robert McMillan at robert.mcmillan@wsj.com, Belle Lin at belle.lin@wsj.com and Sean McLain at sean.mclain@wsj.com.