On Tuesday, December 7, 2021, we took all Planning Center applications offline in response to a multi-service outage in the Amazon Web Services (AWS) Northern Virginia region.
We know this outage caused a big disruption to your day, and we don't take bringing down any of our products lightly, let alone the entire suite. We want to provide you with more context about what happened and why.
When you connect to a Planning Center product from your computer, phone, or tablet, you're connecting to our systems in AWS data centers in Northern Virginia. Our products span multiple data centers within that area to ensure that issues in a single data center do not cause a service disruption. Unfortunately, the AWS issues last Tuesday were region-wide and lasted several hours, which disrupted all of the data centers we rely on.
We use AWS's Simple Notification Service (SNS) as the plumbing for internal messaging between products and send hundreds of requests per second to the service. At around 7:30 AM Pacific, over 90% of our requests to SNS began to fail. When an event was updated in Registrations, Check-Ins wasn't getting the memo. When events were made in Registrations and Groups, Calendar had no idea about the party. This raised concerns about data consistency, since one product might never receive data from another, so at 8:15 AM Pacific, we took all products offline.
After we went offline, our inability to interact with AWS services in the region extended to several other services that we rely on. Despite our persistent attempts, we weren't able to bring our applications back online until 3:15 PM Pacific. Full recovery followed at 5:00 PM Pacific, when all data was verified to be back in sync.
Know that we'll do everything we can to learn from this, improve our resiliency, and continue building reliable and highly available products.