On September 15, 2021, at 4:00am EDT, the Cortext team upgraded the production servers to tune database and directory performance. Our monitoring of the metrics during the first few hours after the upgrade showed the expected improvements.
At 10:45am EDT, the team observed a sudden increase in database load. This was followed by gradual performance degradation that impacted login and message delivery for all customers by 11:15am EDT. Our initial mitigation steps, including a rolling restart of servers, did not improve the situation. The team then used a contingency plan to roll back the morning’s update. The team determined that message delivery and login were fully restored around 12:45pm EDT. The outage lasted approximately 90 minutes.