[Quickbase US Status] Notice: Service Functionality Degraded (Unplanned) - Down Time

Incident
07/31/2024, 07:30pm EDT

Status: closed
Start: 05/23/2024, 02:26pm EDT
End: 05/23/2024, 02:34pm EDT
Duration: 8 minutes
Affected Components:
Quickbase Service - US Region
Quickbase Audit Logs - US Region
Quickbase Automations - US Region
Quickbase Billing - US Region
Quickbase Pipelines - US Region
Quickbase Platform Analytics - US Region
Quickbase RESTful APIs - US Region
Quickbase Sync - US Region
Quickbase Webhooks - US Region

Update

05/23/2024, 02:26pm EDT

We are currently investigating performance issues across the platform.  We will provide an update as soon as we have one.

Resolved

05/23/2024, 02:34pm EDT

As of 2:34 PM Eastern US Time, the platform is working properly again.  We will provide a root cause once we have it.

This incident is closed.

Root Cause

07/31/2024, 07:30pm EDT

A SQL procedure took longer than normal to respond and blocked queries to the SQL database that supports the Quickbase platform. The blocking affected a heavily used SQL table that stores the status of cached information about customer realms (e.g., customername.quickbase.com). It occurred in two periods: first between 2:26 PM and 2:34 PM Eastern US Time (8 minutes), and then between 4:00 PM and 4:20 PM Eastern US Time (20 minutes). During these 28 minutes, some requests made to the Quickbase platform were slower than normal, timed out, or returned an error.

The SQL procedure that reads information from this heavily used SQL table was improperly locking access to the table. When a short burst of higher-than-normal requests arrived, access to the table was blocked and other requests to the SQL database began to queue, causing a cascading backup.
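The queuing effect described above can be illustrated with a small model. This is only a sketch with invented numbers, not Quickbase data or code: `simulate_backlog`, the per-tick arrival counts, and the capacity value are all hypothetical. It assumes a database that can complete a fixed number of requests per tick and shows how a short burst leaves a backlog that outlasts the burst itself.

```python
def simulate_backlog(arrivals, capacity):
    """Return the number of queued requests after each tick.

    arrivals: requests arriving in each tick (hypothetical values)
    capacity: requests the database can complete per tick (hypothetical)
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        # Requests that cannot be served in this tick stay in the queue.
        backlog = max(0, backlog + arriving - capacity)
        history.append(backlog)
    return history


# Normal load is 5 requests per tick; a three-tick burst raises it to 12.
arrivals = [5, 5, 12, 12, 12, 5, 5, 5]
print(simulate_backlog(arrivals, capacity=8))
# prints [0, 0, 4, 8, 12, 9, 6, 3]
```

In this toy model the queue is still draining three ticks after the burst ends, which is one way a brief spike can turn into a longer period of slow or timed-out requests.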

Our first response, during the first instance of the blocking between 2:26 PM and 2:34 PM Eastern US Time, was to regulate the abnormally high traffic to allow the SQL database to recover. The high traffic that triggered the problem was neither malicious nor inappropriate; it was simply higher than the SQL database had handled previously. After the first instance subsided, we continued investigating why the SQL database could not handle the traffic, but had not yet reached a definitive conclusion when the second instance occurred between 4:00 PM and 4:20 PM Eastern US Time. To mitigate the impact of the second instance, we once again regulated the abnormally high traffic. During the second instance, we also implemented a change to the SQL procedure that stopped it from locking access to the heavily used SQL table, and the issue was resolved.

We've made the SQL procedure change permanent to prevent SQL blocking of this specific type in the future. We've also added internal rate limiting to prevent certain types of traffic from negatively impacting the SQL database, and added enhanced monitoring so that we can recognize this type of problem more quickly.
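As an illustration of what rate limiting of this kind can look like, here is a minimal token-bucket sketch. It is not Quickbase's implementation, which is not described in this notice; the `TokenBucket` class, its parameters, and the example values are assumptions chosen for illustration. Each request spends one token, tokens refill at a steady rate, and requests arriving when the bucket is empty are rejected rather than passed on to the database.

```python
import time


class TokenBucket:
    """Hypothetical limiter: `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Return True if a request may proceed, False if it should be shed."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Example values only: 2 requests per second, bursts of up to 3.
bucket = TokenBucket(rate=2, capacity=3)
if bucket.allow():
    pass  # forward the request to the database
```

A limiter like this caps how quickly a burst can reach the protected resource, at the cost of rejecting or delaying some requests during the burst.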

We realize we are an important part of your business and you expect Quickbase to be available and performing well when you need it.  We apologize for the disruption this incident caused our customers.