Issue Loading Verkada Command

Incident Report for Verkada Command

Postmortem

Incident Summary
On November 6th at 7:00 PM PST, our automated monitoring detected degraded performance across multiple services, and we declared an internal incident. During this period, users experienced instability in the Command application and API, and some were unable to access the Command application at all.

Root Cause
The incident was caused by a surge in load on a permissions database. The spike came from an inefficient database operation that runs as part of our regular release process. This operation caused excessive lock contention on the database, which blocked or delayed database queries and many internal requests and destabilized the Command application and API.

Resolution and Mitigations
We quickly identified the source of the elevated database load and have since disabled the responsible operation. To prevent recurrence, we are:

Optimizing the affected database operations to make them more efficient.
Restricting the scope of these operations to minimize database lock contention under high load.

Next Steps
We are conducting a broader review of the database operations in our release process to ensure they remain stable under all conditions. We are also adjusting monitoring thresholds and alerts so that we detect similar issues earlier and keep them from affecting our users.

Posted Nov 07, 2024 - 09:45 PST

Resolved

We have resolved an issue that prevented some users from loading Verkada Command. The faulty release has been reverted.

Posted Nov 06, 2024 - 07:30 PST