Write-up published
Resolved
As of 7:45 PM UTC, all affected services have fully recovered and are operating normally.
Agents Platform and Audio Platform (Website) have returned to full health. Text-to-Speech (TTS), Speech-to-Text (STT), and Dubbing APIs were unaffected throughout the incident.
The root cause has been identified and remediated. We will be conducting a thorough post-incident review and will share further details in a follow-up report.
We apologize for the disruption and appreciate your patience throughout this incident. If you are still experiencing any issues, please don't hesitate to contact our support team.
Monitoring
Affected services continue to show sustained improvement following the changes implemented earlier. We are continuing to monitor key metrics and will confirm full resolution once error rates have held steady at baseline levels.
Next update in 15-20 minutes.
Monitoring
We have implemented changes to the affected services that have resulted in signs of recovery and improved performance. We are closely monitoring to confirm stability and will provide another update in 15-20 minutes.
Identified
Our team remains fully engaged and continues to work through active lines of investigation. All previously noted affected and operational components remain in the same state.
We will provide another update in 15-20 minutes.
Identified
We are continuing to investigate possible root causes and rule them out one by one. Additionally, we are clarifying the affected products below.
What's affected:
Agents Platform - degraded performance due to issues with conversation initiation. Conversations that do start function normally, so API customers should retry failed conversation initiations.
Website - intermittent slowness
What's working normally:
Text-to-Speech (TTS) API
Speech-to-Text (STT) API
Dubbing API
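For API customers retrying failed conversation initiations, the following is a minimal sketch of one common approach: retrying with exponential backoff and jitter. It is not an official client. `start_conversation` and `InitiationError` are hypothetical stand-ins for whatever call and error your integration uses, and the attempt count and delays are illustrative assumptions.

```python
import random
import time


class InitiationError(Exception):
    """Hypothetical error: the conversation failed to start."""


def start_with_retry(start_conversation, max_attempts=5, base_delay=0.5,
                     max_delay=8.0, sleep=time.sleep):
    """Call start_conversation(), retrying failed initiations.

    start_conversation is a hypothetical zero-argument callable that
    returns a conversation handle or raises InitiationError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return start_conversation()
        except InitiationError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Exponential backoff cap: 0.5s, 1s, 2s, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, cap))
```

Only the initiation step is retried here, since conversations that do start are reported to function normally.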
Our engineering team remains fully engaged and is continuing to work through active lines of investigation alongside our infrastructure provider. We are iterating on potential mitigations and monitoring the results closely.
We will continue to provide updates every 15-20 minutes.
Identified
Our team continues to actively investigate the degraded performance affecting services. We are working through several lines of investigation in parallel and methodically testing potential contributing factors.
This remains escalated with our infrastructure provider and internal leadership, with additional engineering resources dedicated to this effort. While a definitive root cause has not yet been identified, each step is helping us build a clearer picture of the underlying issue.
We understand the impact this is having and are treating this with the highest priority. We will provide another update in 15-20 minutes.
Identified
Our engineering team is actively exploring multiple remediation paths in parallel while working to isolate the root cause. We have escalated this issue both to our infrastructure provider and to internal leadership to ensure it receives the highest level of attention and resources.
A definitive root cause has not yet been identified, but we are making iterative changes and closely monitoring their impact. We will continue to provide updates as our investigation progresses.
Identified
Our engineering team is continuing to investigate issues with the cloud provider components. Multiple engineering teams and leadership are involved in the effort. We'll continue monitoring closely and provide timely updates as the situation evolves.
Identified
We have isolated the components responsible for performance degradation and are attempting to implement workarounds to bypass the affected cloud provider services.
Identified
We have identified an underlying issue with our cloud provider that is impacting service availability.
Investigating
We are currently investigating an issue with website and API requests and are working to identify the root cause as quickly as possible. We will provide an update as soon as more information becomes available.
TTS API requests and Agents conversations are working. There may be issues with API services that affect conversation initiation.