ElevenLabs

Write-up
EU Residency - Issue with the Agents platform, TTS and Scribe in the EU residency server
Partial outage
RCA: EU Data Residency TTS/ASR/Agents Crashing

Duration:

May 15, 2026 8:59 AM UTC - May 15, 2026 9:42 AM UTC


Description of the Issue:

Calls to our APIs using Text-to-Speech (TTS) or Scribe models failed with a 500 status code. Calls through the Agents platform, which depends on these models downstream, also failed.


Root Cause:

Increased Scribe demand caused the pods running the proxy that sits in front of our AI models to run out of memory. Recovering required scaling the proxy both vertically (more memory per pod) and horizontally (more pods). Because TTS requests are routed through the same proxy, their performance was degraded as well.
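To make the root cause more concrete, the following Python sketch models a single proxy pool shared by Scribe and TTS traffic. It is a toy model, not our actual proxy or capacity planner: the per-request memory costs, pod counts, memory limits, and traffic figures are all hypothetical, and it assumes requests are spread evenly across pods. It shows why a surge in one workload can exhaust memory for both, and how vertical scaling (more memory per pod) and horizontal scaling (more pods) each reduce the pressure on a single pod.

```python
from dataclasses import dataclass

# Illustrative per-request memory costs in MiB. These are hypothetical
# numbers chosen for the example, NOT measurements from this incident.
SCRIBE_MIB_PER_STREAM = 8
TTS_MIB_PER_REQUEST = 2


@dataclass
class ProxyPool:
    """Toy model of a proxy deployment shared by Scribe and TTS traffic."""
    replicas: int    # horizontal scaling knob: number of proxy pods
    memory_mib: int  # vertical scaling knob: memory limit per pod

    def per_pod_demand_mib(self, scribe_streams: int, tts_requests: int) -> float:
        # Assumes the load balancer spreads requests evenly across pods.
        total = (scribe_streams * SCRIBE_MIB_PER_STREAM
                 + tts_requests * TTS_MIB_PER_REQUEST)
        return total / self.replicas

    def out_of_memory(self, scribe_streams: int, tts_requests: int) -> bool:
        # When demand exceeds the per-pod limit, the pod runs out of memory.
        # Scribe and TTS share the same pods, so TTS requests fail as well.
        return self.per_pod_demand_mib(scribe_streams, tts_requests) > self.memory_mib


# Hypothetical traffic: a Scribe surge on top of steady TTS load.
normal = dict(scribe_streams=300, tts_requests=400)
surge = dict(scribe_streams=1200, tts_requests=400)

baseline = ProxyPool(replicas=4, memory_mib=1024)
vertical_only = ProxyPool(replicas=4, memory_mib=2048)
horizontal_only = ProxyPool(replicas=8, memory_mib=1024)
both = ProxyPool(replicas=8, memory_mib=2048)

for name, pool in [("baseline", baseline), ("vertical only", vertical_only),
                   ("horizontal only", horizontal_only), ("both", both)]:
    print(f"{name:15} normal OOM={pool.out_of_memory(**normal)} "
          f"surge OOM={pool.out_of_memory(**surge)}")
```

With these made-up numbers, neither knob alone absorbs the surge, while scaling in both directions does. Real memory usage is less linear than this, so the model only illustrates the shape of the problem.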


Timeline & Actions Taken:
May 15, 2026 9:02 AM UTC - Automated cloud alerts page our engineering team, and the investigation begins immediately.

May 15, 2026 9:10 AM UTC - The response team declares a P1 incident. 

May 15, 2026 9:12 AM UTC - The support team updates the status page.

May 15, 2026 9:13 AM UTC - The response team identifies the faulty service: the proxy in front of our AI models is running out of memory.

May 15, 2026 9:19 AM UTC - The proxy service is scaled up both vertically and horizontally to meet demand.

May 15, 2026 9:24 AM UTC - Request success rates begin recovering following the scale-up.

May 15, 2026 9:25 AM UTC - The Scribe team had already been rolling out an improvement that lowers the proxy's memory usage, but the rollout had not yet reached the EU Data Residency cluster. The improvement is now being rolled out to the EU Data Residency cluster.

May 15, 2026 9:28 AM UTC - The issue is resolved, and the response team continues to monitor the situation.

May 15, 2026 9:42 AM UTC - The incident is marked as resolved.


Preventative Actions & Learnings:

The memory allocated to each proxy pod was inconsistent across our environments.

Although an improvement to the proxy's memory usage already existed, it had not yet reached every cluster. The team should have shortened the rollout period across all of our clusters.
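One way to keep a rollout window short is to record, per cluster, when an improvement actually landed and flag clusters that miss a target window. The Python sketch below illustrates that idea only; the function `overdue_clusters`, the cluster names, the dates, and the three-day window are hypothetical and do not describe our real deployment tooling.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def overdue_clusters(started: datetime,
                     deployed_at: Dict[str, Optional[datetime]],
                     max_window: timedelta,
                     now: datetime) -> List[str]:
    """Return clusters that did not receive a change within max_window.

    deployed_at maps each cluster to the time the change landed there,
    or None if it has not landed yet.
    """
    deadline = started + max_window
    if now <= deadline:
        return []  # still inside the allowed rollout window
    return sorted(
        cluster for cluster, landed in deployed_at.items()
        if landed is None or landed > deadline
    )


# Hypothetical rollout history, not the real timeline of this change.
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
status = {
    "cluster-a": start + timedelta(days=1),
    "cluster-b": start + timedelta(days=2),
    "eu-data-residency": None,  # change has not landed yet
}

print(overdue_clusters(start, status, timedelta(days=3), now=start + timedelta(days=7)))
```

Wired into an alert, a check like this would flag a lagging cluster while the rollout is still in progress rather than after an incident.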

We will continue our ongoing effort to centralize configurations so that customers always receive the same product experience across all of our deployment environments.
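The learnings above (inconsistent proxy memory across environments and the goal of centralized configuration) can be illustrated with a small Python sketch: one base configuration, optional per-environment overrides, and a check that reports any setting whose rendered value differs between environments. `BASE_PROXY_CONFIG`, `OVERRIDES`, `render`, `find_drift`, the environment names, and all values are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical single source of truth for proxy settings.
BASE_PROXY_CONFIG: Dict[str, Any] = {"memory_limit_mib": 2048, "min_replicas": 4}

# Hypothetical per-environment overrides. Settings that should be identical
# everywhere ideally have no override at all.
OVERRIDES: Dict[str, Dict[str, Any]] = {
    "us": {},
    "eu-data-residency": {"memory_limit_mib": 1024},  # the kind of drift to catch
}


def render(env: str) -> Dict[str, Any]:
    """Merge the base config with the environment's overrides (overrides win)."""
    return {**BASE_PROXY_CONFIG, **OVERRIDES.get(env, {})}


def find_drift(envs: List[str], keys: List[str]) -> Dict[str, Dict[str, Any]]:
    """Return {key: {env: value}} for each key whose value differs across envs."""
    rendered = {env: render(env) for env in envs}
    drift: Dict[str, Dict[str, Any]] = {}
    for key in keys:
        values = {env: cfg.get(key) for env, cfg in rendered.items()}
        distinct = list(values.values())
        if any(v != distinct[0] for v in distinct[1:]):
            drift[key] = values
    return drift


print(find_drift(["us", "eu-data-residency"], ["memory_limit_mib", "min_replicas"]))
```

Run in CI, a check like this could surface an override such as the hypothetical `eu-data-residency` memory limit before it is deployed.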