ElevenLabs Audio & Video Integration

Connect ElevenLabs to give your AI agents powerful audio and video capabilities including text-to-speech, transcription, sound effects, music generation, dubbing, voice transformation, and audio cleanup.

Overview

The ElevenLabs Audio & Video Integration provides AI agents with 7 tools for generating and processing audio and video content. Agents can convert text to natural-sounding speech, transcribe recordings with speaker identification, generate sound effects and music from descriptions, dub content into 32+ languages, transform voices, and remove background noise from recordings.

All generated files are stored securely in Azure Blob Storage and returned to agents as time-limited, read-only URLs. The integration connects via a single API key from your ElevenLabs account, and all 7 tools are bundled into an "ElevenLabs Audio & Video" tool group ready to assign to any agent.

Info: Voice cloning is intentionally excluded from this integration. Although ElevenLabs supports voice cloning through its API, this capability is not exposed to agents because of the consent and legal risks of cloning a voice without explicit authorization.

Use Cases

Voiceover Production - Generate narration for product announcements, training materials, or presentations using natural-sounding AI voices
Meeting Transcription - Convert meeting recordings to searchable text with speaker identification and timestamps
Content Localization - Dub training videos, marketing content, or customer communications into multiple languages while preserving speaker voice and emotion
Sound Design - Create notification sounds, background music, or sound effects for apps and presentations from natural language descriptions
Audio Cleanup - Remove background noise from recordings before sharing or archiving
Podcast Production - Transcribe episodes, generate intro/outro music, and clean up audio quality

How It Works

Admin connects ElevenLabs    7 tools created                 Agents use tools
via API key                  in tool group                   during execution
       |                            |                              |
       v                            v                              v
+-----------------+       +---------------------+       +---------------------+
| Enter API key   |       | text_to_speech      |       | Generate speech,    |
| from ElevenLabs |  -->  | transcribe          |  -->  | transcribe audio,   |
| account         |       | sound_effects       |       | create SFX/music,   |
|                 |       | generate_music      |       | dub content, clean  |
|                 |       | dub, voice_changer  |       | up recordings       |
|                 |       | audio_isolation     |       |                     |
+-----------------+       +---------------------+       +---------------------+
                                                               |
                                                               v
                                                     +---------------------+
                                                     | Files stored in     |
                                                     | Azure Blob Storage  |
                                                     | with SAS URL access |
                                                     +---------------------+

Getting Started

Prerequisites

Before connecting ElevenLabs:

Pro Plus+ Subscription - The ElevenLabs integration requires the custom.elevenlabs feature code on your subscription
ElevenLabs Account - You need an active ElevenLabs account with a paid plan (Creator or higher recommended for sufficient character quota)
ElevenLabs API Key - Generate an API key at elevenlabs.io/app/settings/api-keys
Control Bridge Admin Access - You must be a Control Bridge administrator to configure the integration

Step 1: Connect ElevenLabs

Navigate to Build > Connections > ElevenLabs Audio & Video
Click Connect ElevenLabs
Enter your ElevenLabs API key
The system validates the key against the ElevenLabs API and retrieves your account details
Upon successful validation, 7 tools are created and bundled into an "ElevenLabs Audio & Video" tool group

Tip: You can find your API key in the ElevenLabs dashboard under Profile Settings > API Keys. Keep your API key secure - it provides full access to your ElevenLabs account's capabilities and character quota.

Step 2: Verify the Connection

After connecting:

The connection page shows your account status, subscription tier, and character usage
Click Test Connection to verify the integration is working
The system confirms the API key is valid and refreshes your character count and limit

Step 3: Assign the Tool Group to Agents

ElevenLabs tools are bundled into an "ElevenLabs Audio & Video" tool group automatically created at connection time:

Navigate to Build > AI Agents > Agents
Edit the agent that should have audio/video capabilities
Go to the Tools tab
In the Tool Groups section, enable the ElevenLabs Audio & Video group
Save the agent

All 7 ElevenLabs tools are assigned together as a single unit via the tool group.

Warning: Only assign the ElevenLabs tool group to agents that genuinely need audio/video capabilities. ElevenLabs usage consumes characters from your account quota, and generated content uses Azure Blob Storage. Monitor your character usage on the connection page.

Available Tools

When ElevenLabs is connected, 7 tools are created and grouped under the "ElevenLabs Audio & Video" tool group.

1. Text to Speech (`elevenlabs_text_to_speech`)

Convert text to natural-sounding speech audio with voice selection and model quality options.

Parameter	Type	Required	Default	Description
text	string	Yes	-	The text to convert to speech (max 5,000 characters)
voice	string	No	`Rachel`	Voice name or ID. Available voices: Rachel, Domi, Bella, Antoni, Elli, Josh, Arnold, Adam, Sam. Use a voice ID for custom voices.
model	string	No	`multilingual_v2`	TTS model: `flash` (fastest, ~75ms), `turbo` (balanced), `multilingual_v2` (high quality), `v3` (emotionally rich)
language	string	No	Auto-detect	ISO 639-1 language code (e.g., `en`, `es`, `fr`)
output_format	string	No	`mp3`	Audio format: `mp3` (universal playback), `opus` (smaller file size), `pcm` (raw audio data)
stability	number	No	`0.5`	Voice stability (0-1). Lower = more expressive, higher = more consistent
similarity	number	No	`0.75`	Voice similarity boost (0-1). Higher = closer to original voice

Returns a URL to the generated audio file along with the format, voice, model, and character count used.

Example requests agents can handle:

"Read this announcement aloud" - Converts text to speech with default settings
"Create a Spanish voiceover for this script" - Uses `language='es'` with the `multilingual_v2` model
"Generate a quick audio preview" - Uses `model='flash'` for fastest generation
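
The parameter table above can be read as a set of constraints on a tool call. The following is a minimal sketch of a client-side check, assuming a hypothetical helper named `build_tts_args`; the limits and defaults are copied from the table, but the helper itself is not part of the integration.

```python
from typing import Any, Dict, Optional

# Limits and allowed values copied from the parameter table above.
MAX_TTS_CHARS = 5000
TTS_MODELS = {"flash", "turbo", "multilingual_v2", "v3"}
OUTPUT_FORMATS = {"mp3", "opus", "pcm"}


def build_tts_args(
    text: str,
    voice: str = "Rachel",
    model: str = "multilingual_v2",
    output_format: str = "mp3",
    stability: float = 0.5,
    similarity: float = 0.75,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments for elevenlabs_text_to_speech."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TTS_CHARS:
        raise ValueError(f"text exceeds {MAX_TTS_CHARS} characters")
    if model not in TTS_MODELS:
        raise ValueError(f"unknown model: {model}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output_format: {output_format}")
    for name, value in (("stability", stability), ("similarity", similarity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    args: Dict[str, Any] = {
        "text": text,
        "voice": voice,
        "model": model,
        "output_format": output_format,
        "stability": stability,
        "similarity": similarity,
    }
    if language is not None:
        # Leaving language out corresponds to the auto-detect default.
        args["language"] = language
    return args
```

For instance, `build_tts_args("Hola a todos", language="es")` would produce arguments matching the Spanish voiceover example above.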

2. Transcribe (`elevenlabs_transcribe`)

Convert audio or video files to text transcripts with speaker diarization and timestamp options.

Parameter	Type	Required	Default	Description
file_url	string	Yes	-	URL to the audio/video file (HTTPS). Supports MP3, WAV, M4A, MP4, WebM. Max 50MB.
language	string	No	Auto-detect	ISO 639-1 language code for optimization
speakers	integer	No	Auto-detect	Expected number of speakers for diarization (1-32)
timestamps	string	No	`word`	Timestamp granularity: `none`, `word`, or `character`
format	string	No	`text`	Output format: `text` (plain transcript), `srt` (subtitles), `json` (structured with timestamps)

Returns the transcript text, detected language, audio duration, and number of speakers detected. In `text` format, transcripts longer than 10,000 characters are truncated with a notice; use `json` format for full output.

Output formats:

text - Plain text transcript, suitable for most use cases
srt - SRT subtitle format with timing markers, useful for video captioning
json - Structured output with word-level timestamps and speaker labels
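
To make the `srt` option concrete, here is a small sketch of how timed segments map onto the standard SRT layout (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, then the text). The `(start, end, text)` segment tuples are an assumption made for illustration; the integration's actual response structure may differ.

```python
from typing import Iterable, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text), assumed shape


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, "Welcome, everyone."), (1.5, 3.0, "Let's begin.")]))
```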

3. Sound Effects (`elevenlabs_sound_effects`)

Generate sound effects from natural language descriptions.

Parameter	Type	Required	Default	Description
description	string	Yes	-	Text description of the desired sound (e.g., "thunder rumbling in the distance")
duration	number	No	Auto	Duration in seconds (0.5-30). Omit for optimal auto-determined length.
loop	boolean	No	`false`	Create a smoothly looping sound effect
prompt_influence	number	No	`0.3`	How closely to follow the description (0-1). Lower = more creative, higher = more literal.

Returns a URL to the generated audio file.

Example descriptions:

"Gentle notification chime"
"Thunder rumbling in the distance"
"Keyboard typing sounds"
"Ocean waves crashing on rocks"

4. Generate Music (`elevenlabs_generate_music`)

Create background music and compositions from text descriptions.

Parameter	Type	Required	Default	Description
description	string	Yes	-	Text description of the desired music (e.g., "upbeat corporate background music with light piano")
duration	number	No	`30`	Duration in seconds (5-300)
instrumental	boolean	No	`true`	Instrumental only, no vocals

Returns a URL to the generated audio file.

Example descriptions:

"Upbeat corporate piano with subtle drums"
"Calm ambient lo-fi beat for studying"
"Dramatic orchestral intro for a presentation"
"Relaxing acoustic guitar background"

5. Dub (`elevenlabs_dub`)

Translate audio and video content into other languages while preserving speaker voice and emotion. This tool uses an asynchronous submit-and-poll pattern because dubbing can take several minutes to complete.

Parameter	Type	Required	Default	Description
file_url	string	For submit	-	URL to the audio/video file (HTTPS). Max 50MB, 2.5 hours.
target_language	string	For submit	-	ISO 639-1 language code for the dubbed output (e.g., `es`, `fr`, `de`)
source_language	string	No	`auto`	ISO 639-1 language code of the source audio
speakers	integer	No	`0`	Number of speakers (0 for auto-detection)
watermark	boolean	No	`false`	Apply watermark to reduce credit cost
dubbing_id	string	For poll	-	ID of an existing dubbing job to check status

Two-mode operation:

Submit mode - Provide file_url and target_language to start a new dubbing job. The tool returns a dubbing_id and a processing status.
Poll mode - Provide the dubbing_id to check progress. Returns one of three statuses:
- processing - Job is still running. Try again in 30-60 seconds.
- complete - Dubbing is finished. A URL to the dubbed file is included.
- failed - Job encountered an error. The error message is included.

Supported languages include: English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Swedish, Norwegian, Danish, Finnish, Turkish, Arabic, Hindi, Japanese, Korean, Chinese (Mandarin), and many more (32+ total).

Tip: When configuring agent instructions for dubbing workflows, remind the agent to poll for completion using the dubbing_id. A typical dubbing job takes 1-5 minutes depending on file length.
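
The submit-and-poll pattern described above can be sketched as a loop. This is an illustration under assumptions: `call_tool` is a hypothetical stand-in for whatever invokes `elevenlabs_dub`, and the response keys `status`, `url`, and `error` are assumed names, not confirmed field names. The 30-second interval and 15-minute ceiling follow the guidance in this document.

```python
import time
from typing import Any, Callable, Dict

ToolCall = Callable[[Dict[str, Any]], Dict[str, Any]]


def wait_for_dub(
    call_tool: ToolCall,
    file_url: str,
    target_language: str,
    poll_interval: float = 30.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit a dubbing job, then poll until it completes, fails, or times out."""
    # Submit mode: file_url and target_language start a new job.
    job = call_tool({"file_url": file_url, "target_language": target_language})
    dubbing_id = job["dubbing_id"]

    waited = 0.0
    while waited <= timeout:
        # Poll mode: only the dubbing_id is sent.
        result = call_tool({"dubbing_id": dubbing_id})
        status = result["status"]
        if status == "complete":
            return result["url"]  # assumed key for the dubbed file URL
        if status == "failed":
            raise RuntimeError(result.get("error", "dubbing failed"))
        # Still processing: wait before asking again.
        sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError(f"dubbing job {dubbing_id} did not finish in {timeout} seconds")
```

Passing `sleep` as a parameter keeps the loop testable without real delays.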

6. Voice Changer (`elevenlabs_voice_changer`)

Transform the voice in an audio recording to a different voice while preserving the speech content.

Parameter	Type	Required	Default	Description
file_url	string	Yes	-	URL to the source audio file (HTTPS). Supports MP3, WAV, M4A. Max 50MB.
voice	string	No	`Rachel`	Target voice name or ID. Available voices: Rachel, Domi, Bella, Antoni, Elli, Josh, Arnold, Adam, Sam.
model	string	No	`multilingual_v2`	Model: `flash` or `multilingual_v2`
stability	number	No	`0.5`	Voice stability (0-1)
similarity	number	No	`0.75`	Voice similarity boost (0-1)

Returns a URL to the transformed audio file.

7. Audio Isolation (`elevenlabs_audio_isolation`)

Remove background noise from audio recordings, isolating clean speech.

Parameter	Type	Required	Default	Description
file_url	string	Yes	-	URL to the audio file to clean (HTTPS). Supports MP3, WAV, M4A. Max 50MB.

Returns a URL to the cleaned audio file with background noise removed.

Common use cases:

Clean up noisy meeting recordings before transcription
Improve audio quality of field recordings
Prepare audio for voiceover or podcast production

Security & Limitations

Security

API key encrypted at rest - Stored using AES-256-CBC encryption; never exposed in GET API responses
Tenant isolation - Each tenant has its own ElevenLabs connection and tools, scoped by TenantId
Secure file storage - Generated files stored in tenant-scoped Azure Blob Storage paths
Time-limited access - SAS URLs expire after 1 hour with read-only, HTTPS-only permissions
Automatic file cleanup - Generated files are automatically deleted after 7 days via Azure lifecycle policy
SSRF protection - File URL inputs must use HTTPS, and URLs that point to private IP addresses are blocked
Internal users only - ElevenLabs tools are gated to internal users or above; external senders cannot trigger content generation
Audit logging - Every tool execution is logged with the agent, parameters, and results
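
As an illustration of the SSRF protection listed above, the following sketch shows one common way to enforce HTTPS-only URLs and block private addresses using Python's standard library. It is not the integration's actual implementation; the function name `is_safe_file_url` and the injectable `resolve` parameter are assumptions made for the example.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def _is_blocked(address) -> bool:
    """True for addresses that should never be fetched on a user's behalf."""
    return (
        address.is_private
        or address.is_loopback
        or address.is_link_local
        or address.is_multicast
        or address.is_reserved
        or address.is_unspecified
    )


def is_safe_file_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Accept only HTTPS URLs whose host resolves solely to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        port = parsed.port or 443
        infos = resolve(parsed.hostname, port)
        # Each getaddrinfo entry ends with a sockaddr whose first item is the IP.
        addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
    except (OSError, ValueError):
        # Unresolvable hosts and malformed ports or addresses are rejected.
        return False
    return bool(addresses) and not any(_is_blocked(a) for a in addresses)
```

A check like this only helps if the later download connects to the same address that was validated; otherwise a DNS change between validation and fetch could bypass it.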

Limitations

No voice cloning - Voice cloning is intentionally excluded due to consent and legal considerations
Single connection per tenant - Only one ElevenLabs account per Control Bridge tenant
File size limit - Input files (for transcription, dubbing, voice changing, audio isolation) are limited to 50MB
Text-to-speech cap - Maximum 5,000 characters per text-to-speech call
Transcript truncation - Transcription output is truncated at 10,000 characters in text mode (use JSON format for full output)
Dubbing length - Dubbing supports files up to 2.5 hours in length
Sound effects duration - Sound effects are limited to 0.5-30 seconds
Music duration - Music generation supports 5-300 seconds (up to 5 minutes)
Character quota - All operations consume characters from your ElevenLabs account quota; monitor usage on the connection page

Rate Limits

ElevenLabs enforces rate limits based on your subscription tier. If an agent hits a rate limit, the tool automatically retries with exponential backoff, up to 3 times. If the Retry-After value exceeds 60 seconds, the tool fails immediately instead of waiting, to avoid long agent stalls.
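
The retry behavior described above can be sketched as follows. This is an approximation under assumptions: `RateLimitError` is a hypothetical exception type, the 1-second base delay is illustrative, and honoring a short Retry-After value in place of the computed backoff is one plausible reading of the policy rather than confirmed behavior.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the upstream API reports a rate limit."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(
    request: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,  # assumed starting delay
    max_retry_after: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry a rate-limited call with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError as error:
            too_long = (
                error.retry_after is not None
                and error.retry_after > max_retry_after
            )
            # Fail fast on long Retry-After values or when retries are used up.
            if too_long or attempt == max_retries:
                raise
            if error.retry_after is not None:
                delay = error.retry_after
            else:
                delay = base_delay * 2 ** attempt  # 1s, 2s, 4s
            sleep(delay)
    raise AssertionError("unreachable: the loop always returns or raises")
```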

Troubleshooting

API Key Validation Fails

Problem: The API key is rejected when connecting

Solutions:

Verify you copied the full API key from elevenlabs.io/app/settings/api-keys
Ensure your ElevenLabs account is active and has a paid subscription
Check that the API key has not been revoked or regenerated
Try generating a new API key from the ElevenLabs dashboard

Connection Test Fails After Connecting

Problem: The Test Connection button returns an error

Solutions:

Your API key may have been revoked - go to the ElevenLabs dashboard and verify the key is still active
Click Update API Key to enter a new key if the original was rotated
Check your ElevenLabs subscription status - an expired plan will cause API failures

Agent Cannot Find ElevenLabs Tools

Problem: ElevenLabs tools do not appear when editing an agent

Solutions:

Verify the ElevenLabs connection is active at Build > Connections > ElevenLabs Audio & Video
Check that the "ElevenLabs Audio & Video" tool group exists at Build > AI Agents > Tool Groups
Assign the tool group (not individual tools) to the agent
Refresh the page and try again

Text-to-Speech Returns Error

Problem: Agent receives an error when generating speech

Solutions:

Check that the text is under 5,000 characters - split longer text into multiple calls
Verify the voice name is valid (use one of the premade voices or a valid custom voice ID)
Check your ElevenLabs character quota on the connection page - you may have exhausted your monthly limit
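
One way to split longer text into multiple calls is to pack whole sentences into chunks that stay under the 5,000-character cap. The sketch below assumes a hypothetical helper named `split_for_tts` and uses a simple punctuation-based sentence split, which will not suit every language or writing style.

```python
import re
from typing import List


def split_for_tts(text: str, limit: int = 5000) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence breaks."""
    # Break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as a separate text-to-speech call, in order.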

Dubbing Job Stuck in Processing

Problem: A dubbing job remains in "processing" status after multiple polls

Solutions:

Dubbing jobs for longer files can take several minutes - allow up to 10 minutes for large files
If the job does not complete after 15 minutes, it may have failed silently. Submit a new dubbing job.
Verify the source file URL is still accessible and has not expired

File URL Rejected

Problem: Agent receives "file_url is invalid" or a download error

Solutions:

Ensure the file URL uses HTTPS (HTTP URLs are rejected)
Verify the URL is publicly accessible or uses a valid SAS token
Check that the file size is under 50MB
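Confirm the file format is supported by the tool being called (Transcribe accepts MP3, WAV, M4A, MP4, and WebM; Voice Changer and Audio Isolation accept MP3, WAV, and M4A)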
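
The checks above can be run before a file is handed to an agent. The following is a sketch only: `check_input_file` is a hypothetical helper, the format lists are taken from the tool tables in this document, and treating "50MB" as 50 × 1024 × 1024 bytes is an assumption. Dub is omitted because its supported formats are not listed here.

```python
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumes "50MB" means 50 MiB

# Formats listed in the tool tables of this document.
SUPPORTED_FORMATS = {
    "elevenlabs_transcribe": {"mp3", "wav", "m4a", "mp4", "webm"},
    "elevenlabs_voice_changer": {"mp3", "wav", "m4a"},
    "elevenlabs_audio_isolation": {"mp3", "wav", "m4a"},
}


def check_input_file(tool: str, file_url: str, size_bytes: int) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []
    parsed = urlparse(file_url)
    if parsed.scheme != "https":
        problems.append("URL must use HTTPS")
    # Use only the URL path so query strings (such as SAS tokens) are ignored.
    extension = PurePosixPath(parsed.path).suffix.lstrip(".").lower()
    if extension not in SUPPORTED_FORMATS[tool]:
        problems.append(f"format '{extension}' is not supported by {tool}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 50MB")
    return problems
```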

Rate Limit Errors

Problem: Agent receives rate limit errors from ElevenLabs

Solutions:

The agent will automatically retry with backoff - wait a moment and the request should succeed
If errors persist, reduce the frequency of ElevenLabs operations by spacing out agent executions
Consider upgrading your ElevenLabs plan for higher rate limits
Avoid running multiple agents that use ElevenLabs tools simultaneously

Best Practices

Agent Instructions

Help your agents use ElevenLabs tools effectively:

When working with ElevenLabs audio and video tools:
1. For text-to-speech, keep text under 5,000 characters per call. Split
   longer content into multiple calls.
2. Choose the right TTS model: 'flash' for quick previews, 'multilingual_v2'
   for production quality, 'v3' for emotionally expressive speech.
3. For dubbing, always save the dubbing_id and poll for completion. Dubbing
   jobs typically take 1-5 minutes.
4. When transcribing, specify the expected number of speakers if known for
   better diarization accuracy.
5. For sound effects, be descriptive and specific (e.g., 'soft rain on a
   window' rather than just 'rain').
6. Consider running audio_isolation before transcription if the source
   recording has significant background noise.

Configuration

Assign the ElevenLabs tool group only to agents that genuinely need audio/video capabilities
Monitor character usage regularly on the connection page to avoid unexpected quota exhaustion
Consider creating a dedicated "Media Agent" with the ElevenLabs tool group rather than adding it to general-purpose agents
Test your agents with sample audio/video tasks after setup to verify they use the tools correctly

Security

Rotate your ElevenLabs API key periodically from the ElevenLabs dashboard, then update it on the connection page
Review agent execution logs to monitor what audio/video operations are being performed
Disconnect the ElevenLabs integration if your organization no longer needs audio/video capabilities
Generated files are automatically cleaned up after 7 days, but you can disconnect to stop new file creation immediately

Related Topics

Tools Overview - All available agent tools
Agents - Configure agents to use tools
Agent Executions - View tool execution logs
