ElevenLabs in English — where the quality sits now
ElevenLabs' Multilingual v2 and Eleven Turbo v2.5 models deliver English voices with natural stress, emotional tone shifts and clean consonant separation, and the jump since 2024 is large. Compared with TTS engines that have been on the market for years, what sets ElevenLabs apart is emotional range and broadcast-like audio quality. Older corporate engines are still solid in IVR systems but stay robotic in their intonation, and basic system voices never reached voiceover quality at all.
Where does it actually deliver? Commercial VO (especially short formats under 30 seconds), podcast intros and outros, e-learning modules, IVR greetings and mobile app notifications — across all of these it reaches roughly 80-90% of human voiceover quality. Client feedback points the same way: on short ad spots there are cases where nobody notices the difference.
Where you still need a human voice actor: emotional creative monologue, multi-character ad dialogue, literary narration, child voices, regional accent work and national broadcast TVCs.
Voice cloning — building a brand voice
ElevenLabs has two cloning methods, and the gap between them is clear. Instant Voice Cloning (IVC) produces a usable AI voice from just one minute of clean audio. It's fast, but the output flattens a little — the source voice's finer expressions don't fully carry over. Professional Voice Cloning (PVC) needs a minimum of 30 minutes of recording, ideally around 60. At that level the output is almost indistinguishable from the source.
How should the reference recording be made? A quiet room, a fixed mic distance (20-25 cm), a mix of low and high register sentences, read at a natural speaking pace. Recordings made in studio conditions clone far better than ones captured on a phone mic. Keep noise, clicks and breath sounds to a minimum.
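The length thresholds above are easy to check before a recording is uploaded. The following is a minimal sketch, assuming the reference take is a WAV file; the function names and tier labels are hypothetical, and the thresholds are simply the figures quoted in this section (one minute for IVC, 30 minutes minimum and about 60 ideal for PVC).

```python
import io
import wave

# Thresholds taken from the text: IVC from about one minute of audio,
# PVC from 30 minutes, ideally around 60.
IVC_MIN_SECONDS = 60
PVC_MIN_SECONDS = 30 * 60
PVC_IDEAL_SECONDS = 60 * 60


def wav_duration_seconds(source):
    """Return the duration of a WAV file given a path or a file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def cloning_tier(seconds):
    """Map a recording length to the cloning method it can feed."""
    if seconds >= PVC_IDEAL_SECONDS:
        return "PVC (ideal length)"
    if seconds >= PVC_MIN_SECONDS:
        return "PVC (minimum length)"
    if seconds >= IVC_MIN_SECONDS:
        return "IVC"
    return "too short"


def silent_wav(seconds, rate=8000):
    """Build a silent mono 16-bit WAV in memory, used only to demo the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    clip = silent_wav(90)  # stand-in for a real 90-second reference take
    print(cloning_tier(wav_duration_seconds(clip)))  # prints "IVC"
```

This only checks length. Noise, clicks and mic distance still have to be judged by ear.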
Enterprise scenario: a brand ambassador or CEO records 30-60 minutes in studio for PVC. The AI voice is built. From then on a single, consistent voice runs across all brand communication — ad spots, training videos, social content — and updates can be made without booking the voice actor again. For brands with frequent product launches, the time and budget saved adds up fast.
Practical use — producing commercial VO
How do you produce a commercial voiceover end to end with ElevenLabs? The workflow runs like this:
1. Brief and script: polish the script with ChatGPT or Claude — short sentences, clear pauses, copy that reads naturally aloud.
2. Voice pick: choose a fitting voice from the ElevenLabs Voice Library, or test the stock library voices. Warm narrative tones usually work better for ads.
3. Generation settings: Stability 35-50 (more natural, less predictable) · Similarity 75-85 (fidelity to the clone) · Style 0-30 (emotion).
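4. Pause and style tags: add dramatic emphasis with tags such as [pause 0.5s], [whisper], [laugh], and drop pause commands between sentences to control tempo and pacing. The square-bracket form is not SSML (the SSML pause is <break time="0.5s" />), and which tags a voice honours depends on the model, so test a short line before locking the script.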
5. Integration tools: the ElevenLabs API connects straight to the edit bay through the Premiere Pro plugin. In CapCut you import the generated audio as an external file, and in Runway you can sync the AI voice to a character with lip sync.
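6. Post production: run the output through a compressor and EQ chain in a DAW (Logic Pro, Pro Tools, Reaper) and bring it to the broadcast loudness target, i.e. -23 LUFS under EBU R 128. Check the delivery spec first: US broadcast under ATSC A/85 targets -24 LKFS instead.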
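Step 3 can also be scripted instead of set in the web UI. The sketch below builds a request for ElevenLabs' public text-to-speech REST endpoint. It assumes the API takes each setting on a 0 to 1 scale, so the 35-50 style values from step 3 are divided by 100. The helper names, the default values, the placeholder voice ID and the output file name are illustrative, and the field names should be checked against the current API reference before use.

```python
import json
import os
import urllib.request

API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_payload(text, stability=40, similarity=80, style=15,
                  model_id="eleven_multilingual_v2"):
    """Turn UI-style 0-100 settings into a JSON-ready request body."""
    for name, value in (("stability", stability),
                        ("similarity", similarity),
                        ("style", style)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability / 100,
            "similarity_boost": similarity / 100,
            "style": style / 100,
        },
    }


def build_request(text, voice_id, api_key, **settings):
    """Wrap the payload in a POST request; nothing is sent until urlopen()."""
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=json.dumps(build_payload(text, **settings)).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    api_key = os.environ.get("ELEVENLABS_API_KEY")
    if api_key:  # only call the API when a key is configured
        request = build_request(
            "Fresh coffee. Every morning. Delivered.",
            voice_id="YOUR_VOICE_ID",  # placeholder, not a real voice
            api_key=api_key,
            stability=40, similarity=80, style=15,
        )
        with urllib.request.urlopen(request) as response:
            with open("take_01.mp3", "wb") as out:
                out.write(response.read())
```

Keeping the payload builder separate from the network call makes it easy to generate several takes with different stability values and compare them in the edit.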
Tips for clean pronunciation
TTS engines are sensitive to phrasing, and a few habits keep the output clean across languages.
Script writing: short sentences are the golden rule for AI voiceover. Long subordinate clauses break the intonation. Write the script the way people speak, with punctuation in the right places, and keep each sentence to a maximum of 15-18 words.
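The 15-18 word ceiling is simple to enforce before a script goes to generation. This is a rough sketch, not a full sentence tokenizer: it splits on ., ! and ? followed by whitespace, so abbreviations such as "e.g." will be split too early. The function name and the default limit of 18 are assumptions taken from the rule above.

```python
import re

MAX_WORDS = 18  # upper end of the 15-18 word guideline


def long_sentences(script, max_words=MAX_WORDS):
    """Return (word_count, sentence) pairs for sentences over the limit."""
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count > max_words:
            flagged.append((count, sentence))
    return flagged


if __name__ == "__main__":
    draft = (
        "Meet the new blend. "
        "It is roasted in small batches by people who have spent years "
        "learning exactly how long each bean needs before it is ready to ship."
    )
    for count, sentence in long_sentences(draft):
        print(f"{count} words: {sentence}")
```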
Emphasis marks: mark a word you want stressed within a sentence with *word* — for example, "This is *the best* pick of the season."
Numbers and currency: always spell figures out — "one hundred fifty dollars" rather than "$150", "twenty twenty-six" rather than "2026". The engine reads raw digits inconsistently.
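Spelling figures out by hand gets tedious on longer scripts, so a small pre-processing pass can help. The sketch below covers only the two cases named above: whole-dollar amounts written as $N, and four-digit years from 1900 to 2099. The year rule is a heuristic, so any other bare four-digit figure in that range would also be read as a year. Amounts with thousands separators or decimals, such as $1,500 or $9.99, are deliberately left untouched for manual handling. All function names are hypothetical.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def spell_int(n):
    """Spell a whole number from 0 to 999,999 in words."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0 to 999,999 is supported")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + spell_int(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return spell_int(thousands) + " thousand" + (" " + spell_int(rest) if rest else "")


def spell_year(year):
    """Read a year as two pairs (2026 -> twenty twenty-six) where that sounds natural."""
    century, rest = divmod(year, 100)
    if rest == 0 and century % 10:
        return spell_int(century) + " hundred"   # 1900 -> nineteen hundred
    if rest < 10:
        return spell_int(year)                   # 2005 -> two thousand five
    return spell_int(century) + " " + spell_int(rest)


def spell_dollars(match):
    amount = int(match.group(1))
    return spell_int(amount) + (" dollar" if amount == 1 else " dollars")


def spell_out(text):
    """Replace $N amounts and 1900-2099 years with words."""
    text = re.sub(r"\$(\d{1,6})\b(?![,.]\d)", spell_dollars, text)
    text = re.sub(r"\b(?:19|20)\d{2}\b", lambda m: spell_year(int(m.group(0))), text)
    return text


if __name__ == "__main__":
    print(spell_out("It costs $150 in 2026."))
    # prints: It costs one hundred fifty dollars in twenty twenty-six.
```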
Tempo and emotion: control pacing with SSML tags such as <prosody rate="slow"> or <prosody rate="fast">, and confirm on a short test line that the model you picked honours them, since SSML support varies between models. For emotional shifts (moving from a serious line into an upbeat CTA, say) it works better to generate two separate takes and join them in the DAW.
Pricing 2026
The ElevenLabs plan structure in 2026:
- Free: 10,000 characters/month (~10 minutes of audio), mandatory ElevenLabs credit on generated content, no commercial use
- Starter ($5/mo): 30,000 characters/month, Instant Voice Cloning, commercial rights, 1 custom voice slot
- Creator ($22/mo): 100,000 characters/month, Professional Voice Cloning, higher audio quality, 10 custom voice slots
- Pro ($99/mo): 500,000 characters/month, 192 kbps audio, 30 custom voice slots, priority generation queue
- Business ($330/mo): 2,000,000 characters/month, team management, high API limits, bulk content production for a brand
- Enterprise: custom pricing, white-label, SLA guarantees, custom model training, GDPR-compliant data handling
For a solo creative or small agency, the Creator plan (100,000 characters a month is roughly 100 minutes of audio) is usually enough. For multi-brand agencies or frequent producers, Pro is the best balance point.
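The plan choice comes down to simple arithmetic. The sketch below uses the prices and quotas exactly as listed in this section, plus the ratio the list implies (10,000 characters is about 10 minutes, so roughly 1,000 characters per minute of audio). Real speaking rates vary by voice and script, so treat the result as an estimate. The function names are hypothetical.

```python
CHARS_PER_MINUTE = 1000  # implied by "10,000 characters ~ 10 minutes"

# (plan, USD per month, characters per month) as listed above
PLANS = [
    ("Free", 0, 10_000),
    ("Starter", 5, 30_000),
    ("Creator", 22, 100_000),
    ("Pro", 99, 500_000),
    ("Business", 330, 2_000_000),
]


def minutes_of_audio(characters):
    """Estimate minutes of generated audio for a character budget."""
    return characters / CHARS_PER_MINUTE


def cheapest_plan(minutes_needed, commercial=True):
    """Return the first listed plan whose quota covers the monthly need."""
    characters = minutes_needed * CHARS_PER_MINUTE
    for name, _price, quota in PLANS:
        if commercial and name == "Free":
            continue  # the Free plan does not allow commercial use
        if quota >= characters:
            return name
    return "Enterprise"


if __name__ == "__main__":
    for minutes in (20, 100, 120, 3000):
        print(minutes, "min/month ->", cheapest_plan(minutes))
```

A studio that needs 120 minutes a month already lands on Pro, which is why Creator suits solo work and Pro suits multi-brand output.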
Legal and ethical framework
AI voice cloning isn't just technical — it's a serious legal area. The essentials:
1. Consent is mandatory: cloning a real person's voice requires written, explicit consent. ElevenLabs' terms of use ban unauthorised cloning outright, and a breach ends in account closure. Platform-level controls aren't enough on their own; in most jurisdictions a voice is protected under personality rights.
2. Personality rights: a person whose voice is used without permission can claim damages. The risk is especially high for the voices of public figures, and advertising regulators have begun reviewing complaints of this kind.
3. Data protection (GDPR/local law): voice data used to identify a person is treated as biometric data, so it falls under the stricter rules of data protection law. If you process an employee's or customer's voice, disclosure obligations apply. ElevenLabs' Enterprise plan offers GDPR-compliant data handling.
4. Disclosure: under the EU's AI Act, disclosing AI-generated voices in ad material is becoming mandatory. It's a sensible safeguard for any brand following international broadcast standards.
5. Brand risk: if an unauthorised voice clone makes the news, the reputational damage can far exceed the production cost you saved. Staying inside the legal framework is the right call ethically and commercially.
AI voice in the PAM AI Studio workflow
How do we position AI voice in our commercial productions? There are four clear scenarios, three where AI voice does the work and one where it does not, and we do something different in each.
Demo and animatic: we use an ElevenLabs voice to test a scene before it goes to client approval. Testing the rhythm of a scene and the length of the VO with a placeholder voice is genuinely useful — and when a client asks "are the voices going to be this good?", it's easy to answer.
Multilingual localization: when we adapt the same ad to several languages or markets, ElevenLabs saves us real time. We produce English, German and Arabic versions from a single master without booking a separate human voice session each time. If the client still wants a tweak after approval, it's quick to fix.
Social and short content: for short spots made for TikTok, Reels and YouTube Shorts, AI voice is now part of our default workflow. Speed is critical in a weekly content cycle, and booking a voiceover studio for a 30-second video gets expensive fast.
Final TVC and large campaigns: here we still bring in a human voice actor. For emotional depth, brand-specific nuance and broadcast-quality standards, that choice doesn't change. AI voice is a tool, not a replacement.
To produce an ad with AI voiceover, get in touch with the studio.