ElevenLabs – A Review Of The Leading AI Voice Platform
ElevenLabs is an AI speech synthesis platform founded in 2022 by Piotr Dąbkowski and Mati Staniszewski, former employees of Google and Palantir respectively.
Over the past three years it has grown in functionality and scope to become the leading AI speech synthesis product on the market. But that description undersells it: ElevenLabs can do a lot more than just generate speech!
Over a million ElevenLabs users are already using the software to:
- Produce clear, crisp voice recordings.
- Create AI “clones” of their own voice.
- Answer customer enquiries using “AI Agents”.
- Automatically translate and dub their video content into multiple languages.
- Create audiobooks and podcasts.
Let’s look at how it works and whether we’d recommend it.
Text-to-Speech
ElevenLabs’s Text-to-Speech (TTS) feature allows you to convert text into natural-sounding audio using advanced AI voice models. The main controls are:
- Text Input Area: Type or paste the text you want to turn into speech into this area.
- Voice Selection: You can choose between a variety of AI-generated voices tailored for different tones, genders, and styles. Each voice has a series of descriptor tags to let you know what sort of voice you are going to get.
- Model Selection: The “Eleven Multilingual v2” model supports multiple languages and produces consistent high-quality results. There are other models to choose from which may use fewer resources or be better suited to certain tasks, and certain models are recommended for use with certain languages.
- Stability and Similarity Sliders: These sliders adjust the stability and similarity of the speech output, allowing you to fine-tune the expressiveness of the voice model. If you put the sliders to full similarity and stability, you might find that the voice becomes a bit robotic and monotone, so I’ve found that it works best if you have both sliders at around 75% to keep the voice consistent while still lifelike.
- History Tab: Your previous outputs are stored here for convenience. You can listen to them again and see the settings that produced them.
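For anyone who would rather script this than use the web interface, the same choices can be written down as a request. The sketch below is a minimal illustration, assuming the request shape of ElevenLabs's public text-to-speech REST endpoint as I understand it (a POST to `/v1/text-to-speech/{voice_id}` with an `xi-api-key` header and `model_id` and `voice_settings` fields); check the current API reference before relying on it. The function name `build_tts_request` and the placeholder voice ID are my own. The function only builds the request and does not send anything.

```python
import json

# Assumed base URL; verify against the official API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text, voice_id, api_key, stability=0.75, similarity=0.75,
                      model_id="eleven_multilingual_v2"):
    """Build (url, headers, body) for a text-to-speech call without sending it.

    The 0.75 defaults mirror the roughly 75% slider positions suggested above.
    """
    for name, value in (("stability", stability), ("similarity", similarity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    body = {
        "text": text,
        "model_id": model_id,
        # Field names are assumptions based on the public API documentation.
        "voice_settings": {"stability": stability, "similarity_boost": similarity},
    }
    return url, headers, json.dumps(body)
```

Calling `build_tts_request("Hello", "YOUR_VOICE_ID", "YOUR_API_KEY")` returns the three pieces you would hand to an HTTP client of your choice. Keeping the request-building separate from the sending makes it easy to inspect the settings before spending any credits.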
VERDICT:
The text-to-speech feature of ElevenLabs works incredibly well. It takes only a few seconds to create broadcast-quality voice recordings which you could then use in various contexts. For example, you could write out all the options for your business’s automated phone system and export them as MP3s.
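To make the phone system example concrete, here is a rough sketch of how you might organise the scripts before generating them. It is plain Python and does not call ElevenLabs at all: it simply pairs each menu option with an MP3 filename so the recordings stay in order. The helper names (`slugify`, `plan_prompt_files`) and the menu wording are invented for illustration.

```python
import re


def slugify(label):
    """Turn a menu label into a safe lowercase file stem."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def plan_prompt_files(menu):
    """Map each menu option to an (mp3 filename, script text) pair, in order."""
    jobs = []
    for index, (label, script) in enumerate(menu.items(), start=1):
        jobs.append((f"{index:02d}_{slugify(label)}.mp3", script))
    return jobs


# Made-up example menu for a small business phone line.
menu = {
    "Main greeting": "Thank you for calling. Please choose from the following options.",
    "Opening hours": "We are open Monday to Friday, nine until five.",
}
jobs = plan_prompt_files(menu)
```

Each script in `jobs` can then be pasted into the text input area, or sent through whatever automation you prefer, and saved under the matching filename.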
Voice Changer
The Voice Changer tool enables you to transform a recording of a voice into one from the AI voice models. For example, you could record yourself speaking into your phone and the voice changer would turn this low-quality audio into a high-quality recording of an AI-generated voice, matching the tone and cadence of your original recording with a great degree of accuracy.
VERDICT:
This is a great option if you want to add recorded speech to your projects but lack good-quality recording equipment or a suitable space to record audio in.
I also found that the tool performs best when the accent of the selected voice model matches the accent in the original audio. I have a British accent and when I tried it with the American AI voice models it sounded VERY weird. Once I switched to the “Chris” model, which also has a British accent, it sounded much more natural. This could cause issues if you wish to use an American accent for your project but speak with a different accent yourself. In these cases, it may be better to type out your script and use the Text-to-Speech feature instead.
Sound Effects Generator
Using the Sound Effects Generator, you can create Foley sound effects from text descriptions. Each prompt generates four different options that you can download.
VERDICT:
Compared to the text-to-speech features of ElevenLabs, the sound effects generator is very basic and not as reliable. While simple sound effects can sound passable, anything more complex tends to be less impressive. It struggles with generating layered sounds like “a car crash on a busy road” and performs better with single, isolated effects like “a glass breaking” or “a cat’s meow”. Compared with professional (or even free) Foley sound libraries, the sounds don’t stack up that well.
However, most aspects of ElevenLabs have been improving over time as their AI models get updated, so I have no doubt this aspect of the software will also get better in the coming months.
Dubbing Studio
The dubbing studio combines the AI voice models with AI translation to dub videos into multiple languages.
You can upload a video or copy and paste the URL of a YouTube or TikTok video you want dubbed and ElevenLabs will translate the speech and generate a dubbed video. There is a wide variety of both input and output languages to choose from (with more likely to be added soon). You can choose which voice models to use for the outputs and the system will do its best to match the lip syncing with the original video.
VERDICT:
This is a powerful tool which will allow individuals and businesses to distribute their content to a much wider audience at a very low cost. Whereas previously you would have to pay a native speaker to translate and record new audio, it can now be generated in a matter of minutes.
I have found that if there are more than two speakers it can cause some missed words or dropped audio, especially if the original audio is not clean or there is a lot of crosstalk. So two people having a calm conversation is more likely to produce accurate results than four people arguing while in a busy restaurant.
There is also a risk to using software like this if you do not personally speak the output language as there is no way to easily check that the output video is accurate. For example, I dubbed a video of myself into Hindi, but I do not speak Hindi myself, so I have no idea if what was being said in the output video was accurate or not. For all I know, it was – but this uncertainty highlights the problem of relying on AI tools without human oversight.
Voice Cloning
Voice cloning works by using a short recording of an existing voice to generate an AI model that you can use in other parts of the ElevenLabs software, like the Text-to-Speech tool. From just a few minutes of audio, it can create a relatively convincing voice model.
VERDICT:
The clone of my voice worked relatively well, but as with some of the other features of ElevenLabs, the underlying speech model seemed to have been trained only on American voices, so while the timbre of the voice matched mine, the cadence and inflexion were very Americanised. It would pronounce words like “niche” the American way. I have to say, it was quite an odd experience listening back to it – like a parallel universe American doppelganger of myself. However, I have looked online at examples with American accents, and they seem to be scarily good. I hope ElevenLabs will update its models soon to account for differences in accents.
Projects
Projects are more in-depth creations which you can generate using one or more AI voices. This includes turning text into audiobooks and podcasts. You can upload documents directly to ElevenLabs or use the URL of an existing piece of writing online.
VERDICT:
This is particularly useful in an age where many people use audio as their primary source of entertainment. I don’t tend to read newspapers or online articles very often these days, but when I’m walking through town, I almost always have headphones on listening to podcasts, interviews, reviews etc. Creating audio versions of your existing text content allows you to reach a much wider audience than publishing text alone.
Conversational AI
The Conversational AI feature allows you to create semi-autonomous AI agents which can hold conversations with you or with your customers. This could be for fun, or for something more useful like creating a call centre agent that can man your phones while you are out of the office. By combining this with the voice clone feature, you could even create a clone of yourself to fill in for you in situations where you are unable to reach the phone.
VERDICT:
As with all AI agents, the ElevenLabs one is only as good as the information you provide. The more information you provide about yourself or your company and its goals, the better it will be. It is unlikely to grasp the intricacies of your company’s services well enough to replace a skilled human in a support role, but for simple tasks like routing calls, answering fact-based questions like opening times, and triaging calls for urgency, it can be extremely useful.
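The division of labour I have in mind can be shown with a toy example. To be clear, this is not how ElevenLabs agents work internally and it uses none of their API. It is a deliberately simple keyword-based sketch, with made-up facts and a hypothetical `handle_enquiry` function, showing the three outcomes described above: answer a known fact, escalate something urgent, or hand off to a human.

```python
# Made-up business facts; a real agent would be given your own information.
FACTS = {
    "opening times": "We are open Monday to Friday, 9am to 5pm.",
    "address": "You can find us at 1 Example Street.",
}
# Words that mark a call as urgent in this toy example.
URGENT_WORDS = ("urgent", "emergency", "outage")


def handle_enquiry(message, facts=FACTS):
    """Return an (action, reply) pair for a caller's message."""
    lowered = message.lower()
    # Urgency is checked first so urgent calls are never answered with a stock fact.
    if any(word in lowered for word in URGENT_WORDS):
        return "escalate", "Transferring you to a member of staff now."
    for topic, answer in facts.items():
        if topic in lowered:
            return "answer", answer
    # Anything the agent has no information about goes to a human.
    return "handoff", "I will pass your message to the team."
```

The point of the sketch is the last line: when the agent has not been given the relevant information, the sensible behaviour is to pass the call on rather than guess.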
Final Verdict on ElevenLabs
ElevenLabs is a very impressive platform with a wide variety of tools. The text-to-speech, voice cloning, dubbing studio and conversational AI currently stand out as the most useful and fully developed features. Others, like the Sound Effects Generator and voice model adjustments for accents, still have room for improvement but could be equally powerful with a few updates.
There is a huge amount of value to be gained for content producers, businesses and creatives. A tool like the dubbing studio could save you thousands and allow you to distribute your video content to foreign language markets in a matter of minutes, something that would have taken days of work in a pre-AI world.
As the technology evolves, ElevenLabs is poised to become a go-to tool for anyone working with audio content. Like all areas of AI, voice synthesis is improving rapidly, so the platform may well have been updated by the time you’re reading this.