Trusted by top companies around the globe

TIME

Unlock the Power of Speech to Text

Transcribe audio in 99 languages with world-leading accuracy, speaker separation, timestamps & event tagging—delivered via a simple API.

Industry-leading Accuracy

Lowest error rate for flawless transcripts.

Smart Speaker Separation

Auto-label speakers for clear, organized text.

Dynamic Audio Tagging

Label sounds—laughter, applause & more.

Convert Speech to Text Instantly

Upload or record audio and get flawless transcripts in seconds—no setup or plugins required.

Pricing

Plans built for creators and businesses of all sizes

Free

For individuals who want the most advanced AI audio

10k credits/month

$0
per month
Text to Speech
Speech to Text
Conversational AI
Studio
Automated Dubbing
API Access

Credits usable for either:

10 minutes of high-quality Text to Speech
15 minutes of Conversational AI
Starter

For hobbyists creating projects with AI audio

30k credits/month

$5
per month

Everything in Free, plus

Commercial license
Instant Voice Cloning
20 projects in Studio
Dubbing Studio
Pro

For creators making premium content for global audiences

500k credits/month

$99
per month

Everything in Creator, plus

500 mins of Text to Speech
1,100 mins of Conversational AI
Enterprise

Scaled production, custom terms, discounts, and dedicated support.

End-to-end Encryption
GDPR, HIPAA & SOC 2
Unlimited Usage & Seats

Frequently asked questions

What languages does Scribe support?

Excellent Accuracy (≤5% Word Error Rate, WER)
Bulgarian, Catalan, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Indonesian, Italian, Japanese, Kannada, Malay, Malayalam, Macedonian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish, Turkish, Ukrainian, Vietnamese

High Accuracy (>5% to ≤10% WER)
Bengali, Belarusian, Bosnian, Cantonese, Estonian, Filipino, Gujarati, Hungarian, Kazakh, Latvian, Lithuanian, Mandarin, Marathi, Nepali, Odia, Persian, Slovenian, Tamil, Telugu

Good Accuracy (>10% to ≤25% WER)
Afrikaans, Arabic, Armenian, Assamese, Asturian, Azerbaijani, Burmese, Cebuano, Croatian, Georgian, Hausa, Hebrew, Icelandic, Javanese, Kabuverdianu, Korean, Kyrgyz, Lingala, Maltese, Mongolian, Māori, Occitan, Punjabi, Sindhi, Swahili, Tajik, Thai, Urdu, Uzbek, Welsh

Moderate Accuracy (>25% to ≤50% WER)
Amharic, Chichewa, Fulah, Ganda, Igbo, Irish, Khmer, Kurdish, Lao, Luxembourgish, Luo, Northern Sotho, Pashto, Shona, Somali, Umbundu, Wolof, Xhosa, Zulu
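
For readers unfamiliar with the metric behind these tiers, here is a minimal sketch of how Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. The function names `word_error_rate` and `accuracy_tier` are illustrative only and are not part of any ElevenLabs API. The sketch also assumes plain whitespace tokenization, whereas published benchmarks typically normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i reference words vs. an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def accuracy_tier(wer: float) -> str:
    """Map a WER fraction to the tier boundaries listed above."""
    if wer <= 0.05:
        return "Excellent"
    if wer <= 0.10:
        return "High"
    if wer <= 0.25:
        return "Good"
    if wer <= 0.50:
        return "Moderate"
    return "Unlisted"  # above 50% WER; no tier is given for this range


wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.3f}", accuracy_tier(wer))
```

One substituted word out of six reference words gives a WER of about 16.7%, which falls in the >10% to ≤25% band.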

What is speech-to-text and how does it work?

Speech-to-text (STT) is a technology that converts spoken language into written text using automatic speech recognition (ASR). It processes audio signals, identifies speech patterns, and transcribes them into text with high accuracy.

ElevenLabs' AI-powered speech-to-text software is designed to transcribe audio and video content with human-like precision, making it ideal for voice-to-text conversion, audio transcription, and real-time speech recognition.

Speech-to-text technology is used in:
✔ Audio-to-text transcription for podcasts, meetings, and interviews.
✔ Captions and subtitles in video content.
✔ Voice-to-text software for hands-free typing and accessibility tools.

ElevenLabs ASR offers fast, reliable, and highly accurate speech-to-text conversion for multiple languages and accents.

How do I transcribe video to text?

ElevenLabs provides video transcription to convert spoken dialogue into text format, making it easy to create subtitles, captions, and searchable transcripts.

Steps to transcribe video to text:
1. Upload your video file to ElevenLabs ASR.
2. Speech recognition technology processes the audio.
3. A transcript is generated automatically, with timestamps.
4. Download the text file or export subtitles for editing.
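
To illustrate the last step, the sketch below turns timestamped transcript segments into SubRip (.srt) subtitle text. It is only an approximation of what a subtitle export does: the `(start_seconds, end_seconds, text)` tuple shape and the helper names `srt_timestamp` and `to_srt` are assumptions made for this example, not the format or API that ElevenLabs returns.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    The tuple layout is hypothetical; adapt it to whatever structure
    your transcript actually uses.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        )
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


example = [(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]
print(to_srt(example))
```

The resulting text can be saved with a .srt extension and opened in most video editors and players.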

This AI-powered video transcription model helps content creators, businesses, and educators quickly convert video speech into accurate text for accessibility and content repurposing.

Does ElevenLabs support real-time speech-to-text conversion?

Scribe currently works well for use cases where the input audio is available upfront. A low-latency, real-time version will be released soon.

How much does Scribe cost?

Scribe starts at $0.40 per hour of transcribed audio, and the rate falls well below this at scale with Enterprise plans.
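
As a rough worked example of the $0.40 per hour starting rate, the sketch below estimates the cost of a given amount of audio. It assumes billing is linear in audio duration and rounds to the nearest cent. Actual billing granularity, rounding, and volume discounts are not stated here, so treat the output as an upper-level estimate only. The name `estimate_cost` is illustrative.

```python
from decimal import Decimal

# Starting rate quoted above, in US dollars per hour of transcribed audio.
RATE_PER_HOUR = Decimal("0.40")


def estimate_cost(audio_minutes) -> Decimal:
    """Estimate cost in dollars, assuming linear per-hour pricing."""
    hours = Decimal(str(audio_minutes)) / Decimal(60)
    return (hours * RATE_PER_HOUR).quantize(Decimal("0.01"))


print(estimate_cost(90))   # 1.5 hours of audio
print(estimate_cost(600))  # 10 hours of audio
```

At the starting rate, 90 minutes of audio comes to $0.60 and 10 hours to $4.00.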

How much does the voice changer cost? Is there a free trial?

Our voice changer is accessible with a generous free plan. Paid plans, offering full access to all the features and more characters, start from a competitive price point.