Keys are stored locally and only sent to their respective APIs.

Echō

AI-powered closed caption translation & natural voice dubbing.

What is Echō?

Enterprise-grade closed caption translation and natural voice dubbing. Upload media, get perfectly timed translations and AI-cloned voice dubs in minutes.

The Problem

Localization pipelines at studios like Disney, Netflix, and Amazon involve 5+ fragmented tools, weeks of turnaround, and expensive manual linguist work for isometric dubbing — rewriting translations to match original speech timing. Voice casting alone can take days. The result: content launches in 1-2 languages and takes months to reach global audiences.

How Echō Works
01 Upload

Drop in any video or audio file. Echō accepts MP4, MOV, MKV, WAV, MP3, and more.

02 Transcribe & Translate

ElevenLabs Scribe extracts speech with timestamps and speaker identification. Claude translates with cultural adaptation and timing constraints.

03 Dub & Export

ElevenLabs clones the original voice and synthesizes the translation. Export captions or dubbed audio.

The Technology Stack
ElevenLabs Scribe

High-accuracy speech-to-text with word-level timestamps, speaker diarization, and 99+ language support. Powers the transcription pipeline.

Claude (Anthropic)

Context-aware translation that preserves idioms, tone, and cultural nuance. Handles isometric adaptation — rewriting translated text to fit original timing windows.

ElevenLabs

Voice cloning from audio samples, plus multilingual synthesis with control over prosody, emotion, and pacing for natural-sounding speech.

Isometric Dubbing

The secret sauce. Claude rewrites translated text to match the duration of each original speech segment — the same thing Disney pays linguists to do manually.

What Disney Could Do Better

Studios currently stitch together five or more separate tools and vendors: transcription, translation agencies, voice casting, recording studios, and manual QC. Echō consolidates the pipeline into two API providers: ElevenLabs for transcription (Scribe) and voice synthesis, and Claude for translation and isometric adaptation. Voice cloning eliminates casting and recording for 80% of use cases, isometric adaptation replaces weeks of manual linguist work, and real-time preview eliminates the back-and-forth between translation and audio teams.

Supported Formats
Input

MP4, MOV, MKV, AVI, WebM, MP3, WAV, AAC, FLAC, OGG, M4A, WMA

Caption Export

SRT, VTT (WebVTT), SBV (YouTube), TTML, DFXP, SSA/ASS, JSON

Audio Export

MP3, WAV, AAC — individual segments or full mixed track

Languages

English ↔ Spanish (launch). More languages coming.

Step-by-Step Guide

Follow these steps to go from raw media to fully translated captions and natural-sounding dubbed audio.

0 Configure Your API Keys

Echō connects to two AI services. Enter your keys in the sub-bar at the top of the page. Each key lights up green when configured.

Claude
Powers translation and isometric adaptation — rewriting translations to match original speech timing. Get your key at console.anthropic.com. Uses Claude Sonnet for fast, high-quality results.
ElevenLabs
Powers transcription (Scribe), voice cloning, and speech synthesis. Get your key at elevenlabs.io/app/settings/api-keys. The free tier includes a limited character quota; paid plans raise it.

Your keys are stored in your browser's localStorage only. They are never sent to BuilderBias servers — each key is sent directly to its respective API (Anthropic or ElevenLabs) over HTTPS.
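
As an illustration of "each key is sent directly to its respective API", here is a minimal sketch of how a client might attach each key only to requests for its own provider. The helper name `auth_headers` and the `keys` dictionary (a stand-in for localStorage) are assumptions for this example, not Echō's actual code. The header names are the ones the Anthropic and ElevenLabs HTTP APIs document.

```python
# Version header value the Anthropic API expects alongside the key.
ANTHROPIC_VERSION = "2023-06-01"


def auth_headers(provider: str, api_key: str) -> dict:
    """Return the HTTP headers that carry the key for one provider only."""
    if not api_key:
        raise ValueError(f"no API key configured for {provider}")
    if provider == "anthropic":
        return {
            "x-api-key": api_key,
            "anthropic-version": ANTHROPIC_VERSION,
            "content-type": "application/json",
        }
    if provider == "elevenlabs":
        return {"xi-api-key": api_key}
    raise ValueError(f"unknown provider: {provider}")


# Hypothetical stored keys; in the browser these would live in localStorage.
keys = {"anthropic": "sk-ant-example", "elevenlabs": "el-example"}
claude_headers = auth_headers("anthropic", keys["anthropic"])
eleven_headers = auth_headers("elevenlabs", keys["elevenlabs"])
```

Because the headers are built per provider, the Anthropic key never appears in a request to ElevenLabs, and vice versa.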

1 Upload Your Media

Drag and drop a video or audio file onto the upload zone, or click to browse your files. Echō accepts all major formats:

Video: MP4, MOV, MKV, AVI, WebM
Audio: MP3, WAV, AAC, FLAC, OGG
Captions: SRT, VTT, SBV, TTML

Once uploaded, you'll see the file name, size, and type. Select your source language and target language from the dropdowns. Echō currently supports English ↔ Spanish, with more languages coming.
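
The file type shown after upload can be derived from the extension. The sketch below classifies a file into the three groups listed above. The function name `classify_upload` is hypothetical, and a real uploader would likely also inspect the MIME type.

```python
from pathlib import Path

# Extension groups taken from the accepted-formats list in this step.
VIDEO = {".mp4", ".mov", ".mkv", ".avi", ".webm"}
AUDIO = {".mp3", ".wav", ".aac", ".flac", ".ogg"}
CAPTIONS = {".srt", ".vtt", ".sbv", ".ttml"}


def classify_upload(filename: str) -> str:
    """Return 'video', 'audio', or 'captions' based on the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in VIDEO:
        return "video"
    if ext in AUDIO:
        return "audio"
    if ext in CAPTIONS:
        return "captions"
    raise ValueError(f"unsupported file type: {ext or filename}")
```

For example, `classify_upload("Trailer.MP4")` returns `"video"`, which tells the pipeline that the audio track must be extracted before transcription.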

2 Transcribe with ElevenLabs Scribe

Click "Start Transcription" to send your audio to ElevenLabs Scribe. Scribe will:

  • Extract all spoken words from the audio track with high accuracy
  • Generate precise start and end timestamps for each word and segment
  • Identify different speakers (speaker diarization)
  • Detect the source language automatically if set to "Auto-detect"

When complete, you'll see the full transcript in a table with timecodes, speaker labels, and the original text. A stats bar shows total segments, duration, speakers detected, and word count. Video files have their audio automatically extracted before transcription.
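
To make the transcript table and stats bar concrete, here is a rough sketch of how word-level output could be grouped into segments and summarized. The word dictionary shape (`text`, `start`, `end`, `speaker`) and the 0.8-second pause threshold are assumptions for illustration; Echō's actual segmentation rules are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def group_words(words, max_gap=0.8):
    """Start a new segment on a speaker change or a pause longer than max_gap."""
    segments = []
    for w in words:
        last = segments[-1] if segments else None
        if last and last.speaker == w["speaker"] and w["start"] - last.end <= max_gap:
            last.text += " " + w["text"]
            last.end = w["end"]
        else:
            segments.append(Segment(w["speaker"], w["start"], w["end"], w["text"]))
    return segments


def transcript_stats(segments):
    """Compute the four numbers shown in the stats bar."""
    return {
        "segments": len(segments),
        "duration": max((s.end for s in segments), default=0.0),
        "speakers": len({s.speaker for s in segments}),
        "words": sum(len(s.text.split()) for s in segments),
    }


words = [
    {"text": "Hello", "start": 0.0, "end": 0.4, "speaker": "A"},
    {"text": "there.", "start": 0.5, "end": 0.9, "speaker": "A"},
    {"text": "Hi!", "start": 1.2, "end": 1.5, "speaker": "B"},
    {"text": "Welcome", "start": 3.0, "end": 3.5, "speaker": "B"},
]
segments = group_words(words)
stats = transcript_stats(segments)
```

In this sample, the speaker change and the long pause before "Welcome" each start a new segment, giving three segments from two speakers.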

Tip: For best results, use audio with minimal background noise. Scribe handles accents and multiple speakers well, but heavy music or sound effects can reduce accuracy.

3 Translate & Isometric Adaptation

Click "Translate & Adapt" to send all segments to Claude. This is a two-part process:

Translation
Claude translates each segment with cultural context — preserving idioms, humor, tone, and register rather than doing a word-for-word literal translation. A joke stays funny, a formal address stays formal.
Adaptation
Claude then rewrites the translation to fit the original segment's time window. This is isometric dubbing — the same process Disney pays linguists to do manually. If a 3-second English phrase translates to a 5-second Spanish phrase, Claude finds a shorter way to say it that still sounds natural.
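
One way to express the timing constraint to a language model is as a character budget derived from the segment's duration. The sketch below shows that idea. The speaking rate of 15 characters per second, the function names, and the prompt wording are all assumptions for illustration, not Echō's actual prompt.

```python
# Assumed average speaking rate; a real system would tune this per language.
CHARS_PER_SECOND = 15.0


def char_budget(start: float, end: float, rate: float = CHARS_PER_SECOND) -> int:
    """Approximate how many characters fit in the segment's time window."""
    return int((end - start) * rate)


def adaptation_prompt(source_text, draft, start, end, target_lang="Spanish"):
    """Build a hypothetical instruction asking for a rewrite within the budget."""
    budget = char_budget(start, end)
    return (
        f"Original ({end - start:.1f}s): {source_text}\n"
        f"Draft {target_lang} translation: {draft}\n"
        f"Rewrite the draft in natural {target_lang} using at most {budget} "
        "characters, keeping meaning, tone, and register."
    )


prompt = adaptation_prompt(
    "I can't believe you did that!",
    "¡No puedo creer que hayas hecho eso!",
    start=12.0,
    end=15.0,
)
```

For the 3-second window above, the budget works out to 45 characters under the assumed rate.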

The transcript table updates with translations shown in green and adapted versions in amber. The "Fit" column shows whether the adapted text fits the timing window: "OK" means it fits, and a percentage such as "+15%" shows how much longer than the window it runs.
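
A "Fit" value of this kind could be computed by comparing an estimated spoken duration with the segment window. This is a minimal sketch under the assumption that duration is estimated from character count at 15 characters per second; Echō's real estimator is not specified, and `fit_label` is a hypothetical name.

```python
def estimate_duration(text: str, rate: float = 15.0) -> float:
    """Estimate spoken duration in seconds from character count (assumed rate)."""
    return len(text) / rate


def fit_label(adapted: str, start: float, end: float, rate: float = 15.0) -> str:
    """Return 'OK' if the text fits the window, else the overrun as '+N%'."""
    window = end - start
    if window <= 0:
        raise ValueError("segment window must be positive")
    overrun = estimate_duration(adapted, rate) / window - 1.0
    percent = round(overrun * 100)
    return "OK" if percent <= 0 else f"+{percent}%"
```

With these assumptions, a 45-character line in a 3-second window yields "OK", and a 69-character line in a 4-second window yields "+15%".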

4 Generate Voice Dub

Click "Generate Voice Dub" to synthesize the translated audio using ElevenLabs. The process:

  • Voice selection — Echō selects a multilingual voice from your ElevenLabs account (or uses a cloned voice if available)
  • Segment-by-segment synthesis — Each adapted text segment is synthesized individually for precise timing control
  • Prosody matching — ElevenLabs' Multilingual v2 model preserves natural speech patterns, emotion, and pacing
  • Progress tracking — Watch each segment go from "Pending" to "Processing" to "Done" in real-time
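
The segment-by-segment loop and its status transitions can be sketched as follows. The endpoint path and the `eleven_multilingual_v2` model ID come from the public ElevenLabs text-to-speech API. The `DubJob` class, the callbacks, and the offline `fake_synthesize` stand-in are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Public ElevenLabs text-to-speech endpoint; the voice ID goes in the path.
TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
MODEL_ID = "eleven_multilingual_v2"


@dataclass
class DubJob:
    index: int
    text: str  # adapted translation for one segment
    status: str = "Pending"
    audio: Optional[bytes] = None


def run_jobs(
    jobs,
    voice_id: str,
    synthesize: Callable[[str, dict], bytes],
    on_status: Callable[[int, str], None],
) -> None:
    """Synthesize segments one at a time, reporting each status change."""
    url = TTS_URL.format(voice_id=voice_id)
    for job in jobs:
        job.status = "Processing"
        on_status(job.index, job.status)
        job.audio = synthesize(url, {"text": job.text, "model_id": MODEL_ID})
        job.status = "Done"
        on_status(job.index, job.status)


def fake_synthesize(url: str, payload: dict) -> bytes:
    """Offline stand-in; a real one would POST payload with the xi-api-key header."""
    return payload["text"].encode("utf-8")


events = []
jobs = [DubJob(0, "Hola a todos."), DubJob(1, "Bienvenidos.")]
run_jobs(jobs, "voice123", fake_synthesize, lambda i, s: events.append((i, s)))
```

Passing the synthesizer in as a callable keeps the loop testable without network access and lets the UI subscribe to status changes through `on_status`.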

This step is optional — if you only need translated captions, click "Skip to Export" to jump ahead.

Tip: For the best voice cloning results, upload a voice sample to ElevenLabs first (elevenlabs.io/voice-cloning). Echō will automatically use your cloned voice for synthesis, making the dubbed audio sound like the original speaker.

5 Preview Your Results

The preview player appears after translation completes. Use it to review your work before exporting:

Original
Play back with original-language captions overlaid on the waveform timeline.
Dubbed
Play back with translated/adapted captions. If voice dubbing is complete, hear the synthesized audio.
Side by Side
See original and translated captions simultaneously — ideal for QC review and comparing translations.

Use the scrubber to jump to any point in the timeline. Captions update in real-time as you scrub through the waveform.
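
Updating captions while scrubbing comes down to finding the segment that contains the current playback time. A minimal sketch using binary search, assuming segments are sorted by start time and do not overlap (the function name `active_caption` is hypothetical):

```python
from bisect import bisect_right


def active_caption(segments, t):
    """Return the caption text active at time t, or None between segments.

    segments: list of (start, end, text) tuples sorted by start time.
    """
    starts = [s[0] for s in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i][1]:
        return segments[i][2]
    return None


captions = [(0.0, 2.0, "Hola"), (2.5, 4.0, "Adiós")]
```

Here `active_caption(captions, 1.0)` returns `"Hola"`, while a time of 2.2 seconds falls in the gap between segments and returns `None`. In a real player the `starts` list would be built once rather than on every call.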

6 Export

Click any export card to download your translated content. Available formats:

SRT
The universal subtitle format. Works with VLC, Premiere Pro, DaVinci Resolve, Final Cut, and virtually every video player and editor.
WebVTT
Web-native format with CSS styling support. Ideal for HTML5 video players, web apps, and streaming platforms.
SBV
YouTube's native caption format. Upload directly to YouTube Studio for instant localized captions.
TTML / DFXP
Broadcast-grade XML format used by Netflix, Disney+, and broadcast networks. Required for many content delivery platforms.
JSON
Full pipeline data including original text, translations, adapted text, timing, and speaker info. Use this for API integrations or custom workflows.
Dubbed Audio
Download the AI-generated dubbed audio track (available after voice dubbing is complete). Ready to mix with original video in your editor.
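
The three plain-text caption formats differ mainly in timestamp syntax and framing. The sketch below renders the same cues as SRT, WebVTT, and SBV. The function names are illustrative, and cues are assumed to be `(start, end, text)` tuples in seconds.

```python
def _parts(seconds):
    """Split a time in seconds into (hours, minutes, seconds, milliseconds)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return h, m, s, ms


def srt_time(t):
    h, m, s, ms = _parts(t)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"  # SRT uses a comma before milliseconds


def vtt_time(t):
    h, m, s, ms = _parts(t)
    return f"{h:02}:{m:02}:{s:02}.{ms:03}"  # WebVTT uses a period


def sbv_time(t):
    h, m, s, ms = _parts(t)
    return f"{h}:{m:02}:{s:02}.{ms:03}"  # SBV does not zero-pad the hour


def to_srt(cues):
    blocks = [
        f"{i}\n{srt_time(a)} --> {srt_time(b)}\n{text}"
        for i, (a, b, text) in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


def to_vtt(cues):
    blocks = [f"{vtt_time(a)} --> {vtt_time(b)}\n{text}" for a, b, text in cues]
    return "WEBVTT\n\n" + "\n\n".join(blocks) + "\n"


def to_sbv(cues):
    blocks = [f"{sbv_time(a)},{sbv_time(b)}\n{text}" for a, b, text in cues]
    return "\n\n".join(blocks) + "\n"


cues = [(0.0, 1.5, "Hola a todos."), (2.0, 4.25, "Bienvenidos.")]
```

Keeping timing in seconds until the final formatting step means the same cue list can feed every exporter, including the XML and JSON formats.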
Pro Tips
Caption-Only Workflow

If you only need translated captions (no voice dubbing), skip step 4 entirely. After translation, go straight to export. Both API keys are still needed — ElevenLabs for transcription and Claude for translation.

Better Voice Cloning

For the most natural dubbing, create a custom voice clone in ElevenLabs first using a clean sample of the original speaker. Echō will use it automatically.

Timing Fit Indicators

Watch the "Fit" column after translation. Green "OK" means the adapted text fits the timing. A percentage like "+15%" means it runs slightly long — the voice synthesis will speak slightly faster to compensate, but you may want to manually shorten the text for the most natural result.
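
The trade-off between speeding up speech and shortening the text can be sketched as a simple rule. The 1.15 ceiling and the names `MAX_SPEEDUP` and `plan_segment` are assumptions for illustration; the source does not state how much speed-up Echō applies or where it stops.

```python
# Assumed ceiling beyond which faster speech starts to sound rushed.
MAX_SPEEDUP = 1.15


def plan_segment(estimated: float, window: float, max_speedup: float = MAX_SPEEDUP):
    """Choose a playback speed for one segment and flag it if text should be cut.

    estimated: estimated spoken duration of the adapted text, in seconds.
    window: duration of the original segment, in seconds.
    """
    if window <= 0:
        raise ValueError("segment window must be positive")
    factor = max(1.0, estimated / window)  # never slow down below normal speed
    return {
        "speed": round(min(factor, max_speedup), 2),
        "needs_edit": factor > max_speedup,
    }
```

Under these assumptions, a segment that runs 10% long is spoken at 1.1x speed, while one that runs 50% long is capped at 1.15x and flagged for manual shortening.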

Security

Your API keys and media files never touch BuilderBias servers. All API calls go directly from your browser to Anthropic and ElevenLabs over encrypted HTTPS connections. Keys are saved in localStorage so you don't have to re-enter them.
