Generates text-to-speech (TTS) audio from the provided text and streams it back. The response is an audio stream in the requested format (MP3 by default). Audio is generated with the specified model and voice prompt, and optional parameters can be used to fine-tune the output.
API key used for authentication. The key must start with the "dd-" prefix.
API Key
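A client can reject a malformed key before sending any request. The following is a minimal Python sketch that enforces only the documented "dd-" prefix rule. The header name "x-api-key" and the helper name auth_headers are assumptions, because this section does not say which header carries the key.

```python
API_KEY_PREFIX = "dd-"  # documented prefix for Deepdub API keys


def auth_headers(api_key: str) -> dict[str, str]:
    """Return request headers carrying the API key.

    ASSUMPTION: the header name "x-api-key" is hypothetical; check the
    authorization scheme in the full API reference.
    """
    if not api_key.startswith(API_KEY_PREFIX):
        raise ValueError(f"API key must start with {API_KEY_PREFIX!r}")
    return {"x-api-key": api_key}
```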
Request body schema for the TTS generation endpoints.
Optional parameters (not shown in the playground):
- generationId (string)
- targetDuration (number, seconds)
- tempo (number, 0.5–2.0)
- variance (number, 0.0–1.0)
- seed (integer)
- temperature (number, 0.0–1.0)
- sampleRate (integer: 8000/16000/22050/24000/32000/36000/44100/48000)
- format (string: mp3/opus/mulaw/wav)
- promptBoost (boolean)
- superStretch (boolean)
- realtime (boolean)
- cleanAudio (boolean, default true)
- autoGain (boolean)
- publish (boolean)
- accentControl (object with fields accentBaseLocale, accentLocale, and accentRatio)
- performanceReferencePromptId (string)
- voiceReference (string, base64-encoded audio)
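The ranges and enumerations listed above can be checked on the client side before a request is sent. The following is a Python sketch. It assumes the documented ranges are inclusive at both ends, and the helper name validate_options is hypothetical. The check covers only the constraints stated in this list, so the server may enforce others.

```python
from typing import Any

# Constraints copied from the optional-parameter list above.
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 36000, 44100, 48000}
FORMATS = {"mp3", "opus", "mulaw", "wav"}
# ASSUMPTION: both bounds of each range are inclusive.
RANGES = {"tempo": (0.5, 2.0), "variance": (0.0, 1.0), "temperature": (0.0, 1.0)}


def validate_options(options: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the optional parameters (empty if none)."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in options and not low <= options[name] <= high:
            problems.append(f"{name}={options[name]} is outside [{low}, {high}]")
    if "sampleRate" in options and options["sampleRate"] not in SAMPLE_RATES:
        problems.append(f"sampleRate={options['sampleRate']} is not a supported rate")
    if "format" in options and options["format"] not in FORMATS:
        problems.append(f"format={options['format']!r} is not one of {sorted(FORMATS)}")
    if "seed" in options and not isinstance(options["seed"], int):
        problems.append("seed must be an integer")
    return problems
```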
Model ID to use for generation
"dd-etts-3.0"
Text to be converted to speech
"Hello world, welcome to Deepdub."
Language locale code (e.g., en-US, fr-FR)
"en-US"
ID of the voice prompt to use for generation
"bd1b00bb-be1c-4679-8eaa-0fcbfd4ff773"
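The four fields above make up a minimal request body. This section describes each field but does not give its JSON key. The keys model, targetText, locale, and voicePromptId used below are therefore assumptions, and so is the helper name build_tts_body. The values are the documented examples. Any optional parameters are passed through unchanged.

```python
import json
from typing import Any


def build_tts_body(model: str, text: str, locale: str, voice_prompt_id: str,
                   **options: Any) -> str:
    """Serialize a TTS request body to JSON.

    ASSUMPTION: the JSON keys "model", "targetText", "locale" and
    "voicePromptId" are hypothetical names for the four documented fields;
    optional parameters such as format or tempo are merged in as given.
    """
    body: dict[str, Any] = {
        "model": model,
        "targetText": text,
        "locale": locale,
        "voicePromptId": voice_prompt_id,
    }
    body.update(options)
    return json.dumps(body)


# Example values taken from the field descriptions above.
payload = build_tts_body(
    "dd-etts-3.0",
    "Hello world, welcome to Deepdub.",
    "en-US",
    "bd1b00bb-be1c-4679-8eaa-0fcbfd4ff773",
    format="wav",
)
```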
Audio stream in the format set by the format parameter: MP3 (the default), WAV, Opus, or mulaw. The response body contains raw audio bytes.
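Because the response body is raw audio bytes, a client can copy it to disk in chunks instead of buffering the whole response. The following Python sketch uses only the standard library. The endpoint URL is not given in this section, so url is a placeholder. The helper names fetch_tts and save_audio_stream are hypothetical, and the 64 KiB chunk size is an arbitrary choice.

```python
import urllib.request
from typing import BinaryIO


def save_audio_stream(stream: BinaryIO, path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy raw audio bytes from a file-like stream to disk; return bytes written."""
    written = 0
    with open(path, "wb") as out:
        # Read fixed-size chunks until the stream is exhausted.
        while chunk := stream.read(chunk_size):
            out.write(chunk)
            written += len(chunk)
    return written


def fetch_tts(url: str, body: bytes, headers: dict[str, str], path: str) -> int:
    """POST a JSON body and stream the audio response to `path`.

    ASSUMPTION: `url` is a placeholder; this section does not give the
    endpoint URL or the name of the authentication header.
    """
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return save_audio_stream(resp, path)
```

Chunked copying keeps memory use flat regardless of the clip length. It also lets the file be written while the server is still generating audio.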