Generate and stream TTS audio based on the provided text. Returns an audio stream in the specified format (default MP3). Supported formats: mp3, opus, mulaw. For wav or s16le output, use the WebSocket API.
| Format | Description |
|---|---|
| mp3 | Compressed audio, smallest file size. Default. |
| opus | High-quality compressed audio, efficient for streaming. |
| mulaw | 8-bit µ-law encoding, commonly used in telephony. Defaults to 8000 Hz if no sample rate is specified. |

The table below compares the REST API with the WebSocket API.

| Feature | REST API | WebSocket API |
|---|---|---|
| Delivery | Streaming HTTP response (chunked audio bytes) | Chunked audio delivered incrementally as base64-encoded JSON messages |
| Formats | mp3, opus, mulaw | wav (default), mp3, opus, mulaw, s16le |
| Streaming input (ctx/isFinal) | Not supported | wav, s16le, mulaw only |
| Default format | mp3 | wav |
| Default mulaw sample rate | 8000 Hz | 8000 Hz |
| Best for | Simple integrations, file generation | Real-time playback, low-latency applications |
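
The comparison above can be turned into a small routing helper. This is a minimal sketch, not part of any Deepdub SDK: the function name `choose_api` and the choice to prefer REST whenever both APIs support a format are assumptions made for illustration. Only the format sets are taken from the table.

```python
# Format sets copied from the comparison table above.
REST_FORMATS = {"mp3", "opus", "mulaw"}
WEBSOCKET_FORMATS = {"wav", "mp3", "opus", "mulaw", "s16le"}
STREAMING_INPUT_FORMATS = {"wav", "s16le", "mulaw"}


def choose_api(fmt=None, streaming_input=False):
    """Return "rest" or "websocket" for the requested output format.

    Hypothetical helper: prefers REST when both APIs can serve the format.
    """
    if streaming_input:
        # Streaming input is WebSocket-only and limited to three formats.
        effective = fmt if fmt is not None else "wav"  # WebSocket default
        if effective not in STREAMING_INPUT_FORMATS:
            raise ValueError(f"streaming input does not support {effective!r}")
        return "websocket"
    if fmt is None or fmt in REST_FORMATS:
        return "rest"  # REST default format is mp3
    if fmt in WEBSOCKET_FORMATS:
        return "websocket"  # wav and s16le are WebSocket-only
    raise ValueError(f"unknown format {fmt!r}")
```

For example, `choose_api("wav")` selects the WebSocket API, while `choose_api("opus")` stays on REST.
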
API key for authentication. The key must start with the dd- prefix.
API Key
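
A client can catch a malformed key before sending any request. The sketch below checks only the documented dd- prefix; the function name `check_api_key` is hypothetical, and how the key is transmitted (header name, scheme) is not covered here because this section does not state it.

```python
def check_api_key(key: str) -> str:
    """Return the key unchanged if it carries the documented dd- prefix."""
    if not key.startswith("dd-"):
        raise ValueError("API key must start with the dd- prefix")
    return key
```
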
Request structure for TTS generation endpoints.
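
As a rough illustration, the request body can be assembled as a JSON object. This section describes the four main fields (model ID, text, locale, voice prompt ID) but does not show their JSON key names, so the keys `model`, `text`, `locale`, and `voicePromptId` below are assumptions, as is the helper name `build_request_body`. Check the schema before relying on them.

```python
import json


def build_request_body(model, text, locale, voice_prompt_id, **optional):
    """Serialize a TTS request to UTF-8 JSON bytes.

    The four key names below are ASSUMED, not confirmed by the docs.
    Optional parameters are passed through unchanged.
    """
    body = {
        "model": model,                    # assumed key name
        "text": text,                      # assumed key name
        "locale": locale,                  # assumed key name
        "voicePromptId": voice_prompt_id,  # assumed key name
    }
    body.update(optional)
    return json.dumps(body).encode("utf-8")
```

Optional parameters such as `format="opus"` can be passed as keyword arguments and are merged into the same object.
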
Optional parameters (not shown in playground):

| Parameter | Type | Notes |
|---|---|---|
| generationId | string | |
| targetDuration | number | Seconds |
| tempo | number | 0.5–2.0 |
| variance | number | 0.0–1.0 |
| seed | integer | |
| temperature | number | 0.0–1.0 |
| sampleRate | integer | |
| format | string | mp3, opus, or mulaw. Default: mp3 |
| promptBoost | boolean | |
| superStretch | boolean | |
| realtime | boolean | |
| cleanAudio | boolean | Default: true |
| autoGain | boolean | |
| publish | boolean | |
| accentControl | object | Fields: accentBaseLocale, accentLocale, accentRatio |
| performanceReferencePromptId | string | |
| voiceReference | string | Base64-encoded audio |

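
The documented ranges for the optional parameters can be checked on the client before a request is sent. The following is a minimal sketch under two assumptions: the ranges are treated as inclusive, and the helper name `validate_options` is made up for this example. It covers only the constraints stated above (tempo, variance, temperature, and format).

```python
# Numeric ranges as documented; inclusive bounds are an assumption.
RANGES = {
    "tempo": (0.5, 2.0),
    "variance": (0.0, 1.0),
    "temperature": (0.0, 1.0),
}
REST_FORMATS = ("mp3", "opus", "mulaw")


def validate_options(options: dict) -> dict:
    """Return a copy of options, raising ValueError on a documented violation."""
    checked = dict(options)
    for name, (low, high) in RANGES.items():
        if name in checked and not (low <= checked[name] <= high):
            raise ValueError(f"{name} must be between {low} and {high}")
    if "format" in checked and checked["format"] not in REST_FORMATS:
        raise ValueError("format must be one of mp3, opus, mulaw")
    return checked
```

Parameters without a documented constraint, such as seed or generationId, pass through untouched.
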
Model ID to use for generation. Example: "dd-etts-3.0"
Text to be converted to speech. Example: "Hello world, welcome to Deepdub."
Language locale code (e.g., en-US, fr-FR). Example: "en-US"
ID of the voice prompt to use for generation. Example: "bd1b00bb-be1c-4679-8eaa-0fcbfd4ff773"
Audio stream in the requested format (MP3, Opus, or mulaw, depending on the format parameter). The response body is raw audio bytes.
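
Because the body arrives as raw audio bytes, a client can write it to disk chunk by chunk instead of buffering the whole response. This sketch works with any binary file-like object that has a `read` method (the object returned by the standard library's `urllib.request.urlopen` is one such object). The helper name `copy_audio_stream` and the 4096-byte chunk size are illustrative choices, not part of the API.

```python
import io
from typing import BinaryIO


def copy_audio_stream(source: BinaryIO, sink: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy bytes from source to sink in chunks; return the total bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# Example with an in-memory stand-in for the HTTP response body.
fake_response = io.BytesIO(b"\x00" * 10000)
output = io.BytesIO()
written = copy_audio_stream(fake_response, output)
```

In a real client, `sink` would typically be a file opened in binary write mode with an extension that matches the requested format.
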