Generate and stream TTS audio

curl --request POST \
  --url https://restapi.deepdub.ai/api/v1/tts \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "model": "dd-etts-3.0",
  "targetText": "Hello world, welcome to Deepdub.",
  "locale": "en-US",
  "voicePromptId": "bd1b00bb-be1c-4679-8eaa-0fcbfd4ff773"
}
'

Example error response (returned when a required field is missing):

{
  "success": false,
  "message": "Invalid request: missing required field 'targetText'"
}

POST /api/v1/tts

Supported languages

Language	Locale code
Arabic (Lebanon)	`ar-LB`
Arabic (Qatar)	`ar-QA`
Arabic (Saudi)	`ar-SA`
Arabic (Standard)	`ar-SA`
Arabic (Syrian)	`ar-SY`
Czech (Standard)	`cs-CZ`
Danish (Standard)	`da-DK`
Dutch (Netherlands)	`nl-NL`
English (Generic)	`en-GB`
English (Standard)	`en-AU`
English (United States)	`en-US`
Estonian (Standard)	`et-EE`
Finnish (Standard)	`fi-FI`
French (Standard)	`fr-FR`
German (Standard)	`de-DE`
Greek (Standard)	`el-GR`
Hebrew (Standard)	`he-IL`
Hindi (Standard)	`hi-IN`
Hungarian (Standard)	`hu-HU`
Indonesian (Standard)	`id-ID`
Italian (Standard)	`it-IT`
Japanese (Standard)	`ja-JP`
Korean (Standard)	`ko-KR`
Macedonian (Standard)	`mk-MK`
Norwegian (Standard)	`nb-NO`
Polish (Standard)	`pl-PL`
Portuguese (Brazil)	`pt-BR`
Romanian (Standard)	`ro-RO`
Russian (Standard)	`ru-RU`
Spanish (Latam)	`es-419`
Spanish (Latam — Mexico)	`es-MX`
Spanish (Standard)	`es-ES`
Swedish (Standard)	`sv-SE`
Tamil (Standard)	`ta-IN`
Thai (Standard)	`th-TH`
Turkish (Standard)	`tr-TR`
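
A client can check a locale against the table above before sending a request. The sketch below is only illustrative: the helper name is hypothetical, the codes are copied from the table, and exact, case-sensitive matching is an assumption rather than documented API behavior.

```python
# Locale codes copied from the "Supported languages" table.
SUPPORTED_LOCALES = frozenset({
    "ar-LB", "ar-QA", "ar-SA", "ar-SY", "cs-CZ", "da-DK", "nl-NL",
    "en-GB", "en-AU", "en-US", "et-EE", "fi-FI", "fr-FR", "de-DE",
    "el-GR", "he-IL", "hi-IN", "hu-HU", "id-ID", "it-IT", "ja-JP",
    "ko-KR", "mk-MK", "nb-NO", "pl-PL", "pt-BR", "ro-RO", "ru-RU",
    "es-419", "es-MX", "es-ES", "sv-SE", "ta-IN", "th-TH", "tr-TR",
})


def is_supported_locale(locale: str) -> bool:
    """Return True if `locale` exactly matches a code in the table.

    Assumption: the API expects the code exactly as written in the table.
    """
    return locale in SUPPORTED_LOCALES
```

For example, `is_supported_locale("en-US")` is true, while `is_supported_locale("en_us")` is false under this strict comparison.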

Supported output formats

The REST API streams audio as raw bytes in the HTTP response body. Supported formats:

Format	Description
`mp3`	Compressed audio, smallest file size. Default.
`opus`	High-quality compressed audio, efficient for streaming.
`mulaw`	8-bit µ-law encoding, commonly used in telephony. Defaults to 8000 Hz if no sample rate is specified.

The REST API supports mp3, opus, and mulaw only. For wav or s16le output, use the WebSocket API.

Sample rates

The requested sample rate is passed through to the audio conversion layer: audio is generated internally at 48 kHz and then resampled to the requested rate. If no sample rate is specified, mulaw defaults to 8000 Hz.
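
The format and sample-rate rules above could be applied client-side roughly as follows. This is a minimal sketch: `audio_options` is a hypothetical helper, the `format` and `sampleRate` field names come from the optional parameters listed under Body, and sending the 8000 Hz mulaw default explicitly (instead of leaving it to the server) is a choice made here for clarity.

```python
REST_FORMATS = ("mp3", "opus", "mulaw")  # formats the REST API supports
DEFAULT_FORMAT = "mp3"
MULAW_DEFAULT_RATE_HZ = 8000


def audio_options(fmt=None, sample_rate=None) -> dict:
    """Build the format-related part of a TTS request body."""
    fmt = fmt or DEFAULT_FORMAT
    if fmt not in REST_FORMATS:
        # wav and s16le are only available through the WebSocket API.
        raise ValueError(f"format {fmt!r} is not supported by the REST API")
    options = {"format": fmt}
    if sample_rate is not None:
        options["sampleRate"] = int(sample_rate)
    elif fmt == "mulaw":
        # Mirror the documented server-side default explicitly.
        options["sampleRate"] = MULAW_DEFAULT_RATE_HZ
    return options
```

When no sample rate is given for mp3 or opus, the sketch omits `sampleRate` entirely, since this page does not state a default for those formats.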

REST vs WebSocket comparison

Feature	REST API	WebSocket API
Delivery	Streaming HTTP response (chunked audio bytes)	Chunked audio delivered incrementally as base64-encoded JSON messages
Formats	`mp3`, `opus`, `mulaw`	`wav` (default), `mp3`, `opus`, `mulaw`, `s16le`
Streaming input (ctx/isFinal)	Not supported	`wav`, `s16le`, `mulaw` only
Default format	`mp3`	`wav`
Default mulaw sample rate	8000 Hz	8000 Hz
Best for	Simple integrations, file generation	Real-time playback, low-latency applications
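
The comparison table can be read as a small decision rule. The sketch below encodes it under one assumption that is not in the source: when both APIs can serve a request, it prefers REST for simplicity. The function name and return values are hypothetical.

```python
REST_FORMATS = {"mp3", "opus", "mulaw"}
WEBSOCKET_FORMATS = {"wav", "mp3", "opus", "mulaw", "s16le"}
STREAMING_INPUT_FORMATS = {"wav", "s16le", "mulaw"}  # ctx/isFinal support


def choose_api(fmt: str, streaming_input: bool = False) -> str:
    """Return "rest" or "websocket" for the given output format."""
    if fmt not in WEBSOCKET_FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if streaming_input:
        # Streaming input is WebSocket-only and limited to three formats.
        if fmt not in STREAMING_INPUT_FORMATS:
            raise ValueError(f"streaming input does not support {fmt!r}")
        return "websocket"
    # Assumption: prefer REST whenever it can produce the format.
    return "rest" if fmt in REST_FORMATS else "websocket"
```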

Authorizations

x-api-key (string, header, required)

API key for authentication. Must start with the dd- prefix.
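
A client might catch a malformed key before making a network call. The following sketch assumes only what is stated above (the `x-api-key` header and the `dd-` prefix); `build_headers` is a hypothetical helper, and the key in the usage note is a placeholder.

```python
def build_headers(api_key: str) -> dict:
    """Return the request headers for a TTS call, checking the key prefix."""
    if not api_key.startswith("dd-"):
        raise ValueError("API key must start with the 'dd-' prefix")
    return {
        "Content-Type": "application/json",
        "x-api-key": api_key,
    }
```

Calling `build_headers("dd-example")` returns both headers used in the curl example, while a key without the prefix raises `ValueError` locally.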

Headers

x-api-key (string, required, default: dd-00000000000000000000000065c9cbfe)

API key.

Body

application/json

Request structure for TTS generation endpoints.

Optional parameters (not shown in playground):

Parameter	Type and constraints
generationId	string
targetDuration	number, in seconds
tempo	number, 0.5–2.0
variance	number, 0.0–1.0
seed	integer
temperature	number, 0.0–1.0
sampleRate	integer
format	string: mp3, opus, or mulaw (default mp3)
promptBoost	boolean
superStretch	boolean
realtime	boolean
cleanAudio	boolean, default true
autoGain	boolean
publish	boolean
accentControl	object with accentBaseLocale, accentLocale, accentRatio
performanceReferencePromptId	string
voiceReference	string, base64-encoded audio
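
The numeric ranges listed for the optional parameters lend themselves to a client-side check. The sketch below is hypothetical: `check_ranges` is not part of any SDK, and treating both bounds as inclusive is an assumption, since the source gives only the ranges.

```python
# Ranges copied from the optional parameter list: (minimum, maximum).
RANGES = {
    "tempo": (0.5, 2.0),
    "variance": (0.0, 1.0),
    "temperature": (0.0, 1.0),
}


def check_ranges(params: dict) -> list:
    """Return a list of error messages for out-of-range parameters.

    Assumption: both ends of each range are inclusive.
    """
    errors = []
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            errors.append(
                f"{name} must be between {low} and {high}, got {params[name]}"
            )
    return errors
```

An empty list means every range-limited parameter that was supplied is within bounds; parameters that are absent are simply skipped.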

model (string, required, default: dd-etts-3.0)

Model ID to use for generation. Example: "dd-etts-3.0"

targetText (string, required)

Text to be converted to speech. Example: "Hello world, welcome to Deepdub."

locale (string, required)

Language locale code (e.g., en-US, fr-FR). Example: "en-US"

voicePromptId (string, required)

ID of the voice prompt to use for generation. Example: "bd1b00bb-be1c-4679-8eaa-0fcbfd4ff773"
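
The four required fields can be assembled and checked before the request is sent, which avoids the "missing required field" error shown in the example response. This is a sketch with a hypothetical helper name; the field names and the default model come from the Body section, and treating an empty string as missing is an assumption.

```python
import json

REQUIRED_FIELDS = ("model", "targetText", "locale", "voicePromptId")


def build_body(target_text, locale, voice_prompt_id,
               model="dd-etts-3.0", **optional) -> bytes:
    """Serialize a TTS request body, rejecting missing required fields."""
    body = {
        "model": model,
        "targetText": target_text,
        "locale": locale,
        "voicePromptId": voice_prompt_id,
        **optional,  # e.g. format="opus", sampleRate=24000
    }
    # Assumption: an empty value counts as missing.
    missing = [name for name in REQUIRED_FIELDS if not body.get(name)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    return json.dumps(body).encode("utf-8")
```

The returned bytes can be passed directly as the POST payload alongside the `Content-Type: application/json` header.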

Response

Audio stream in the requested format (MP3, Opus, or mulaw, depending on the format parameter). The response body is raw audio bytes.
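
Because the body is raw audio bytes, a client can write it to disk chunk by chunk rather than buffering the whole response. The sketch below uses only the Python standard library; the function names and chunk size are illustrative choices, and the URL and headers are taken from the curl example. On a non-2xx status, `urllib` raises `urllib.error.HTTPError`, so error handling is left to the caller.

```python
import urllib.request

TTS_URL = "https://restapi.deepdub.ai/api/v1/tts"


def save_audio(response, path, chunk_size=8192) -> int:
    """Copy a file-like response to `path` in chunks; return bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty read marks the end of the stream
                break
            out.write(chunk)
            written += len(chunk)
    return written


def synthesize_to_file(body: bytes, api_key: str, path: str) -> int:
    """POST a JSON body to the TTS endpoint and stream the audio to disk."""
    request = urllib.request.Request(
        TTS_URL,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return save_audio(response, path)
```

Keeping `save_audio` separate from the network call means it works with any object that has a `read` method, which makes the copy loop easy to exercise without contacting the API.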
