Overview
The WebSocket API enables real-time, chunked audio streaming for low-latency text-to-speech (TTS) generation. Audio is delivered incrementally as base64-encoded chunks, so playback can begin before generation is complete. The WebSocket API accepts the same generation parameters as the REST TTS endpoint, but it delivers audio as a stream of chunks instead of a single response.
Connection
Connect to the WebSocket endpoint and authenticate with your API key. You can pass the key either in the x-api-key header or as a query parameter.
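The following is a minimal Python sketch of the two authentication options. It returns the URL and handshake headers to pass to whichever WebSocket client you use. The endpoint URL and the query parameter name api_key are hypothetical placeholders. Only the x-api-key header name comes from this page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical name for the query-string form of the API key.
API_KEY_QUERY_PARAM = "api_key"


def build_connection(endpoint: str, api_key: str, use_query: bool = False):
    """Return (url, headers) for opening the WebSocket connection.

    By default the key is sent in the x-api-key header; with use_query=True
    it is appended to the URL's query string instead.
    """
    if not use_query:
        return endpoint, {"x-api-key": api_key}
    parts = urlsplit(endpoint)
    query = parse_qsl(parts.query)          # keep any existing query parameters
    query.append((API_KEY_QUERY_PARAM, api_key))
    url = urlunsplit(parts._replace(query=urlencode(query)))
    return url, {}


# Usage with a hypothetical endpoint URL.
url, headers = build_connection("wss://tts.example.com/ws", "YOUR_API_KEY")
```

Prefer the header form where your client supports custom handshake headers, because query strings are more likely to end up in server and proxy logs.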
Request format
Send a JSON message on the WebSocket connection. The message accepts the following parameters:
The type of generation request.
Model ID to use for generation (e.g., dd-etts-3.0).
Text to convert to speech.
Language locale code (e.g., en-US, fr-FR).
ID of the voice prompt to use. Supports the asset: prefix for built-in voices.
Optional client-provided ID. Auto-generated if not provided.
Target audio duration in seconds.
Playback speed multiplier (0.5-2.0).
Voice variation level (0.0-1.0).
Random seed for deterministic generation.
Generation temperature (0.0-1.0).
Output sample rate in Hz. Supported: 8000, 16000, 22050, 24000, 32000, 36000, 44100, 48000.
Output audio format: mp3, wav, opus, or mulaw.
Enhance voice prompt characteristics.
Enable super stretch mode for longer audio.
Enable real-time priority processing.
Apply audio cleanup processing.
Automatically adjust audio gain levels.
Accent blending parameters, passed as the accentControl object. See Accent control below.
ID of a performance reference prompt to guide delivery style.
Example request
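The following Python sketch assembles a request message and checks it against the documented limits before sending. Those limits are a speed of 0.5 to 2.0, variation and temperature of 0.0 to 1.0, and the listed sample rates and formats. Every JSON key except accentControl is a hypothetical placeholder, and so is the "generate" request-type value. Substitute the names your API reference specifies.

```python
import json

# Allowed values and ranges taken from the parameter descriptions above.
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 36000, 44100, 48000}
AUDIO_FORMATS = {"mp3", "wav", "opus", "mulaw"}
# Hypothetical key names mapped to their documented (min, max) ranges.
RANGES = {"speed": (0.5, 2.0), "variation": (0.0, 1.0), "temperature": (0.0, 1.0)}


def build_request(model_id: str, text: str, locale: str, voice_id: str, **options) -> str:
    """Validate options against the documented limits and return the JSON message.

    All keys except accentControl are hypothetical placeholders.
    """
    for key, (low, high) in RANGES.items():
        value = options.get(key)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{key}={value} is outside [{low}, {high}]")
    rate = options.get("sampleRate")
    if rate is not None and rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {rate} Hz")
    audio_format = options.get("format")
    if audio_format is not None and audio_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format!r}")
    message = {
        "type": "generate",      # hypothetical request-type value
        "modelId": model_id,
        "text": text,
        "locale": locale,
        "voiceId": voice_id,     # e.g. an "asset:"-prefixed built-in voice
        **options,
    }
    return json.dumps(message)


# Usage: the voice ID is hypothetical; accentControl keys follow the table below.
payload = build_request(
    "dd-etts-3.0", "Hello from the streaming API.", "en-US", "asset:example-voice",
    speed=1.1, format="mp3", sampleRate=24000,
    accentControl={"accentBaseLocale": "en-US", "accentLocale": "fr-FR", "accentRatio": 0.3},
)
```

Validating on the client surfaces mistakes immediately. Without it, you would only find them when an InvalidInput error arrives over the socket.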
Response format
Audio chunks
Audio is delivered as a series of JSON messages, each carrying a portion of the audio data. Each chunk contains the following fields:
Sequential chunk index, starting from 0.
The generation ID for this request. Use this to correlate chunks with requests when running multiple generations on the same connection.
Base64-encoded audio data for this chunk.
true when this is the final chunk of the generation.
Example response stream
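A client consumes this stream by decoding each chunk's base64 payload and joining the chunks in index order, separately for each generation ID. The following is a minimal Python sketch of that approach. The key names index, generationId, audio, and isFinal are hypothetical placeholders for the chunk fields described above. Messages without an audio field, such as an acknowledgement, are skipped.

```python
import base64
from collections import defaultdict

# Hypothetical key names for the chunk fields described above.
INDEX, GEN_ID, AUDIO, IS_FINAL = "index", "generationId", "audio", "isFinal"


class ChunkAssembler:
    """Collect base64 audio chunks per generation ID and join them in index order."""

    def __init__(self):
        self._chunks = defaultdict(dict)  # generation ID -> {chunk index: decoded bytes}
        self._final_index = {}            # generation ID -> index of the final chunk

    def add(self, message: dict):
        """Store one message; return the joined audio once a generation is complete."""
        if AUDIO not in message:          # not an audio chunk (e.g. an acknowledgement)
            return None
        gen_id = message[GEN_ID]
        received = self._chunks[gen_id]
        received[message[INDEX]] = base64.b64decode(message[AUDIO])
        if message.get(IS_FINAL):
            self._final_index[gen_id] = message[INDEX]
        last = self._final_index.get(gen_id)
        # Wait until the final chunk and every earlier index have arrived.
        if last is None or any(i not in received for i in range(last + 1)):
            return None
        del self._chunks[gen_id], self._final_index[gen_id]
        return b"".join(received[i] for i in range(last + 1))
```

This sketch buffers a whole generation for simplicity. For low-latency playback you would instead pass each decoded chunk to the audio output as it arrives. Whether the concatenated bytes form one valid file also depends on the chosen output format.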
The stream begins with an initial acknowledgement message, followed by the audio chunks.
Error responses
When an error occurs, the WebSocket sends a JSON error message with the following fields:
Human-readable error description.
Error category. One of RateLimit, MaxExceeded, InsufficientCredits, or InvalidInput.
The generation ID, if available.
| Error type | Description |
|---|---|
| RateLimit | Too many concurrent requests. Reduce request frequency. |
| MaxExceeded | Maximum generation minutes reached for your plan. |
| InsufficientCredits | Account has insufficient credits. Top up your balance. |
| InvalidInput | Invalid request parameters. Check your request body. |
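How a client reacts to each error type is an application decision. The following Python sketch assumes that only RateLimit is worth retrying, with exponential backoff. The other types describe plan limits, account balance, or a malformed request, and a retry will not fix any of them. The keys errorType and message are hypothetical placeholders for the error category and description fields.

```python
import json


class GenerationError(Exception):
    """Raised for errors the client should not retry."""

    def __init__(self, kind, message):
        super().__init__(f"{kind}: {message}")
        self.kind = kind


# Hypothetical key names for the error category and description fields.
TYPE_KEY, MESSAGE_KEY = "errorType", "message"
# Assumption: only RateLimit is transient; MaxExceeded, InsufficientCredits,
# and InvalidInput require a change on the user's side.
RETRYABLE = {"RateLimit"}


def handle_error(raw: str, attempt: int, base_delay: float = 1.0, max_attempts: int = 5) -> float:
    """Return seconds to wait before retrying (attempt counts from 0), or raise."""
    error = json.loads(raw)
    kind = error.get(TYPE_KEY)
    if kind in RETRYABLE and attempt < max_attempts:
        return base_delay * 2 ** attempt  # exponential backoff: 1 s, 2 s, 4 s, ...
    raise GenerationError(kind, error.get(MESSAGE_KEY, ""))
```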
Accent control
Blend accents between two locales using the accentControl object:
| Field | Type | Description |
|---|---|---|
| accentBaseLocale | string | Base accent locale (e.g., en-US) |
| accentLocale | string | Target accent to blend (e.g., fr-FR) |
| accentRatio | number | Blend ratio from 0.0 (base only) to 1.0 (target only) |
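The following Python sketch builds the accentControl object and rejects ratios outside the documented 0.0 to 1.0 range. The dataclass fields deliberately reuse the camelCase names from the table, so asdict produces the documented keys directly. The class itself is a hypothetical helper, not part of the API.

```python
from dataclasses import asdict, dataclass


@dataclass
class AccentControl:
    """Client-side mirror of the accentControl object (field names from the table above)."""

    accentBaseLocale: str  # base accent, e.g. "en-US"
    accentLocale: str      # target accent to blend in, e.g. "fr-FR"
    accentRatio: float     # 0.0 = base accent only, 1.0 = target accent only

    def __post_init__(self):
        if not 0.0 <= self.accentRatio <= 1.0:
            raise ValueError("accentRatio must be between 0.0 and 1.0")

    def to_dict(self) -> dict:
        return asdict(self)


# Usage: attach to a request message dict before JSON-encoding it.
accent = AccentControl("en-US", "fr-FR", 0.3).to_dict()
```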
