
Installation

pip install deepdub
Requires Python 3.9 or later. Dependencies: requests, websockets, click, audiosample.

Initialization

from deepdub import DeepdubClient

# Option 1: Pass API key directly
client = DeepdubClient(api_key="dd-your-api-key")

# Option 2: Use DEEPDUB_API_KEY environment variable
# export DEEPDUB_API_KEY=dd-your-api-key
client = DeepdubClient()

Constructor parameters

api_key (string): Your Deepdub API key. Falls back to the DEEPDUB_API_KEY environment variable if not provided.
base_url (string, default "https://restapi.deepdub.ai/api/v1"): Base URL for the REST API. Falls back to the DEEPDUB_BASE_URL environment variable.
base_websocket_url (string, default "wss://wsapi.deepdub.ai/open"): Base URL for the WebSocket API. Falls back to the DEEPDUB_BASE_WEBSOCKET_URL environment variable.
base_websocket_streaming_url (string, default "wss://wss.deepdub.ai/ws"): Base URL for the WebSocket streaming API. Falls back to the DEEPDUB_BASE_WEBSOCKET_STREAMING_URL environment variable.
eu (boolean, default false): Use EU region endpoints (restapi.eu.deepdub.ai, wsapi.eu.deepdub.ai). Falls back to the DD_EU environment variable ("1" to enable).

Region endpoints

| Region       | REST API                             | WebSocket API                  |
|--------------|--------------------------------------|--------------------------------|
| US (default) | https://restapi.deepdub.ai/api/v1    | wss://wsapi.deepdub.ai/open    |
| EU           | https://restapi.eu.deepdub.ai/api/v1 | wss://wsapi.eu.deepdub.ai/open |
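
Region selection can also be sketched as a small helper. This is a hypothetical function (not part of the SDK) that mirrors the documented behavior of the eu constructor flag and the DD_EU environment variable:

```python
import os

# Documented default endpoints per region.
US_REST = "https://restapi.deepdub.ai/api/v1"
US_WS = "wss://wsapi.deepdub.ai/open"
EU_REST = "https://restapi.eu.deepdub.ai/api/v1"
EU_WS = "wss://wsapi.eu.deepdub.ai/open"


def resolve_region(eu: bool = False) -> tuple:
    """Return (rest_url, websocket_url), honoring the DD_EU env var."""
    if eu or os.environ.get("DD_EU") == "1":
        return EU_REST, EU_WS
    return US_REST, US_WS
```

Passing eu=True to DeepdubClient has the same effect as exporting DD_EU=1 before constructing the client.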

Text-to-Speech

tts() — Synchronous generation

Generate speech and receive the complete audio as bytes.
audio_data = client.tts(
    text="Hello, welcome to Deepdub!",
    voice_prompt_id="your-voice-id",
    model="dd-etts-2.5",
    locale="en-US"
)

with open("output.mp3", "wb") as f:
    f.write(audio_data)
Returns: bytes — binary audio data in the specified format.

Parameters

text (string, required): Text to convert to speech.
voice_prompt_id (string): Voice prompt ID to use. Either this or voice_reference must be provided.
voice_reference (Union[bytes, str, Path]): Audio reference for instant voice cloning. Accepts a file Path, raw bytes, or a base64-encoded string. Either this or voice_prompt_id must be provided.
model (string, default "dd-etts-2.5"): Model ID. Available models: dd-etts-3.0, dd-etts-2.5.
locale (string, default "en-US"): Language locale code (e.g., en-US, fr-FR).
format (string, default "mp3"): Audio output format: mp3, headerless-wav, opus, or mulaw.
temperature (float): Generation temperature (0.0–1.0). Higher values produce more varied output.
variance (float): Voice variation level (0.0–1.0).
duration (float): Target audio duration in seconds. Mutually exclusive with tempo.
tempo (float): Playback speed multiplier. Mutually exclusive with duration.
seed (int): Random seed for deterministic generation.
prompt_boost (bool): Enhance voice prompt characteristics.
sample_rate (int): Output sample rate in Hz. Supported: 8000, 16000, 22050, 24000, 44100, 48000.
accent_base_locale (string): Base accent locale (e.g., en-US). Must be provided together with accent_locale and accent_ratio.
accent_locale (string): Target accent locale (e.g., fr-FR). Must be provided together with accent_base_locale and accent_ratio.
accent_ratio (float): Accent blend ratio (0.0–1.0). Must be provided together with accent_base_locale and accent_locale.
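
The headerless-wav format returns raw PCM with no container, so most players will not open the file directly. A minimal sketch of wrapping it in a playable .wav container with the standard library, assuming the stream is 16-bit mono PCM at the sample rate you requested (wrap_pcm_as_wav is a local helper, not an SDK method):

```python
import wave


def wrap_pcm_as_wav(pcm: bytes, path: str, sample_rate: int = 48000) -> None:
    """Write raw 16-bit mono PCM bytes into a playable .wav file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)

# e.g. wrap_pcm_as_wav(audio_data, "output.wav", sample_rate=48000)
```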

Full example with all parameters

audio_data = client.tts(
    text="This demonstrates all available TTS parameters.",
    voice_prompt_id="your-voice-id",
    model="dd-etts-2.5",
    locale="en-US",
    format="mp3",
    temperature=0.7,
    variance=0.6,
    tempo=1.1,
    seed=42,
    prompt_boost=True,
    sample_rate=44100,
    accent_base_locale="en-US",
    accent_locale="fr-FR",
    accent_ratio=0.3,
)

with open("output.mp3", "wb") as f:
    f.write(audio_data)

Voice cloning from audio reference

from pathlib import Path

audio_data = client.tts(
    text="Cloning a voice from an audio sample.",
    voice_reference=Path("reference_audio.mp3"),
    model="dd-etts-2.5",
    locale="en-US",
)

with open("cloned_output.mp3", "wb") as f:
    f.write(audio_data)
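
Besides a Path, voice_reference also accepts raw bytes or a base64-encoded string, so you can load the reference clip yourself. A sketch of preparing each form (both helpers are local conveniences, not SDK methods):

```python
import base64
from pathlib import Path


def reference_as_bytes(path: str) -> bytes:
    """Read the reference clip as raw bytes."""
    return Path(path).read_bytes()


def reference_as_base64(path: str) -> str:
    """Encode the reference clip as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

Either value can then be passed as voice_reference in place of the Path.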

tts_retro() — Retroactive generation

Submit a TTS request and receive a URL for later retrieval.
response = client.tts_retro(
    text="Generate this audio for later retrieval.",
    voice_prompt_id="your-voice-id",
    model="dd-etts-2.5",
    locale="en-US"
)

audio_url = response["url"]
print(f"Audio available at: {audio_url}")
Returns: dict with a url key pointing to the generated audio.
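
The returned URL can be fetched later with the standard library. A minimal sketch (download_audio is a local helper, not an SDK method):

```python
import urllib.request


def download_audio(url: str, path: str) -> None:
    """Fetch the generated audio from its URL and save it to disk."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as f:
        f.write(resp.read())

# e.g. download_audio(response["url"], "retro_output.mp3")
```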

Parameters

text (string, required): Text to convert to speech.
voice_prompt_id (string, required): Voice prompt ID to use.
model (string, default "dd-etts-2.5"): Model ID.
locale (string, default "en-US"): Language locale code.

Async / WebSocket TTS

async_tts() — Streaming generation

Stream audio chunks over WebSocket for low-latency playback. Must be used within an async_connect() context.
import asyncio
from deepdub import DeepdubClient

client = DeepdubClient(api_key="dd-your-api-key")

async def stream_audio():
    async with client.async_connect() as conn:
        audio = bytearray()
        async for chunk in conn.async_tts(
            text="Streaming audio in real time!",
            voice_prompt_id="your-voice-id",
            model="dd-etts-2.5",
            format="mp3",
        ):
            audio.extend(chunk)

        with open("streamed.mp3", "wb") as f:
            f.write(audio)

asyncio.run(stream_audio())
Yields: bytes — audio chunks as they are generated.

Parameters

Same as tts(), plus:
generation_id (string): Optional UUID for request tracking. Auto-generated if not provided.
target_gender (string): Target gender for the output voice.
verbose (bool, default false): Print debug information about sent/received messages.

Multiple concurrent generations

The WebSocket connection supports multiplexing — run multiple TTS requests on the same connection:
import asyncio
from deepdub import DeepdubClient

client = DeepdubClient(api_key="dd-your-api-key")

async def generate_multiple():
    async with client.async_connect() as conn:
        async def generate_one(text, filename):
            audio = bytearray()
            async for chunk in conn.async_tts(
                text=text,
                voice_prompt_id="your-voice-id",
                model="dd-etts-2.5",
                format="mp3",
            ):
                audio.extend(chunk)
            with open(filename, "wb") as f:
                f.write(audio)

        await asyncio.gather(
            generate_one("First sentence.", "out1.mp3"),
            generate_one("Second sentence.", "out2.mp3"),
            generate_one("Third sentence.", "out3.mp3"),
        )

asyncio.run(generate_multiple())

Streaming Input

For real-time text streaming (sending text incrementally), use async_stream_connect():
import asyncio
from deepdub import DeepdubClient

client = DeepdubClient(api_key="dd-your-api-key")

async def streaming_input():
    async with client.async_stream_connect(
        model="dd-etts-2.5",
        locale="en-US",
        voice_prompt_id="your-voice-id",
        format="wav",
        sample_rate=16000,
    ) as conn:
        await conn.async_stream_text("Hello, ")
        await conn.async_stream_text("this is streamed ")
        await conn.async_stream_text("text input.")

        while True:
            audio = await conn.async_stream_recv_audio()
            if audio is None:
                break
            # Process audio chunk...

asyncio.run(streaming_input())
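
In the loop above, audio is only drained after all text has been sent. For lower latency you can run sending and receiving concurrently, assuming the connection allows the two coroutines to interleave. A sketch of the pattern against a stand-in connection (FakeConn mimics the two stream calls for illustration; with the real SDK you would pass the conn object instead, and the stream signals its end on its own rather than via finish()):

```python
import asyncio


class FakeConn:
    """Stand-in exposing the same two streaming calls used above."""

    def __init__(self):
        self._q = asyncio.Queue()

    async def async_stream_text(self, text: str) -> None:
        # Pretend each text piece immediately yields one audio chunk.
        await self._q.put(text.encode())

    async def finish(self) -> None:
        # Fake-only end-of-stream marker.
        await self._q.put(None)

    async def async_stream_recv_audio(self):
        return await self._q.get()


async def send_and_recv(conn, pieces) -> bytes:
    """Send text pieces and collect audio chunks concurrently."""

    async def send():
        for piece in pieces:
            await conn.async_stream_text(piece)
        await conn.finish()

    async def recv():
        audio = bytearray()
        while (chunk := await conn.async_stream_recv_audio()) is not None:
            audio.extend(chunk)
        return bytes(audio)

    _, audio = await asyncio.gather(send(), recv())
    return audio
```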

Gender Classification

Classify the gender of a speaker from an audio sample:
import asyncio
from pathlib import Path
from deepdub import DeepdubClient

client = DeepdubClient(api_key="dd-your-api-key")

async def classify():
    async with client.async_connect() as conn:
        result = await conn.gender_classify(
            audio_data=Path("speaker_sample.wav"),
            sample_rate=16000,
            timeout=5.0,
        )
        print(result)

asyncio.run(classify())

Parameters

audio_data (Union[bytes, str, Path], required): Audio data as raw bytes, base64-encoded string, or file Path. Automatically trimmed to 1 second.
sample_rate (int, default 16000): Sample rate of the input audio.
timeout (float, default 5.0): Timeout in seconds for the WebSocket response.
generation_id (string): Optional UUID for request tracking.
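
Since only the first second is used, you can trim locally before sending to cut upload size. A sketch assuming 16-bit mono PCM, where each sample is 2 bytes (trim_to_one_second is a local helper, not an SDK method):

```python
def trim_to_one_second(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Keep at most one second of 16-bit mono PCM."""
    return pcm[: sample_rate * 2]
```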

Voice Management

list_voices() — List all voice prompts

voices = client.list_voices()

for voice in voices.get("voicePrompts", []):
    print(f"{voice['id']}: {voice.get('name', voice.get('title', 'Untitled'))}")
Returns: dict with a voicePrompts key containing a list of voice prompt objects.
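
The returned structure can be searched locally, for example to look up a voice ID by name. A small hypothetical helper operating on the voicePrompts list shown above:

```python
def find_voice_id(voices: dict, name: str):
    """Return the id of the first voice prompt whose name or title matches."""
    for voice in voices.get("voicePrompts", []):
        if voice.get("name") == name or voice.get("title") == name:
            return voice["id"]
    return None

# e.g. voice_id = find_voice_id(client.list_voices(), "Professional Narrator")
```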

add_voice() — Upload a voice sample

from pathlib import Path

response = client.add_voice(
    data=Path("voice_sample.wav"),
    name="Professional Narrator",
    gender="female",
    locale="en-US",
    publish=False,
    speaking_style="Neutral",
    age=30,
)

print(f"Created voice: {response}")
Returns: dict with the created voice prompt information.

Parameters

data (Union[bytes, str, Path], required): Audio data — a file Path, raw bytes, or base64-encoded string.
name (string, required): Display name for the voice prompt.
gender (string, required): Speaker gender: "male" or "female".
locale (string, required): Language locale code (e.g., en-US).
publish (bool, default false): Whether to make the voice publicly available.
speaking_style (string, default "Neutral"): Speaking style descriptor.
age (int, default 0): Age of the speaker.

CLI Reference

The SDK includes a command-line interface:
# List available voices
deepdub list-voices

# Upload a new voice
deepdub add-voice \
  --file path/to/audio.mp3 \
  --name "My Voice" \
  --gender male \
  --locale en-US

# Generate text-to-speech
deepdub tts \
  --text "Hello from the CLI!" \
  --voice-prompt-id your-voice-id

# Set API key via flag or environment
deepdub --api-key dd-your-key tts --text "Hello!"
export DEEPDUB_API_KEY=dd-your-key

Environment Variables

| Variable                             | Description                      | Default                           |
|--------------------------------------|----------------------------------|-----------------------------------|
| DEEPDUB_API_KEY                      | API key for authentication       |                                   |
| DEEPDUB_BASE_URL                     | REST API base URL                | https://restapi.deepdub.ai/api/v1 |
| DEEPDUB_BASE_WEBSOCKET_URL           | WebSocket API base URL           | wss://wsapi.deepdub.ai/open       |
| DEEPDUB_BASE_WEBSOCKET_STREAMING_URL | Streaming WebSocket base URL     | wss://wss.deepdub.ai/ws           |
| DD_EU                                | Use EU endpoints ("1" to enable) | "0"                               |

Error Handling

from deepdub import DeepdubClient
import requests

client = DeepdubClient(api_key="dd-your-api-key")

try:
    audio = client.tts(
        text="Hello!",
        voice_prompt_id="your-voice-id",
    )
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 401:
        print("Invalid API key")
    elif e.response.status_code == 400:
        print("Invalid request parameters")
    else:
        print(f"API error: {e}")
except ValueError as e:
    print(f"Validation error: {e}")
For async operations, WebSocket errors are raised as Exception with the error message from the server:
try:
    async with client.async_connect() as conn:
        async for chunk in conn.async_tts(text="Hello!", voice_prompt_id="id"):
            pass
except Exception as e:
    error_msg = str(e)
    # Possible errors: "Rate limit exceeded", "Insufficient credits", etc.
    print(f"WebSocket error: {error_msg}")

Available Models

| Model ID    | Description                       |
|-------------|-----------------------------------|
| dd-etts-3.0 | Latest model with best quality    |
| dd-etts-2.5 | Stable production model (default) |