
Installation

pip install deepdub
Requires Python 3.9 or later. Dependencies: requests, websockets, click, audiosample.

Initialization

from deepdub import DeepdubClient

# Option 1: Pass API key directly
client = DeepdubClient(api_key="dd-your-api-key")

# Option 2: Use DEEPDUB_API_KEY environment variable
# export DEEPDUB_API_KEY=dd-your-api-key
client = DeepdubClient()

Constructor parameters

api_key (string): Your Deepdub API key. Falls back to the DEEPDUB_API_KEY environment variable if not provided.
base_url (string, default "https://restapi.deepdub.ai/api/v1"): Base URL for the REST API. Falls back to the DEEPDUB_BASE_URL environment variable.
base_websocket_url (string, default "wss://wsapi.deepdub.ai/open"): Base URL for the WebSocket API. Falls back to the DEEPDUB_BASE_WEBSOCKET_URL environment variable.
base_websocket_streaming_url (string, default "wss://wss.deepdub.ai/ws"): Base URL for the WebSocket streaming API. Falls back to the DEEPDUB_BASE_WEBSOCKET_STREAMING_URL environment variable.
eu (boolean, default false): Use EU region endpoints (restapi.eu.deepdub.ai, wsapi.eu.deepdub.ai). Falls back to the DD_EU environment variable ("1" to enable).

Region endpoints

| Region       | REST API                             | WebSocket API                  |
|--------------|--------------------------------------|--------------------------------|
| US (default) | https://restapi.deepdub.ai/api/v1    | wss://wsapi.deepdub.ai/open    |
| EU           | https://restapi.eu.deepdub.ai/api/v1 | wss://wsapi.eu.deepdub.ai/open |
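
Region selection can also be sketched as a small helper. This is a hypothetical function (not part of the SDK) that mirrors the documented behavior of the eu constructor flag and the DD_EU environment variable:

```python
import os

# Documented default endpoints per region.
US_REST = "https://restapi.deepdub.ai/api/v1"
US_WS = "wss://wsapi.deepdub.ai/open"
EU_REST = "https://restapi.eu.deepdub.ai/api/v1"
EU_WS = "wss://wsapi.eu.deepdub.ai/open"


def resolve_region(eu: bool = False) -> tuple:
    """Return (rest_url, websocket_url), honoring the DD_EU env var."""
    if eu or os.environ.get("DD_EU") == "1":
        return EU_REST, EU_WS
    return US_REST, US_WS
```

Passing eu=True to DeepdubClient has the same effect as exporting DD_EU=1 before constructing the client.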

Text-to-Speech

tts() — Synchronous generation

Generate speech and receive the complete audio as bytes.
audio_data = client.tts(
    text="Hello, welcome to Deepdub!",
    voice_prompt_id="your-voice-id",
    model="dd-etts-2.5",
    locale="en-US"
)

with open("output.mp3", "wb") as f:
    f.write(audio_data)
Returns: bytes — binary audio data in the specified format.

Parameters

text (string, required): Text to convert to speech.
voice_prompt_id (string): Voice prompt ID to use. Either this or voice_reference must be provided.
voice_reference (Union[bytes, str, Path]): Audio reference for instant voice cloning. Accepts a file Path, raw bytes, or a base64-encoded string. Either this or voice_prompt_id must be provided.
model (string, default "dd-etts-2.5"): Model ID. Available models: dd-etts-3.0, dd-etts-2.5.
locale (string, default "en-US"): Language locale code (e.g., en-US, fr-FR).
format (string, default "mp3"): Audio output format: mp3, headerless-wav, opus, or mulaw.
temperature (float): Generation temperature (0.0–1.0). Higher values produce more varied output.
variance (float): Voice variation level (0.0–1.0).
duration (float): Target audio duration in seconds. Mutually exclusive with tempo.
tempo (float): Playback speed multiplier. Mutually exclusive with duration.
seed (int): Random seed for deterministic generation.
prompt_boost (bool): Enhance voice prompt characteristics.
sample_rate (int): Output sample rate in Hz. Supported: 8000, 16000, 22050, 24000, 44100, 48000.
accent_base_locale (string): Base accent locale (e.g., en-US). Must be provided together with accent_locale and accent_ratio.
accent_locale (string): Target accent locale (e.g., fr-FR). Must be provided together with accent_base_locale and accent_ratio.
accent_ratio (float): Accent blend ratio (0.0–1.0). Must be provided together with accent_base_locale and accent_locale.
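
The headerless-wav format returns raw PCM with no container, so most players will not open the file directly. A minimal sketch of wrapping it in a playable .wav container with the standard library, assuming the stream is 16-bit mono PCM at the sample rate you requested (wrap_pcm_as_wav is a local helper, not an SDK method):

```python
import wave


def wrap_pcm_as_wav(pcm: bytes, path: str, sample_rate: int = 48000) -> None:
    """Write raw 16-bit mono PCM bytes into a playable .wav file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)

# e.g. wrap_pcm_as_wav(audio_data, "output.wav", sample_rate=48000)
```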

Full example with all parameters

audio_data = client.tts(
    text="This demonstrates all available TTS parameters.",
    voice_prompt_id="your-voice-id",
    model="dd-etts-2.5",
    locale="en-US",
    format="mp3",
    temperature=0.7,
    variance=0.6,
    tempo=1.1,
    seed=42,
    prompt_boost=True,
    sample_rate=44100,
    accent_base_locale="en-US",
    accent_locale="fr-FR",
    accent_ratio=0.3,
)

with open("output.mp3", "wb") as f:
    f.write(audio_data)

Voice cloning from audio reference

from pathlib import Path

audio_data = client.tts(
    text="Cloning a voice from an audio sample.",
    voice_reference=Path("reference_audio.mp3"),
    model="dd-etts-2.5",
    locale="en-US",
)

with open("cloned_output.mp3", "wb") as f:
    f.write(audio_data)
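
Besides a Path, voice_reference also accepts raw bytes or a base64-encoded string, so you can load the reference clip yourself. A sketch of preparing each form (both helpers are local conveniences, not SDK methods):

```python
import base64
from pathlib import Path


def reference_as_bytes(path: str) -> bytes:
    """Read the reference clip as raw bytes."""
    return Path(path).read_bytes()


def reference_as_base64(path: str) -> str:
    """Encode the reference clip as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

Either value can then be passed as voice_reference in place of the Path.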

tts_retro() — Retroactive generation

Submit a TTS request and receive a URL for later retrieval.
response = client.tts_retro(
    text="Generate this audio for later retrieval.",
    voice_prompt_id="your-voice-id",
    model="dd-etts-2.5",
    locale="en-US"
)

audio_url = response["url"]
print(f"Audio available at: {audio_url}")
Returns: dict with a url key pointing to the generated audio.
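
The returned URL can be fetched later with the standard library. A minimal sketch (download_audio is a local helper, not an SDK method):

```python
import urllib.request


def download_audio(url: str, path: str) -> None:
    """Fetch the generated audio from its URL and save it to disk."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as f:
        f.write(resp.read())

# e.g. download_audio(response["url"], "retro_output.mp3")
```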

Parameters

text (string, required): Text to convert to speech.
voice_prompt_id (string, required): Voice prompt ID to use.
model (string, default "dd-etts-2.5"): Model ID.
locale (string, default "en-US"): Language locale code.

Async / WebSocket TTS

async_tts() — Streaming generation

Stream audio chunks over WebSocket for low-latency playback. Must be used within an async_connect() context.
import asyncio
from deepdub import DeepdubClient

client = DeepdubClient(api_key="dd-your-api-key")

async def stream_audio():
    async with client.async_connect() as conn:
        audio = bytearray()
        async for chunk in conn.async_tts(
            text="Streaming audio in real time!",
            voice_prompt_id="your-voice-id",
            model="dd-etts-2.5",
            format="mp3",
        ):
            audio.extend(chunk)

        with open("streamed.mp3", "wb") as f:
            f.write(audio)

asyncio.run(stream_audio())
Yields: bytes — audio chunks as they are generated.

Parameters

Same as tts(), plus:
generation_id (string): Optional UUID for request tracking. Auto-generated if not provided.
target_gender (string): Target gender for the output voice.
verbose (bool, default false): Print debug information about sent/received messages.

Multiple concurrent generations

The WebSocket connection supports multiplexing — run multiple TTS requests on the same connection:
import asyncio
from deepdub import DeepdubClient

client = DeepdubClient(api_key="dd-your-api-key")

async def generate_multiple():
    async with client.async_connect() as conn:
        async def generate_one(text, filename):
            audio = bytearray()
            async for chunk in conn.async_tts(
                text=text,
                voice_prompt_id="your-voice-id",
                model="dd-etts-2.5",
                format="mp3",
            ):
                audio.extend(chunk)
            with open(filename, "wb") as f:
                f.write(audio)

        await asyncio.gather(
            generate_one("First sentence.", "out1.mp3"),
            generate_one("Second sentence.", "out2.mp3"),
            generate_one("Third sentence.", "out3.mp3"),
        )

asyncio.run(generate_multiple())

Streaming Input

For real-time text streaming (sending text incrementally), use async_stream_connect():
import asyncio
from deepdub import DeepdubClient

client = DeepdubClient(api_key="dd-your-api-key")

async def streaming_input():
    async with client.async_stream_connect(
        model="dd-etts-2.5",
        locale="en-US",
        voice_prompt_id="your-voice-id",
        format="wav",
        sample_rate=16000,
    ) as conn:
        await conn.async_stream_text("Hello, ")
        await conn.async_stream_text("this is streamed ")
        await conn.async_stream_text("text input.")

        while True:
            audio = await conn.async_stream_recv_audio()
            if audio is None:
                break
            # Process audio chunk...

asyncio.run(streaming_input())
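
In the loop above, audio is only drained after all text has been sent. For lower latency you can run sending and receiving concurrently, assuming the connection allows the two coroutines to interleave. A sketch of the pattern against a stand-in connection (FakeConn mimics the two stream calls for illustration; with the real SDK you would pass the conn object instead, and the stream signals its end on its own rather than via finish()):

```python
import asyncio


class FakeConn:
    """Stand-in exposing the same two streaming calls used above."""

    def __init__(self):
        self._q = asyncio.Queue()

    async def async_stream_text(self, text: str) -> None:
        # Pretend each text piece immediately yields one audio chunk.
        await self._q.put(text.encode())

    async def finish(self) -> None:
        # Fake-only end-of-stream marker.
        await self._q.put(None)

    async def async_stream_recv_audio(self):
        return await self._q.get()


async def send_and_recv(conn, pieces) -> bytes:
    """Send text pieces and collect audio chunks concurrently."""

    async def send():
        for piece in pieces:
            await conn.async_stream_text(piece)
        await conn.finish()

    async def recv():
        audio = bytearray()
        while (chunk := await conn.async_stream_recv_audio()) is not None:
            audio.extend(chunk)
        return bytes(audio)

    _, audio = await asyncio.gather(send(), recv())
    return audio
```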

Gender Classification

Classify the gender of a speaker from an audio sample:
import asyncio
from pathlib import Path
from deepdub import DeepdubClient

client = DeepdubClient(api_key="dd-your-api-key")

async def classify():
    async with client.async_connect() as conn:
        result = await conn.gender_classify(
            audio_data=Path("speaker_sample.wav"),
            sample_rate=16000,
            timeout=5.0,
        )
        print(result)

asyncio.run(classify())

Parameters

audio_data (Union[bytes, str, Path], required): Audio data as raw bytes, base64-encoded string, or file Path. Automatically trimmed to 1 second.
sample_rate (int, default 16000): Sample rate of the input audio.
timeout (float, default 5.0): Timeout in seconds for the WebSocket response.
generation_id (string): Optional UUID for request tracking.
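
Since only the first second is used, you can trim locally before sending to cut upload size. A sketch assuming 16-bit mono PCM, where each sample is 2 bytes (trim_to_one_second is a local helper, not an SDK method):

```python
def trim_to_one_second(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Keep at most one second of 16-bit mono PCM."""
    return pcm[: sample_rate * 2]
```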

Voice Management

list_voices() — List all voice prompts

voices = client.list_voices()

for voice in voices.get("voicePrompts", []):
    print(f"{voice['id']}: {voice.get('name', voice.get('title', 'Untitled'))}")
Returns: dict with a voicePrompts key containing a list of voice prompt objects.
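
The returned structure can be searched locally, for example to look up a voice ID by name. A small hypothetical helper operating on the voicePrompts list shown above:

```python
def find_voice_id(voices: dict, name: str):
    """Return the id of the first voice prompt whose name or title matches."""
    for voice in voices.get("voicePrompts", []):
        if voice.get("name") == name or voice.get("title") == name:
            return voice["id"]
    return None

# e.g. voice_id = find_voice_id(client.list_voices(), "Professional Narrator")
```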

add_voice() — Upload a voice sample

from pathlib import Path

response = client.add_voice(
    data=Path("voice_sample.wav"),
    name="Professional Narrator",
    gender="female",
    locale="en-US",
    publish=False,
    speaking_style="Neutral",
    age=30,
)

print(f"Created voice: {response}")
Returns: dict with the created voice prompt information.

Parameters

data (Union[bytes, str, Path], required): Audio data — a file Path, raw bytes, or base64-encoded string.
name (string, required): Display name for the voice prompt.
gender (string, required): Speaker gender: "male" or "female".
locale (string, required): Language locale code (e.g., en-US).
publish (bool, default false): Whether to make the voice publicly available.
speaking_style (string, default "Neutral"): Speaking style descriptor.
age (int, default 0): Age of the speaker.

CLI Reference

The SDK includes a command-line interface:
# List available voices
deepdub list-voices

# Upload a new voice
deepdub add-voice \
  --file path/to/audio.mp3 \
  --name "My Voice" \
  --gender male \
  --locale en-US

# Generate text-to-speech
deepdub tts \
  --text "Hello from the CLI!" \
  --voice-prompt-id your-voice-id

# Set API key via flag or environment
deepdub --api-key dd-your-key tts --text "Hello!"
export DEEPDUB_API_KEY=dd-your-key

Environment Variables

| Variable                             | Description                      | Default                           |
|--------------------------------------|----------------------------------|-----------------------------------|
| DEEPDUB_API_KEY                      | API key for authentication       |                                   |
| DEEPDUB_BASE_URL                     | REST API base URL                | https://restapi.deepdub.ai/api/v1 |
| DEEPDUB_BASE_WEBSOCKET_URL           | WebSocket API base URL           | wss://wsapi.deepdub.ai/open       |
| DEEPDUB_BASE_WEBSOCKET_STREAMING_URL | Streaming WebSocket base URL     | wss://wss.deepdub.ai/ws           |
| DD_EU                                | Use EU endpoints ("1" to enable) | "0"                               |

Error Handling

from deepdub import DeepdubClient
import requests

client = DeepdubClient(api_key="dd-your-api-key")

try:
    audio = client.tts(
        text="Hello!",
        voice_prompt_id="your-voice-id",
    )
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 401:
        print("Invalid API key")
    elif e.response.status_code == 400:
        print("Invalid request parameters")
    else:
        print(f"API error: {e}")
except ValueError as e:
    print(f"Validation error: {e}")
For async operations, WebSocket errors are raised as Exception with the error message from the server:
try:
    async with client.async_connect() as conn:
        async for chunk in conn.async_tts(text="Hello!", voice_prompt_id="id"):
            pass
except Exception as e:
    error_msg = str(e)
    # Possible errors: "Rate limit exceeded", "Insufficient credits", etc.
    print(f"WebSocket error: {error_msg}")

Available Models

| Model ID    | Description                       |
|-------------|-----------------------------------|
| dd-etts-3.0 | Latest model with best quality    |
| dd-etts-2.5 | Stable production model (default) |