GET /v1/app/voices
List voice profiles available to this device: workspace voices created in the dashboard plus clones created by this
client.
- Response 200: JSON object with a voices array.
- Each item: voice_id, name, category (cloned = this device, workspace = created via dashboard), optional preview_url (time-limited signed URL when storage is configured).
{
"voices": [
{
"voice_id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Bedtime narrator",
"category": "workspace",
"preview_url": "https://..."
}
]
}
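A client will often want to split the list by category, for example to show workspace voices and device clones in separate sections. The following is a minimal sketch that works on the response body only; it makes no HTTP call because the base URL and authentication scheme are not described here. The helper name group_voices_by_category and the second voice in the sample are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List


def group_voices_by_category(response_body: str) -> Dict[str, List[dict]]:
    """Group the items of the `voices` array by their `category` field."""
    payload = json.loads(response_body)
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for voice in payload.get("voices", []):
        grouped[voice["category"]].append(voice)
    return dict(grouped)


# Sample body: the first item mirrors the documented example; the second is
# made-up illustration data for a cloned voice without a preview_url.
SAMPLE_BODY = """
{
  "voices": [
    {
      "voice_id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "Bedtime narrator",
      "category": "workspace",
      "preview_url": "https://..."
    },
    {
      "voice_id": "00000000-0000-4000-8000-000000000001",
      "name": "My clone",
      "category": "cloned"
    }
  ]
}
"""

grouped = group_voices_by_category(SAMPLE_BODY)
for category, voices in grouped.items():
    print(category, [v["name"] for v in voices])
```

Because preview_url is optional, read it with voice.get("preview_url") rather than indexing.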
POST /v1/app/voices/clone
Create a cloned voice from a reference recording. Audio is transcribed (for the adapter), stored, and synced to the TTS
adapter.
- Content-Type: multipart/form-data
- Form fields: name (required), description (optional, default empty), audio_file (required file upload).
- Audio rules: converted to WAV server-side; duration must be at least 10 seconds and at most 30 seconds.
- Response 201: voice_id, name, status (e.g. ready), optional preview_url.
- Common error codes: EMPTY_AUDIO, AUDIO_TOO_SHORT, AUDIO_TOO_LONG, TRANSCRIPTION_FAILED, STORAGE_ERROR.
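Checking the duration before uploading can save a round trip. The sketch below assumes the reference recording is already a WAV file, so the standard-library wave module can read it; other formats would need a different decoder. The function names are hypothetical, and mapping an empty payload to EMPTY_AUDIO is an assumption about how the server classifies it.

```python
import io
import wave
from typing import Optional

MIN_SECONDS = 10.0  # documented lower bound (inclusive)
MAX_SECONDS = 30.0  # documented upper bound (inclusive)


def wav_duration_seconds(data: bytes) -> float:
    """Duration of an in-memory WAV file: frame count / frame rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def precheck_clone_audio(data: bytes) -> Optional[str]:
    """Return the error code the server would likely report, or None.

    This is a client-side guess only; the server remains authoritative.
    """
    if not data:
        return "EMPTY_AUDIO"  # assumption: empty upload maps to this code
    duration = wav_duration_seconds(data)
    if duration < MIN_SECONDS:
        return "AUDIO_TOO_SHORT"
    if duration > MAX_SECONDS:
        return "AUDIO_TOO_LONG"
    return None


def make_silent_wav(seconds: float, rate: int = 8000) -> bytes:
    """Build a mono 16-bit silent WAV, used here only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(precheck_clone_audio(make_silent_wav(5)))   # too short for cloning
print(precheck_clone_audio(make_silent_wav(12)))  # within the allowed range
```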
POST /v1/app/voices/design
Create a voice from a text description (no audio upload). The service materializes reference audio and registers the profile.
- Content-Type: application/json
- Response 201: same shape as clone (AppCloneResponse).
- Common error codes: MISSING_DESIGN_PROMPT, DESIGN_FAILED.
{
"name": "Friendly guide",
"design_prompt": "Warm, calm adult voice suitable for bedtime stories.",
"language": "en",
"description": "Optional notes"
}
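A request body like the one above could be assembled as follows. This is a sketch under assumptions: build_design_body is a hypothetical helper, the blank-prompt check only mirrors what MISSING_DESIGN_PROMPT suggests, and language is always sent because the example includes it (whether the server requires it is not stated).

```python
import json
from typing import Optional


def build_design_body(
    name: str,
    design_prompt: str,
    language: str,
    description: Optional[str] = None,
) -> str:
    """Serialize a JSON body for POST /v1/app/voices/design."""
    if not design_prompt.strip():
        # Assumption: a blank prompt is what triggers MISSING_DESIGN_PROMPT.
        raise ValueError("design_prompt must not be empty")
    body = {
        "name": name,
        "design_prompt": design_prompt,
        "language": language,
    }
    if description:
        # Included only when provided, since the example marks it as optional.
        body["description"] = description
    return json.dumps(body)


print(build_design_body(
    "Friendly guide",
    "Warm, calm adult voice suitable for bedtime stories.",
    "en",
))
```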
POST /v1/app/text-to-speech/{voice_id}/with-timestamps
Synthesize speech with word-level timestamps. The voice must be in the workspace and allowed for this device: either
created by this client (cloned) or created by a user in the workspace (workspace).
- Path parameter: voice_id, the UUID string of the voice profile.
- Text limit: up to 15,000 characters.
- Response 200: JSON with signed audio_url (temporary), expires_in (seconds, typically 300), duration_seconds, timestamps (array of word, start_ms, end_ms), full_text.
- Common error codes: TEXT_TOO_LONG, EMPTY_TEXT, INVALID_VOICE_ID, VOICE_NOT_FOUND, VOICE_NOT_AVAILABLE (403), ADAPTER_SYNC_FAILED, ADAPTER_NOT_CONFIGURED, TTS_FAILED, STORAGE_ERROR.
{
"text": "Hello from ItanniX.",
"output_format": "mp3_44100_128",
"speed": 1.0,
"stability": 0.75
}
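Some of the listed error codes can be anticipated on the client before sending the request. The sketch below is illustrative only: precheck_tts_request and tts_path are hypothetical names, treating whitespace-only text as empty is an assumption, and uuid.UUID accepts a few forms (such as unhyphenated hex) that the server may reject.

```python
import uuid
from typing import Optional

MAX_TEXT_CHARS = 15_000  # documented text limit


def precheck_tts_request(voice_id: str, text: str) -> Optional[str]:
    """Return the error code the server would likely report, or None."""
    try:
        uuid.UUID(voice_id)
    except ValueError:
        return "INVALID_VOICE_ID"
    if not text.strip():
        return "EMPTY_TEXT"  # assumption: whitespace-only counts as empty
    if len(text) > MAX_TEXT_CHARS:
        return "TEXT_TOO_LONG"
    return None


def tts_path(voice_id: str) -> str:
    """Fill the voice_id path parameter of the documented route."""
    return f"/v1/app/text-to-speech/{voice_id}/with-timestamps"


voice = "550e8400-e29b-41d4-a716-446655440000"
print(precheck_tts_request(voice, "Hello from ItanniX."))
print(tts_path(voice))
```

Codes such as VOICE_NOT_FOUND or VOICE_NOT_AVAILABLE depend on server state and cannot be predicted this way.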
If output_format contains mp3, the audio is returned as MP3; otherwise it is returned as WAV.
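When saving the downloaded audio, a client can apply the same rule to pick a file extension. A minimal sketch, assuming the check is a case-sensitive substring match (the exact matching rule is not specified); audio_extension is a hypothetical helper and "wav" is a placeholder value, not a documented format name.

```python
def audio_extension(output_format: str) -> str:
    """Pick a file extension using the documented rule: mp3 if present, else WAV."""
    return ".mp3" if "mp3" in output_format else ".wav"


print(audio_extension("mp3_44100_128"))  # value taken from the request example
print(audio_extension("wav"))            # placeholder for any non-mp3 value
```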
For designed voices, a voice_clone_prompt path may be sent to the adapter; cloned voices use the profile_id from the adapter.
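On the client side, the timestamps array and expires_in from the with-timestamps response are typically used for word highlighting during playback and for deciding when to request a fresh audio_url. The sketch below assumes the timestamps are sorted by start_ms, do not overlap, and cover half-open intervals [start_ms, end_ms); none of this is confirmed by the reference. The function names, the safety margin, and the sample timings are hypothetical.

```python
import time
from bisect import bisect_right
from typing import List, Optional


def word_at(timestamps: List[dict], position_ms: int) -> Optional[str]:
    """Return the word being spoken at position_ms, or None during a gap."""
    starts = [t["start_ms"] for t in timestamps]
    # Index of the last word whose start_ms is <= position_ms.
    i = bisect_right(starts, position_ms) - 1
    if i >= 0 and position_ms < timestamps[i]["end_ms"]:
        return timestamps[i]["word"]
    return None


def is_url_expired(
    received_at: float,
    expires_in: float,
    now: Optional[float] = None,
    safety_margin: float = 5.0,
) -> bool:
    """True once the signed URL is within safety_margin seconds of expiring."""
    if now is None:
        now = time.time()
    return now >= received_at + expires_in - safety_margin


# Made-up timings for the example text "Hello from ItanniX."
SAMPLE_TIMESTAMPS = [
    {"word": "Hello", "start_ms": 0, "end_ms": 400},
    {"word": "from", "start_ms": 450, "end_ms": 650},
    {"word": "ItanniX.", "start_ms": 700, "end_ms": 1300},
]

print(word_at(SAMPLE_TIMESTAMPS, 500))
print(is_url_expired(received_at=1000.0, expires_in=300, now=1200.0))
```

Recording received_at when the response arrives, rather than when playback starts, keeps the expiry estimate conservative.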