Resemble AI
IndoxHub exposes Resemble AI under a Resemble-native API shape at /api/v1/resemble/*. Auth, billing, rate limits, and persistence are handled transparently — you don't talk to Resemble directly.
| Item | Value |
|---|---|
| Base URL | https://api.indoxhub.com/api/v1/resemble |
| Auth | Authorization: Bearer YOUR_INDOXHUB_API_KEY |
| Pricing model | Pay-per-second of audio (or per-image / per-search depending on capability) |
| Markup | 20 % over Resemble's per-second cost (configurable via RESEMBLE_MARKUP_PCT) |
| Currency | USD |
| Account plan assumed | Resemble Flex. Voice cloning + voice design require Business plan and are gated behind RESEMBLE_BUSINESS_PLAN_ACTIVE=true. |
Quick start
The only endpoint that costs money on a first call is POST /tts/synthesize. List voices first to discover voice UUIDs (free):
```bash
# 1. List your account's voices (free). The URL is quoted so the shell
#    does not treat "&" as a background operator.
curl "https://api.indoxhub.com/api/v1/resemble/tts/voices?page=1&page_size=10" \
  -H "Authorization: Bearer $INDOXHUB_API_KEY"

# 2. Synthesize speech: about $0.0005 per second of generated audio, plus IndoxHub markup
curl -X POST https://api.indoxhub.com/api/v1/resemble/tts/synthesize \
  -H "Authorization: Bearer $INDOXHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_uuid": "9fd7430d",
    "text": "Hello from IndoxHub.",
    "output_format": "mp3"
  }'
```
The response includes the audio (base64), the duration billed, and a request ID you can correlate with provider_resemble_request and provider_resemble_usage rows.
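The same call can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names (build_synthesize_request, decode_audio, synthesize_to_file) are hypothetical, while the field names audio_content, audio_duration, and request_id are taken from the response documented under POST /tts/synthesize.

```python
import base64
import json
import urllib.request

BASE_URL = "https://api.indoxhub.com/api/v1/resemble"


def build_synthesize_request(api_key: str, voice_uuid: str, text: str,
                             output_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) the POST /tts/synthesize request."""
    payload = {"voice_uuid": voice_uuid, "text": text, "output_format": output_format}
    return urllib.request.Request(
        f"{BASE_URL}/tts/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_audio(response_body: dict) -> bytes:
    """Decode the base64 audio carried in the audio_content field."""
    return base64.b64decode(response_body["audio_content"])


def synthesize_to_file(api_key: str, voice_uuid: str, text: str, path: str):
    """Send the request and write the audio to disk. This call is billed."""
    req = build_synthesize_request(api_key, voice_uuid, text)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    with open(path, "wb") as fh:
        fh.write(decode_audio(body))
    return body["request_id"], body["audio_duration"]
```

Building the request separately from sending it makes it easy to inspect headers and payload before spending money on a real call.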
Capability index
| Group | Path prefix | Endpoints | Billing unit |
|---|---|---|---|
| Text-to-Speech | /tts | 3 | audio seconds |
| Voices | /voices | 3 | per voice subscription (Business) |
| Recordings | /voices/{voice_uuid}/recordings | 5 | free (training audio CRUD) |
| Voice Design | /voice-design | 2 | per voice subscription (Business) |
| Speech-to-Text | /stt | 3 | audio seconds |
| Audio Enhance / Edit | /enhance, /edit | 4 | audio seconds |
| Safety: Detect / Intelligence / Tracing | /detect, /intelligence, /tracing | 6 | audio sec / images |
| Watermark | /watermark | 4 | audio seconds |
| Identity | /identity | 5 | per search / per enrollment |
| Projects & Clips | /projects | 9 | free (CRUD) |
| Agents | /agents, /agent-tools, /agent-webhooks, /knowledge-base | 22 | free (config CRUD) |
| Uploads | /uploads | 1 | free |
| Inbound Webhooks | /webhooks/resemble | 1 | n/a |
Text-to-Speech
POST /tts/synthesize
Synchronous text-to-speech. Bills audio_seconds.
| Field | Type | Required | Notes |
|---|---|---|---|
| voice_uuid | string | yes | From GET /tts/voices |
| text | string | yes | Max 10 000 chars |
| output_format | string | no | mp3 (default) or wav |
| precision | string | no | PCM_16, PCM_32, or MULAW |
| sample_rate | int | no | 8000–48000 |
| language | string | no | ISO-639 code; voice's default if omitted |
| speed | float | no | 0.5–2.0 |
Response:
```json
{
  "request_id": "res_<24-hex>",
  "voice_uuid": "9fd7430d",
  "audio_url": "https://<account>.r2.cloudflarestorage.com/indoxhub-media/tts-output/<user_id>/<YYYY>/<MM>/<DD>/<HHMMSS>-<request_id>.mp3?X-Amz-...",
  "audio_content": "<base64-mp3>",
  "audio_duration": 1.42,
  "expires_at": "2026-05-04T05:18:18.654251+00:00",
  "billing": {
    "unit": "audio_seconds",
    "quantity": 1.42,
    "charged": "0.00085200"
  },
  "raw_response": { ... }
}
```
audio_url is a presigned Cloudflare R2 URL valid for 1 hour. Prefer it over audio_content for large clips. expires_at is when the R2 object will be auto-deleted (default: 7 days for TTS output). The original Resemble URL, when present, is in resemble_url.
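The billing block in the sample response is consistent with the approximate $0.0005 per second rate from the Quick start plus the default 20 % markup: 1.42 s × 0.0005 × 1.20 = 0.000852. The sketch below reproduces that arithmetic. The function name estimate_charge is hypothetical, and the 8-decimal rounding is an assumption inferred from the sample value, not a documented rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimate_charge(audio_seconds: str, rate_per_second: str = "0.0005",
                    markup_pct: str = "20.0") -> str:
    """Estimate the charged amount: seconds * rate * (1 + markup / 100)."""
    base = Decimal(audio_seconds) * Decimal(rate_per_second)
    total = base * (Decimal(1) + Decimal(markup_pct) / Decimal(100))
    # Assumption: amounts are reported with 8 decimal places, as in the sample.
    rounded = total.quantize(Decimal("0.00000001"), rounding=ROUND_HALF_UP)
    return format(rounded, "f")


print(estimate_charge("1.42"))  # 0.00085200
```

Decimal is used instead of float so that small per-second amounts do not pick up binary rounding noise.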
GET /tts/voices?page=1&page_size=10
List your account's voices. Free. page_size must be between 10 and 1000 (enforced by Resemble).
GET /tts/voices/{voice_uuid}
Get one voice's metadata. Free.
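Paging through GET /tts/voices can be sketched as a generator that clamps page_size to the documented 10 to 1000 range. The response envelope for the voice list is not documented here, so fetch_page is a hypothetical callable that is assumed to return the list of voice objects for one page; stopping on a short or empty page is likewise an assumption of this sketch.

```python
from typing import Callable, Iterator, List


def clamp_page_size(page_size: int) -> int:
    """Keep page_size inside the 10-1000 range that Resemble enforces."""
    return max(10, min(1000, page_size))


def iter_voices(fetch_page: Callable[[int, int], List[dict]],
                page_size: int = 100) -> Iterator[dict]:
    """Yield voices page by page until a short or empty page is returned."""
    size = clamp_page_size(page_size)
    page = 1
    while True:
        items = fetch_page(page, size)
        if not items:
            return
        yield from items
        if len(items) < size:
            return  # a short page is treated as the last one (assumption)
        page += 1
```

In real use, fetch_page would issue the HTTP request and extract the voice list from whatever envelope the API returns.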
Voices
All endpoints in this section require RESEMBLE_BUSINESS_PLAN_ACTIVE=true. Without it, requests return 503 immediately and never reach Resemble, which saves upstream quota.
POST /voices
Create a new voice (cloning). Bills monthly subscription on first build.
| Field | Type | Required |
|---|---|---|
| name | string (≤128) | yes |
| consent_text | string | no |
| description | string | no |

POST /voices/{voice_uuid}/build
Build (train) a voice from its enrolled recordings. Writes a voice_subscriptions usage record.
DELETE /voices/{voice_uuid}
Delete a voice. Free.
Recordings
Training audio nested under a voice. Free CRUD; the audio itself is proxied to Resemble, not stored on IndoxHub.
| Method | Path |
|---|---|
| GET | /voices/{voice_uuid}/recordings?page=&page_size= |
| POST | /voices/{voice_uuid}/recordings |
| GET | /voices/{voice_uuid}/recordings/{recording_uuid} |
| PUT | /voices/{voice_uuid}/recordings/{recording_uuid} |
| DELETE | /voices/{voice_uuid}/recordings/{recording_uuid} |

POST/PUT body: {"fields": { ... resemble-native fields ... }}
Voice Design
Business-plan gated (same RESEMBLE_BUSINESS_PLAN_ACTIVE flag).
POST /voice-design/candidates
Generate candidate voices from a text description. Bills 1 unit per generation.
| Field | Type | Required | Notes |
|---|---|---|---|
| description | string (≤1024) | yes | Plain-English description of the voice |
| sample_text | string | no | Sample to read |
| params | object | no | Resemble-native overrides |

POST /voice-design/{design_uuid}/promote
Promote one candidate into a real voice.
| Field | Required | Notes |
|---|---|---|
| candidate_index | yes | 0-based |
| name | yes | ≤128 chars |
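The two Voice Design request bodies can be validated on the client before a billed call is made. The sketch below only encodes the limits from the tables above (description up to 1024 characters, name up to 128 characters, a 0-based candidate_index); the function names are hypothetical and the server remains the authority on what is accepted.

```python
from typing import Optional


def candidates_payload(description: str, sample_text: Optional[str] = None,
                       params: Optional[dict] = None) -> dict:
    """Build the body for POST /voice-design/candidates."""
    if not description or len(description) > 1024:
        raise ValueError("description is required and limited to 1024 characters")
    payload: dict = {"description": description}
    if sample_text is not None:
        payload["sample_text"] = sample_text
    if params is not None:
        payload["params"] = params  # forwarded as Resemble-native overrides
    return payload


def promote_payload(candidate_index: int, name: str) -> dict:
    """Build the body for POST /voice-design/{design_uuid}/promote."""
    if candidate_index < 0:
        raise ValueError("candidate_index is 0-based and cannot be negative")
    if not name or len(name) > 128:
        raise ValueError("name is required and limited to 128 characters")
    return {"candidate_index": candidate_index, "name": name}
```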
Speech-to-Text
POST /stt
Submit a transcription job. Asynchronous: returns a job_id.
| Field | Type | Required |
|---|---|---|
| audio_url | string | yes |
| language | string | no |
| callback_uri | string | no |

GET /stt/{job_id}
Fetch transcript / status.
GET /stt?page=&page_size=
List jobs.
Audio Enhance & Edit
Both follow the same async-job shape: POST to submit, GET /{job_id} to retrieve.
| Capability | Submit | Retrieve | Billing |
|---|---|---|---|
| Enhance noisy audio | POST /enhance | GET /enhance/{job_id} | audio sec |
| Edit (insert / remove / replace via SSML-like ops) | POST /edit | GET /edit/{job_id} | audio sec |
Submit body shape: {"audio_url": "...", ...resemble fields}. The first GET /{job_id} after completion mirrors the result audio to Cloudflare R2 (lazy + idempotent) and augments the response with audio_url (presigned R2 URL, 7-day retention), expires_at, and resemble_url (upstream fallback). Subsequent GETs return the same R2 URL.
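The submit-then-retrieve pattern usually ends up as a polling loop on the client. The sketch below is deliberately generic: get_job is a hypothetical callable that performs GET /{job_id} and returns the parsed body, and is_done decides when to stop. Treating the presence of audio_url as the completion signal follows from the paragraph above; how failed jobs are reported is not documented here, so this sketch does not try to detect them.

```python
import time
from typing import Callable


def wait_for_job(get_job: Callable[[], dict],
                 is_done: Callable[[dict], bool],
                 interval: float = 2.0,
                 timeout: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll get_job() until is_done(job) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job()
        if is_done(job):
            return job
        if time.monotonic() + interval > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)


# Example completion check for enhance/edit jobs (assumption: audio_url
# appears only once the result has been mirrored to R2).
def has_audio(job: dict) -> bool:
    return "audio_url" in job
```

The sleep parameter is injectable so the loop can be exercised in tests without real delays.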
Safety
Async deepfake detection + content intelligence + audio source tracing. Same submit/retrieve pattern as audio jobs.
| Capability | Submit | Retrieve | Billing |
|---|---|---|---|
| Deepfake detection (audio / video / image) | POST /detect | GET /detect/{job_id} | audio sec / video sec / per image |
| Audio + video + image intelligence | POST /intelligence | GET /intelligence/{job_id} | audio sec / video sec / per image |
| Audio source tracing | POST /tracing | GET /tracing/{job_id} | TBD |
Watermark
Embed and detect inaudible provenance markers in audio. The markers survive compression and re-recording.
| Capability | Submit | Retrieve | Billing |
|---|---|---|---|
| Apply watermark | POST /watermark/apply | GET /watermark/apply/{job_id} | $0.0005/sec |
| Detect watermark | POST /watermark/detect | GET /watermark/detect/{job_id} | $0.0002/sec |
The first GET /watermark/apply/{job_id} after completion mirrors the watermarked audio to Cloudflare R2 and adds audio_url, expires_at (7-day retention), and resemble_url to the response. GET /watermark/detect/{job_id} returns a confidence score with no audio output, so nothing is mirrored to R2.
Use case: every TTS output can be auto-watermarked at sub-cent cost so it can later be detected as AI-generated.
Identity
Beta. Per-search and per-enrollment billing.
| Method | Path | Notes |
|---|---|---|
| POST | /identity/search | Search known identities by audio sample. Bills 1 search. |
| POST | /identity/enroll | Enroll a new identity. Bills 1 enrollment. |
| GET | /identity?page=&page_size= | List enrolled identities. Free. |
| GET | /identity/{identity_id} | Get one. Free. |
| DELETE | /identity/{identity_id} | Delete. Free. |
Projects & Clips
Pure metadata proxying. No billing.
Projects (paths relative to /projects): GET, POST, GET /{project_uuid}, PUT /{project_uuid}, DELETE /{project_uuid}.
Clips (nested):
| Method | Path |
|---|---|
| GET | /projects/{project_uuid}/clips?page=&page_size= |
| POST | /projects/{project_uuid}/clips |
| GET | /projects/{project_uuid}/clips/{clip_uuid} |
| DELETE | /projects/{project_uuid}/clips/{clip_uuid} |
Agents
Resemble's voice-agents platform. Free CRUD on configs; live agent dispatch (Twilio integration) is not exposed by IndoxHub.
| Resource | Prefix | Methods |
|---|---|---|
| Agents | /agents | full CRUD + GET /agents/capabilities + GET /agents/system-tools |
| Agent tools | /agent-tools | full CRUD |
| Agent webhooks | /agent-webhooks | full CRUD |
| Knowledge base | /knowledge-base | full CRUD |

CRUD shape (same for all four; paths are relative to the resource prefix, shown here as {prefix}):
| Method | Path | Body |
|---|---|---|
| GET | {prefix}?page=&page_size= | (none) |
| POST | {prefix} | {"fields": {...}} |
| GET | {prefix}/{uuid} | (none) |
| PUT | {prefix}/{uuid} | {"fields": {...}} |
| DELETE | {prefix}/{uuid} | (none) |
The fields envelope is forwarded to Resemble unchanged — consult Resemble's agent docs for valid keys.
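Because all four agent resources share one CRUD shape, a single helper can produce the method, path, and body for any of them. The sketch below is illustrative only: crud_request and its action names (list, create, get, update, delete) are hypothetical, and the contents of fields are passed through untouched, as the gateway does.

```python
from typing import Optional, Tuple

PREFIXES = ("/agents", "/agent-tools", "/agent-webhooks", "/knowledge-base")
ACTIONS = ("list", "create", "get", "update", "delete")


def crud_request(prefix: str, action: str, uuid: Optional[str] = None,
                 fields: Optional[dict] = None, page: int = 1,
                 page_size: int = 10) -> Tuple[str, str, Optional[dict]]:
    """Return (HTTP method, path relative to the base URL, JSON body or None)."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "list":
        return "GET", f"{prefix}?page={page}&page_size={page_size}", None
    if action == "create":
        return "POST", prefix, {"fields": fields or {}}
    if uuid is None:
        raise ValueError(f"{action} needs a uuid")
    if action == "get":
        return "GET", f"{prefix}/{uuid}", None
    if action == "update":
        return "PUT", f"{prefix}/{uuid}", {"fields": fields or {}}
    return "DELETE", f"{prefix}/{uuid}", None
```

For example, crud_request("/agent-tools", "update", uuid="abc", fields={"name": "x"}) yields a PUT to /agent-tools/abc with the fields envelope as its body.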
Uploads
Single endpoint for uploading user-supplied media to Cloudflare R2 and getting back a presigned URL to pass into Resemble async jobs.
POST /uploads accepts multipart/form-data. Body fields:
| Field | Type | Required | Notes |
|---|---|---|---|
| file | file | yes | Audio/video/image. Max 500 MiB. Allowed: mp3, wav, m4a, flac, ogg, webm, mp4, mov, mkv, png, jpg, jpeg, webp |
| purpose | string | no | Retention hint (see table below). Default: 30-day generic uploads |

| purpose value | R2 prefix | Retention |
|---|---|---|
| voice_clone | voice-recordings/ | PERMANENT (identity asset) |
| stt_input | stt-input/ | 30 days |
| watermark_input | watermark/ | 7 days |
| audio_job_input | audio-jobs/ | 7 days |
| (unspecified) | uploads/ | 30 days |
Response:
```json
{
  "url": "https://<account>.r2.cloudflarestorage.com/indoxhub-media/<prefix>/<user_id>/<YYYY>/<MM>/<DD>/<HHMMSS>-<id>.<ext>?X-Amz-...",
  "asset_id": 1234,
  "asset_class": "voice_recordings",
  "purpose": "voice_clone",
  "expires_at": null,
  "expires_in": 3600,
  "file_name": "voice.wav",
  "file_type": "audio",
  "extension": "wav",
  "size_bytes": 89211
}
```
expires_at is null for permanent assets (voice clones), otherwise an ISO 8601 timestamp. Use the returned url directly as the audio_url argument to STT, enhance, edit, detect, intelligence, watermark, and identity endpoints — Resemble can fetch from R2 for the 1-hour signing window.
Voice clone training example:
```bash
# -F makes curl send a multipart/form-data POST; "@" uploads the file's contents
curl https://api.indoxhub.com/api/v1/resemble/uploads \
  -H "Authorization: Bearer $INDOXHUB_API_KEY" \
  -F "file=@voice.wav" \
  -F "purpose=voice_clone"
```
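A client can check a file against the documented upload limits before sending up to 500 MiB over the wire. The sketch below encodes the extension list, the size cap, and the retention table from this section. The function name check_upload is hypothetical, and rejecting an unrecognized purpose value is a choice made for this sketch; the gateway's behavior for unknown values is not documented here.

```python
import os
from typing import Optional

MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # 500 MiB
ALLOWED_EXTENSIONS = {
    "mp3", "wav", "m4a", "flac", "ogg", "webm",
    "mp4", "mov", "mkv", "png", "jpg", "jpeg", "webp",
}
# Retention in days per purpose; None means permanent.
RETENTION_DAYS = {
    "voice_clone": None,
    "stt_input": 30,
    "watermark_input": 7,
    "audio_job_input": 7,
}
DEFAULT_RETENTION_DAYS = 30


def check_upload(file_name: str, size_bytes: int,
                 purpose: Optional[str] = None) -> Optional[int]:
    """Validate an upload and return its retention in days (None = permanent)."""
    ext = os.path.splitext(file_name)[1].lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension not allowed: {ext!r}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 500 MiB limit")
    if purpose is None:
        return DEFAULT_RETENTION_DAYS
    if purpose not in RETENTION_DAYS:
        # Client-side guard only; an assumption of this sketch.
        raise ValueError(f"unrecognized purpose: {purpose!r}")
    return RETENTION_DAYS[purpose]
```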
Inbound Webhooks
Resemble notifies IndoxHub when async jobs complete and when voice training finishes.
POST /webhooks/resemble is the IndoxHub-side endpoint. It verifies the HMAC signature using RESEMBLE_WEBHOOK_SECRET, persists the event in provider_resemble_webhook_event, and updates the matching provider_resemble_job row.
Configure the webhook URL in your Resemble dashboard. The signature is sent in the header X-Resemble-Signature: sha256=<hex> and is computed over the raw request body.
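Verifying a signature of this shape can be sketched as follows. This is not IndoxHub's implementation: it assumes the header carries a hex-encoded HMAC-SHA256 of the raw body keyed with the webhook secret, which is the natural reading of the sha256=<hex> format described above. The function name verify_signature is hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check an 'X-Resemble-Signature: sha256=<hex>' header against the body."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The comparison must use the raw bytes exactly as received; re-serializing parsed JSON can change whitespace or key order and invalidate the signature.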
Errors
All Resemble errors map to standard HTTP codes:
| Status | Meaning | Source |
|---|---|---|
| 400 | Resemble rejected the request (bad params, validation) | upstream |
| 401 | IndoxHub API key missing / invalid | IndoxHub |
| 429 | Rate limit hit (per-user, per-capability) | IndoxHub or upstream |
| 502 | Resemble auth failed or Resemble 5xx | upstream |
| 503 | Resemble integration not configured, or capability gated by RESEMBLE_BUSINESS_PLAN_ACTIVE | IndoxHub |
Error body shape: {"detail": "<message>"}. Network failures, timeouts (120 s), and connection errors all surface as 502 with "network error contacting Resemble".
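On the client side, the status table and the {"detail": ...} body can be folded into one small parser. In the sketch below, the source column comes from the table above, while the retryable flag is this sketch's own judgment (429 and 502 are often transient, the others usually need a change to the request or configuration) and is not something the API states. ErrorInfo and parse_error are hypothetical names.

```python
import json
from typing import NamedTuple


class ErrorInfo(NamedTuple):
    status: int
    detail: str
    source: str
    retryable: bool


# status -> (source per the table above, retryable per this sketch's assumption)
_ERROR_TABLE = {
    400: ("upstream", False),
    401: ("IndoxHub", False),
    429: ("IndoxHub or upstream", True),
    502: ("upstream", True),
    503: ("IndoxHub", False),
}


def parse_error(status: int, body: bytes) -> ErrorInfo:
    """Turn an error response into a structured ErrorInfo."""
    try:
        parsed = json.loads(body)
        detail = parsed.get("detail", "") if isinstance(parsed, dict) else ""
    except ValueError:  # covers invalid JSON and undecodable bytes
        detail = ""
    source, retryable = _ERROR_TABLE.get(status, ("unknown", False))
    return ErrorInfo(status, str(detail), source, retryable)
```

Note that a 502 caused by failed Resemble authentication will not be fixed by retrying, so callers may want to cap the number of attempts.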
Billing & observability
Every successful request writes one row to provider_resemble_request (audit) and one to provider_resemble_usage (billing). Async jobs additionally write to provider_resemble_job. Generated audio URLs land in provider_resemble_asset.
A nightly Celery task (reconcile-resemble-billing) calls GET /account/billing_usage against Resemble and compares totals to provider_resemble_usage. Drifts >1 % are logged at ERROR level.
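The 1 % drift check amounts to a relative difference between two totals. The sketch below shows one way to compute it. It assumes the drift is measured relative to Resemble's reported total, which the text above does not specify, and the function names are hypothetical rather than taken from the reconcile-resemble-billing task.

```python
from decimal import Decimal


def billing_drift_pct(resemble_total: str, local_total: str) -> Decimal:
    """Relative difference between the two totals, as a percentage."""
    upstream = Decimal(resemble_total)
    local = Decimal(local_total)
    if upstream == 0:
        # No upstream usage: any local usage counts as unbounded drift.
        return Decimal(0) if local == 0 else Decimal("Infinity")
    return abs(local - upstream) / upstream * 100


def drift_exceeds(resemble_total: str, local_total: str,
                  threshold_pct: str = "1") -> bool:
    """True when the drift is above the threshold (strictly greater than)."""
    return billing_drift_pct(resemble_total, local_total) > Decimal(threshold_pct)
```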
Per-user, per-capability rate limits live in app/utils/resemble_rate_limit.py::_LIMITS. Hitting a limit returns 429 with a Retry-After header. Limits fail open if Redis is unavailable, meaning requests are allowed rather than rejected.
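A client that receives a 429 can derive its wait time from the Retry-After value. The sketch below assumes the value is a number of seconds; HTTP also allows a date in that header, and which form IndoxHub sends is not stated here, so anything non-numeric falls back to capped exponential backoff. The function name retry_delay and its defaults are hypothetical.

```python
from typing import Mapping


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Assumption: the mapping uses the canonical "Retry-After" spelling.
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return min(cap, max(0.0, float(value)))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    # Fallback: exponential backoff of base * 2^attempt, capped.
    return min(cap, base * (2 ** attempt))
```

Capping the delay keeps a misbehaving or very large Retry-After value from stalling a worker indefinitely.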
Configuration reference
| Env var | Default | Purpose |
|---|---|---|
| RESEMBLE_API_KEY | (none) | Required. Your Resemble API key. |
| RESEMBLE_BASE_URL | https://app.resemble.ai/api/v2 | Main API. Bearer auth. |
| RESEMBLE_SYNTH_URL | https://f.cluster.resemble.ai | Synthesis cluster. Token auth (auto-handled). |
| RESEMBLE_MARKUP_PCT | 20.0 | Percent added on top of Resemble cost. |
| RESEMBLE_WEBHOOK_SECRET | (none) | HMAC secret. Webhooks accepted unsigned if unset (dev only). |
| RESEMBLE_BUSINESS_PLAN_ACTIVE | false | Gate for voice cloning + voice design. |
Not exposed (by design)
- WebSocket streaming TTS: Resemble Business-plan only.
- Live agent dispatch (POST /agents/{uuid}/dispatch): requires Twilio + real-time voice infrastructure outside IndoxHub's gateway scope.
- Agent phone numbers (/agents/phone-numbers/*): same reason.
- OpenAI-compatible TTS shape (POST /audio/speech) for Resemble: see docs/usage/tts.md for the OpenAI-shaped TTS surface (currently OpenAI-only; Resemble adapter is a future task).
See docs/reference/resemble/decisions.md for why specific defaults were chosen (markup, storage policy, BYOK, rate-limit fairness, etc.) and docs/reference/resemble/business-plan.md for which routes require a Resemble Business plan upgrade.