Resemble AI

IndoxHub exposes Resemble AI under a Resemble-native API shape at /api/v1/resemble/*. Auth, billing, rate limits, and persistence are handled transparently — you don't talk to Resemble directly.

Item	Value
Base URL	`https://api.indoxhub.com/api/v1/resemble`
Auth	`Authorization: Bearer YOUR_INDOXHUB_API_KEY`
Pricing model	Pay-per-second of audio (or per-image / per-search depending on capability)
Markup	20 % over Resemble's per-second cost (configurable via `RESEMBLE_MARKUP_PCT`)
Currency	USD
Account plan assumed	Resemble Flex. Voice cloning + voice design require Business plan and are gated behind `RESEMBLE_BUSINESS_PLAN_ACTIVE=true`.

Quick start

In this quick start, only POST /tts/synthesize is billed. List voices first (free) to discover voice UUIDs:

# 1. List your account's voices
curl "https://api.indoxhub.com/api/v1/resemble/tts/voices?page=1&page_size=10" \
  -H "Authorization: Bearer $INDOXHUB_API_KEY"

# 2. Synthesize speech (about $0.0005 per second of generated audio, plus IndoxHub markup)
curl -X POST https://api.indoxhub.com/api/v1/resemble/tts/synthesize \
  -H "Authorization: Bearer $INDOXHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_uuid": "9fd7430d",
    "text": "Hello from IndoxHub.",
    "output_format": "mp3"
  }'

The response includes the audio (base64), the duration billed, and a request ID you can correlate with provider_resemble_request and provider_resemble_usage rows.
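The same synthesize call can be prepared from Python. The following is a minimal sketch using only the standard library; the helper names `build_synthesize_request` and `decode_audio` are hypothetical, and the request is built but not sent so the example runs without network access or a real key.

```python
import base64
import json
import urllib.request

BASE_URL = "https://api.indoxhub.com/api/v1/resemble"


def build_synthesize_request(api_key: str, voice_uuid: str, text: str,
                             output_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) the POST /tts/synthesize request."""
    body = json.dumps({
        "voice_uuid": voice_uuid,
        "text": text,
        "output_format": output_format,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/tts/synthesize",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def decode_audio(response: dict) -> bytes:
    """Decode the base64 audio carried in the response's audio_content field."""
    return base64.b64decode(response["audio_content"])


req = build_synthesize_request("YOUR_INDOXHUB_API_KEY", "9fd7430d", "Hello from IndoxHub.")
```

To send the request, pass `req` to `urllib.request.urlopen` and parse the JSON body before calling `decode_audio`.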

Capability index

Group	Path prefix	Endpoints	Billing unit
Text-to-Speech	`/tts`	3	audio seconds
Voices	`/voices`	3	per voice subscription (Business)
Recordings	`/voices/{voice_uuid}/recordings`	5	free (training audio CRUD)
Voice Design	`/voice-design`	2	per voice subscription (Business)
Speech-to-Text	`/stt`	3	audio seconds
Audio Enhance / Edit	`/enhance`, `/edit`	4	audio seconds
Safety: Detect / Intelligence / Tracing	`/detect`, `/intelligence`, `/tracing`	6	audio sec / images
Watermark	`/watermark`	4	audio seconds
Identity	`/identity`	5	per search / per enrollment
Projects & Clips	`/projects`	9	free (CRUD)
Agents	`/agents`, `/agent-tools`, `/agent-webhooks`, `/knowledge-base`	22	free (config CRUD)
Uploads	`/uploads`	1	free
Inbound Webhooks	`/webhooks/resemble`	1	n/a

Text-to-Speech

`POST /tts/synthesize`

Synchronous text-to-speech. Bills audio_seconds.

Field	Type	Required	Notes
`voice_uuid`	string	yes	From `GET /tts/voices`
`text`	string	yes	Max 10 000 chars
`output_format`	string	no	`mp3` (default) or `wav`
`precision`	string	no	`PCM_16`, `PCM_32`, or `MULAW`
`sample_rate`	int	no	8000–48000
`language`	string	no	ISO-639 code; voice's default if omitted
`speed`	float	no	0.5–2.0

Response:

{
  "request_id": "res_<24-hex>",
  "voice_uuid": "9fd7430d",
  "audio_url": "https://<account>.r2.cloudflarestorage.com/indoxhub-media/tts-output/<user_id>/<YYYY>/<MM>/<DD>/<HHMMSS>-<request_id>.mp3?X-Amz-...",
  "audio_content": "<base64-mp3>",
  "audio_duration": 1.42,
  "expires_at": "2026-05-04T05:18:18.654251+00:00",
  "billing": {
    "unit": "audio_seconds",
    "quantity": 1.42,
    "charged": "0.00085200"
  },
  "raw_response": { ... }
}

audio_url is a presigned Cloudflare R2 URL valid for 1 hour. Prefer it over audio_content for large clips. expires_at is when the R2 object will be auto-deleted (default: 7 days for TTS output). The original Resemble URL, when present, is in resemble_url.
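The billing block in the sample response is consistent with the figures quoted elsewhere on this page: 1.42 s at roughly $0.0005 per second, plus the default 20 % markup. The sketch below reproduces that arithmetic. It assumes the markup is applied multiplicatively to the per-second cost and that charges are rounded to eight decimal places, which is inferred from the sample rather than stated; `expected_charge` is a hypothetical helper.

```python
from decimal import Decimal


def expected_charge(seconds: str, unit_cost: str = "0.0005",
                    markup_pct: str = "20") -> Decimal:
    """Estimate the charge: seconds * unit cost * (1 + markup / 100)."""
    base = Decimal(seconds) * Decimal(unit_cost)
    total = base * (1 + Decimal(markup_pct) / 100)
    # Eight decimal places, matching the "charged" string in the sample response.
    return total.quantize(Decimal("0.00000001"))


charge = expected_charge("1.42")
print(charge)  # 0.00085200
```

Decimal arithmetic is used instead of floats so that the result compares exactly against the `charged` string.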

`GET /tts/voices?page=1&page_size=10`

List your account's voices. Free. page_size must be between 10 and 1000 (enforced by Resemble).
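Because out-of-range page sizes are rejected upstream, it can help to validate them before making the call. The following sketch builds the list URL and checks the documented 10 to 1000 bound; `voices_url` is a hypothetical helper, not part of any IndoxHub client.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.indoxhub.com/api/v1/resemble"


def voices_url(page: int = 1, page_size: int = 10) -> str:
    """Return the GET /tts/voices URL, rejecting page sizes outside 10..1000."""
    if not 10 <= page_size <= 1000:
        raise ValueError("page_size must be between 10 and 1000")
    query = urlencode({"page": page, "page_size": page_size})
    return f"{BASE_URL}/tts/voices?{query}"


print(voices_url(page=1, page_size=10))
```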

`GET /tts/voices/{voice_uuid}`

Get one voice's metadata. Free.

Voices

All endpoints in this section require RESEMBLE_BUSINESS_PLAN_ACTIVE=true. Without it, requests return 503 immediately and never reach Resemble, so they consume no Resemble quota.

`POST /voices`

Create a new voice for cloning. The monthly subscription is billed on the first build.

Field	Type	Required
`name`	string (≤128)	yes
`consent_text`	string	no
`description`	string	no

`POST /voices/{voice_uuid}/build`

Build (train) a voice from its enrolled recordings. Writes a voice_subscriptions usage record.

`DELETE /voices/{voice_uuid}`

Delete a voice. Free.

Recordings

Training audio nested under a voice. Free CRUD; the audio itself is proxied to Resemble, not stored on IndoxHub.

Method	Path
`GET`	`/voices/{voice_uuid}/recordings?page=&page_size=`
`POST`	`/voices/{voice_uuid}/recordings`
`GET`	`/voices/{voice_uuid}/recordings/{recording_uuid}`
`PUT`	`/voices/{voice_uuid}/recordings/{recording_uuid}`
`DELETE`	`/voices/{voice_uuid}/recordings/{recording_uuid}`

POST/PUT body: {"fields": { ... resemble-native fields ... }}

Voice Design

Business-plan gated (same RESEMBLE_BUSINESS_PLAN_ACTIVE flag).

`POST /voice-design/candidates`

Generate candidate voices from a text description. Bills 1 unit per generation.

Field	Type	Required	Notes
`description`	string (≤1024)	yes	Plain-English description of the voice
`sample_text`	string	no	Sample to read
`params`	object	no	Resemble-native overrides

`POST /voice-design/{design_uuid}/promote`

Promote one candidate into a real voice.

Field	Required	Notes
`candidate_index`	yes	0-based
`name`	yes	≤128 chars

Speech-to-Text

`POST /stt`

Submit a transcription job. Async — returns a job_id.

Field	Type	Required
`audio_url`	string	yes
`language`	string	no
`callback_uri`	string	no

`GET /stt/{job_id}`

Fetch transcript / status.

`GET /stt?page=&page_size=`

List jobs.

Audio Enhance & Edit

Both follow the same async-job shape: POST to submit, GET /{job_id} to retrieve.

Capability	Submit	Retrieve	Billing
Enhance noisy audio	`POST /enhance`	`GET /enhance/{job_id}`	audio sec
Edit (insert / remove / replace via SSML-like ops)	`POST /edit`	`GET /edit/{job_id}`	audio sec

Submit body shape: {"audio_url": "...", ...resemble fields}. The first GET /{job_id} after completion mirrors the result audio to Cloudflare R2 (lazy + idempotent) and augments the response with audio_url (presigned R2 URL, 7-day retention), expires_at, and resemble_url (upstream fallback). Subsequent GETs return the same R2 URL.
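The submit-then-retrieve pattern implies a polling loop on the client. The following is a generic sketch: `poll_job` is a hypothetical helper, the caller supplies both the fetch function and the completion test, and the empty "pending" responses in the usage example are placeholders, since the status fields of an unfinished job are not documented here. The example treats the presence of audio_url as the completion signal, which follows from the mirroring behaviour described above.

```python
import time
from typing import Callable


def poll_job(fetch: Callable[[], dict],
             is_done: Callable[[dict], bool],
             interval_s: float = 2.0,
             timeout_s: float = 120.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until is_done(result) is true or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError("job did not complete before the timeout")
        sleep(interval_s)


# Stand-in for GET /enhance/{job_id}: two pending responses, then a finished one.
fake_responses = iter([{}, {}, {"audio_url": "https://example.invalid/out.wav"}])
job = poll_job(lambda: next(fake_responses),
               is_done=lambda body: "audio_url" in body,
               interval_s=0.0)
```

A job that fails upstream would never satisfy this predicate, so the timeout is the only exit in that case; a real client would likely also inspect whatever status field the job response carries.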

Safety

Async deepfake detection + content intelligence + audio source tracing. Same submit/retrieve pattern as audio jobs.

Capability	Submit	Retrieve	Billing
Deepfake detection (audio / video / image)	`POST /detect`	`GET /detect/{job_id}`	audio sec / video sec / per image
Audio + video + image intelligence	`POST /intelligence`	`GET /intelligence/{job_id}`	audio sec / video sec / per image
Audio source tracing	`POST /tracing`	`GET /tracing/{job_id}`	TBD

Watermark

Embed and detect inaudible provenance markers in audio. The watermark survives compression and re-recording.

Capability	Submit	Retrieve	Billing
Apply watermark	`POST /watermark/apply`	`GET /watermark/apply/{job_id}`	$0.0005/sec
Detect watermark	`POST /watermark/detect`	`GET /watermark/detect/{job_id}`	$0.0002/sec

The first GET /watermark/apply/{job_id} after completion mirrors the watermarked audio to Cloudflare R2 and adds audio_url, expires_at (7-day retention), and resemble_url to the response. GET /watermark/detect/{job_id} returns a confidence score with no audio output, so nothing is mirrored to R2.

Use case: every TTS output can be auto-watermarked at sub-cent cost so it can later be detected as AI-generated.
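To put "sub-cent" in numbers: at the listed $0.0005 per second, applying a watermark stays under one cent for clips shorter than 20 seconds. The sketch below works through that arithmetic. `watermark_cost` is a hypothetical helper, and this page does not say whether the listed rates already include the IndoxHub markup, so the figures are taken as-is.

```python
from decimal import Decimal

# Per-second rates from the watermark table.
APPLY_PER_SEC = Decimal("0.0005")
DETECT_PER_SEC = Decimal("0.0002")


def watermark_cost(seconds: str, detect: bool = False) -> Decimal:
    """Cost in USD of applying (default) or detecting a watermark."""
    rate = DETECT_PER_SEC if detect else APPLY_PER_SEC
    return Decimal(seconds) * rate


ten_second_apply = watermark_cost("10")                 # half a cent
ten_second_detect = watermark_cost("10", detect=True)   # a fifth of a cent
```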

Identity

Beta. Per-search and per-enrollment billing.

Method	Path	Notes
`POST`	`/identity/search`	Search known identities by audio sample. Bills 1 search.
`POST`	`/identity/enroll`	Enroll a new identity. Bills 1 enrollment.
`GET`	`/identity?page=&page_size=`	List enrolled identities. Free.
`GET`	`/identity/{identity_id}`	Get one. Free.
`DELETE`	`/identity/{identity_id}`	Delete. Free.

Projects & Clips

Pure metadata proxying. No billing.

Projects (paths relative to /projects): GET, POST, GET /{project_uuid}, PUT /{project_uuid}, DELETE /{project_uuid}.

Clips (nested):

Method	Path
`GET`	`/projects/{project_uuid}/clips?page=&page_size=`
`POST`	`/projects/{project_uuid}/clips`
`GET`	`/projects/{project_uuid}/clips/{clip_uuid}`
`DELETE`	`/projects/{project_uuid}/clips/{clip_uuid}`

Agents

Resemble's voice-agents platform. Free CRUD on configs; live agent dispatch (Twilio integration) is not exposed by IndoxHub.

Resource	Prefix	Methods
Agents	`/agents`	full CRUD + `GET /agents/capabilities` + `GET /agents/system-tools`
Agent tools	`/agent-tools`	full CRUD
Agent webhooks	`/agent-webhooks`	full CRUD
Knowledge base	`/knowledge-base`	full CRUD

CRUD shape (same for all four; paths are relative to each resource prefix, and `""` denotes the prefix itself):

Method	Path	Body
`GET`	`""?page=&page_size=`	—
`POST`	`""`	`{"fields": {...}}`
`GET`	`"/{uuid}"`	—
`PUT`	`"/{uuid}"`	`{"fields": {...}}`
`DELETE`	`"/{uuid}"`	—

The fields envelope is forwarded to Resemble unchanged — consult Resemble's agent docs for valid keys.

Uploads

Single endpoint for uploading user-supplied media to Cloudflare R2 and getting back a presigned URL to pass into Resemble async jobs.

POST /uploads takes a multipart/form-data body with these fields:

Field	Type	Required	Notes
`file`	file	yes	Audio/video/image. Max 500 MiB. Allowed: mp3, wav, m4a, flac, ogg, webm, mp4, mov, mkv, png, jpg, jpeg, webp
`purpose`	string	no	Retention hint (see table below). Default: 30-day generic uploads

`purpose` value	R2 prefix	Retention
`voice_clone`	`voice-recordings/`	PERMANENT (identity asset)
`stt_input`	`stt-input/`	30 days
`watermark_input`	`watermark/`	7 days
`audio_job_input`	`audio-jobs/`	7 days
(unspecified)	`uploads/`	30 days

Response:

{
  "url": "https://<account>.r2.cloudflarestorage.com/indoxhub-media/<prefix>/<user_id>/<YYYY>/<MM>/<DD>/<HHMMSS>-<id>.<ext>?X-Amz-...",
  "asset_id": 1234,
  "asset_class": "voice_recordings",
  "purpose": "voice_clone",
  "expires_at": null,
  "expires_in": 3600,
  "file_name": "voice.wav",
  "file_type": "audio",
  "extension": "wav",
  "size_bytes": 89211
}

expires_at is null for permanent assets (voice clones), otherwise an ISO 8601 timestamp. Use the returned url directly as the audio_url argument to STT, enhance, edit, detect, intelligence, watermark, and identity endpoints — Resemble can fetch from R2 for the 1-hour signing window.

Voice clone training example:

curl https://api.indoxhub.com/api/v1/resemble/uploads \
  -H "Authorization: Bearer $INDOXHUB_API_KEY" \
  -F "file=@voice.wav" \
  -F "purpose=voice_clone"
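The constraints listed for POST /uploads (allowed extensions, the 500 MiB cap, and the purpose values) can be checked client-side before sending a large file. The following is a sketch under stated assumptions: `check_upload` is a hypothetical helper, extension matching is assumed to be case-insensitive, the size limit is assumed to be inclusive, and unknown purpose values are rejected here even though the server's handling of them is not documented.

```python
from pathlib import PurePath
from typing import Optional

MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # 500 MiB
ALLOWED_EXTENSIONS = {
    "mp3", "wav", "m4a", "flac", "ogg", "webm",
    "mp4", "mov", "mkv", "png", "jpg", "jpeg", "webp",
}
# Retention in days per purpose; None means permanent. The None key is "unspecified".
RETENTION_DAYS = {
    "voice_clone": None,
    "stt_input": 30,
    "watermark_input": 7,
    "audio_job_input": 7,
    None: 30,
}


def check_upload(file_name: str, size_bytes: int,
                 purpose: Optional[str] = None) -> Optional[int]:
    """Validate an upload and return its retention in days (None = permanent)."""
    extension = PurePath(file_name).suffix.lstrip(".").lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension not allowed: {extension!r}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds 500 MiB")
    if purpose not in RETENTION_DAYS:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return RETENTION_DAYS[purpose]


retention = check_upload("voice.wav", 89211, "voice_clone")  # permanent asset
```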

Inbound Webhooks

Resemble notifies IndoxHub when async jobs complete and when voice training finishes.

POST /webhooks/resemble — IndoxHub-side endpoint. Verifies HMAC signature using RESEMBLE_WEBHOOK_SECRET, persists the event in provider_resemble_webhook_event, and updates the matching provider_resemble_job row.

Configure the webhook URL in your Resemble dashboard. Resemble sends the signature in the X-Resemble-Signature header as sha256=<hex>, computed over the raw request body.
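Signature verification of this kind is usually a few lines of standard-library code. The sketch below is not IndoxHub's implementation. It assumes the signature is an HMAC-SHA256 hex digest of the raw body keyed with the webhook secret (as the `sha256=<hex>` format suggests) and that the secret is UTF-8 encoded; `verify_signature` is a hypothetical name.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check an X-Resemble-Signature value of the form 'sha256=<hex>'."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    provided = header_value[len(prefix):].strip().lower()
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))


body = b'{"event": "job.completed"}'  # hypothetical payload
signature = "sha256=" + hmac.new(b"s3cret", body, hashlib.sha256).hexdigest()
is_valid = verify_signature("s3cret", body, signature)
```

The digest must be computed over the bytes exactly as received; re-serializing parsed JSON before hashing will generally produce a different digest.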

Errors

All Resemble errors map to standard HTTP codes:

Status	Meaning	Source
`400`	Resemble rejected the request (bad params, validation)	upstream
`401`	IndoxHub API key missing / invalid	IndoxHub
`429`	Rate limit hit (per-user, per-capability)	IndoxHub or upstream
`502`	Resemble auth failed or Resemble 5xx	upstream
`503`	Resemble integration not configured, or capability gated by `RESEMBLE_BUSINESS_PLAN_ACTIVE`	IndoxHub

Error body shape: {"detail": "<message>"}. Network failures, timeouts (120 s), and connection errors all surface as 502 with "network error contacting Resemble".

Billing & observability

Every successful request writes one row to provider_resemble_request (audit) and one to provider_resemble_usage (billing). Async jobs additionally write to provider_resemble_job. Generated audio URLs land in provider_resemble_asset.

A nightly Celery task (reconcile-resemble-billing) calls GET /account/billing_usage against Resemble and compares totals to provider_resemble_usage. Drift greater than 1 % is logged at ERROR level.
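The reconciliation check amounts to a relative-difference calculation. The sketch below illustrates one way to compute it. It is an assumption that drift is measured relative to the Resemble-reported total, since the actual task's formula is not shown here; `drift_pct` and `exceeds_threshold` are hypothetical names.

```python
from decimal import Decimal


def drift_pct(resemble_total: Decimal, local_total: Decimal) -> Decimal:
    """Absolute difference as a percentage of the Resemble-reported total."""
    if resemble_total == 0:
        # No upstream usage: any local usage counts as unbounded drift.
        return Decimal(0) if local_total == 0 else Decimal("Infinity")
    return abs(local_total - resemble_total) / resemble_total * 100


def exceeds_threshold(resemble_total: Decimal, local_total: Decimal,
                      threshold_pct: Decimal = Decimal("1")) -> bool:
    """True when drift is strictly greater than the threshold (1 % by default)."""
    return drift_pct(resemble_total, local_total) > threshold_pct


flagged = exceeds_threshold(Decimal("100"), Decimal("101.5"))  # 1.5 % drift
```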

Per-user, per-capability rate limits live in app/utils/resemble_rate_limit.py::_LIMITS. Hitting a limit returns 429 with a Retry-After header. If Redis is unavailable, the limiter fails open: requests are allowed rather than rejected.
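A client can honor the 429 response by waiting before it retries. The sketch below assumes the Retry-After value is a whole number of seconds (HTTP also permits a date, which this page does not rule out) and falls back to capped exponential backoff otherwise; `retry_delay` is a hypothetical helper.

```python
def retry_delay(headers: dict, attempt: int,
                base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    value = str(normalized.get("retry-after", "")).strip()
    if value.isdigit():
        return float(value)
    # No usable header: exponential backoff, capped at cap_s.
    return min(cap_s, base_s * (2 ** attempt))


wait_s = retry_delay({"Retry-After": "7"}, attempt=0)
```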

Configuration reference

Env var	Default	Purpose
`RESEMBLE_API_KEY`	—	Required. Your Resemble API key.
`RESEMBLE_BASE_URL`	`https://app.resemble.ai/api/v2`	Main API. Bearer auth.
`RESEMBLE_SYNTH_URL`	`https://f.cluster.resemble.ai`	Synthesis cluster. Token auth (auto-handled).
`RESEMBLE_MARKUP_PCT`	`20.0`	Percent added on top of Resemble cost.
`RESEMBLE_WEBHOOK_SECRET`	—	HMAC secret. Webhooks accepted unsigned if unset (dev only).
`RESEMBLE_BUSINESS_PLAN_ACTIVE`	`false`	Gate for voice cloning + voice design.

Not exposed (by design)

WebSocket streaming TTS — Resemble Business-plan only.
Live agent dispatch (POST /agents/{uuid}/dispatch) — requires Twilio + real-time voice infrastructure outside IndoxHub's gateway scope.
Agent phone numbers (/agents/phone-numbers/*) — same reason.
OpenAI-compatible TTS shape (POST /audio/speech) for Resemble — see docs/usage/tts.md for the OpenAI-shaped TTS surface (currently OpenAI-only; Resemble adapter is a future task).

See docs/reference/resemble/decisions.md for why specific defaults were chosen (markup, storage policy, BYOK, rate-limit fairness, etc.) and docs/reference/resemble/business-plan.md for which routes require a Resemble Business plan upgrade.