
Resemble AI

IndoxHub exposes Resemble AI under a Resemble-native API shape at /api/v1/resemble/*. Auth, billing, rate limits, and persistence are handled transparently — you don't talk to Resemble directly.

| Item | Value |
| --- | --- |
| Base URL | https://api.indoxhub.com/api/v1/resemble |
| Auth | Authorization: Bearer YOUR_INDOXHUB_API_KEY |
| Pricing model | Pay-per-second of audio (or per-image / per-search depending on capability) |
| Markup | 20% over Resemble's per-second cost (configurable via RESEMBLE_MARKUP_PCT) |
| Currency | USD |
| Account plan assumed | Resemble Flex. Voice cloning and voice design require the Business plan and are gated behind RESEMBLE_BUSINESS_PLAN_ACTIVE=true. |

Quick start

Of the calls in this quick start, only POST /tts/synthesize is billed. List voices first to discover voice UUIDs (free):

# 1. List your account's voices
curl "https://api.indoxhub.com/api/v1/resemble/tts/voices?page=1&page_size=10" \
  -H "Authorization: Bearer $INDOXHUB_API_KEY"

# 2. Synthesize speech (~$0.0005 per second of generated audio, plus IndoxHub markup)
curl -X POST https://api.indoxhub.com/api/v1/resemble/tts/synthesize \
  -H "Authorization: Bearer $INDOXHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_uuid": "9fd7430d",
    "text": "Hello from IndoxHub.",
    "output_format": "mp3"
  }'

The response includes the audio (base64), the duration billed, and a request ID you can correlate with provider_resemble_request and provider_resemble_usage rows.
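The same two quick-start calls can be sketched in Python using only the standard library. The helper names (build_list_voices_request, build_synthesize_request) are illustrative and not part of any IndoxHub SDK. The sketch only constructs the requests; sending them with urllib.request.urlopen is left to the caller.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.indoxhub.com/api/v1/resemble"


def build_list_voices_request(api_key: str, page: int = 1,
                              page_size: int = 10) -> urllib.request.Request:
    """Build GET /tts/voices (free). The docs require page_size in 10..1000."""
    if not 10 <= page_size <= 1000:
        raise ValueError("page_size must be between 10 and 1000")
    query = urllib.parse.urlencode({"page": page, "page_size": page_size})
    return urllib.request.Request(
        f"{BASE_URL}/tts/voices?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def build_synthesize_request(api_key: str, voice_uuid: str, text: str,
                             output_format: str = "mp3") -> urllib.request.Request:
    """Build POST /tts/synthesize (billed per second of generated audio)."""
    body = json.dumps({
        "voice_uuid": voice_uuid,
        "text": text,
        "output_format": output_format,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/tts/synthesize",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send a request, pass it to urllib.request.urlopen and parse the body with json.load. Building the query string with urlencode also avoids the shell-quoting pitfalls of a raw & in the URL.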


Capability index

| Group | Path prefix | Endpoints | Billing unit |
| --- | --- | --- | --- |
| Text-to-Speech | /tts | 3 | audio seconds |
| Voices | /voices | 3 | per voice subscription (Business) |
| Recordings | /voices/{voice_uuid}/recordings | 5 | free (training audio CRUD) |
| Voice Design | /voice-design | 2 | per voice subscription (Business) |
| Speech-to-Text | /stt | 3 | audio seconds |
| Audio Enhance / Edit | /enhance, /edit | 4 | audio seconds |
| Safety: Detect / Intelligence / Tracing | /detect, /intelligence, /tracing | 6 | audio sec / images |
| Watermark | /watermark | 4 | audio seconds |
| Identity | /identity | 5 | per search / per enrollment |
| Projects & Clips | /projects | 9 | free (CRUD) |
| Agents | /agents, /agent-tools, /agent-webhooks, /knowledge-base | 22 | free (config CRUD) |
| Uploads | /uploads | 1 | free |
| Inbound Webhooks | /webhooks/resemble | 1 | n/a |

Text-to-Speech

POST /tts/synthesize

Synchronous text-to-speech. Bills audio_seconds.

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| voice_uuid | string | yes | From GET /tts/voices |
| text | string | yes | Max 10 000 chars |
| output_format | string | no | mp3 (default) or wav |
| precision | string | no | PCM_16, PCM_32, or MULAW |
| sample_rate | int | no | 8000–48000 |
| language | string | no | ISO-639 code; voice's default if omitted |
| speed | float | no | 0.5–2.0 |

Response:

{
  "request_id": "res_<24-hex>",
  "voice_uuid": "9fd7430d",
  "audio_url": "https://<account>.r2.cloudflarestorage.com/indoxhub-media/tts-output/<user_id>/<YYYY>/<MM>/<DD>/<HHMMSS>-<request_id>.mp3?X-Amz-...",
  "audio_content": "<base64-mp3>",
  "audio_duration": 1.42,
  "expires_at": "2026-05-04T05:18:18.654251+00:00",
  "billing": {
    "unit": "audio_seconds",
    "quantity": 1.42,
    "charged": "0.00085200"
  },
  "raw_response": { ... }
}

audio_url is a presigned Cloudflare R2 URL; the signed link is valid for 1 hour. Prefer it over audio_content for large clips. expires_at is when the underlying R2 object is auto-deleted (default: 7 days for TTS output), which is separate from the 1-hour signing window. The original Resemble URL, when present, is returned in resemble_url.
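A minimal sketch of handling this response: decoding audio_content and cross-checking the billing block. The function names are illustrative. The $0.0005 per-second provider rate is taken from the approximate figure in the quick start, the 20% markup from RESEMBLE_MARKUP_PCT, and rounding to 8 decimal places is an assumption inferred from the sample charged value, not documented behavior.

```python
import base64
from decimal import Decimal


def decode_audio(response: dict) -> bytes:
    """Return the raw audio bytes carried in the base64 audio_content field."""
    return base64.b64decode(response["audio_content"])


def expected_charge(quantity: str, provider_rate: str = "0.0005",
                    markup_pct: str = "20") -> Decimal:
    """Estimate the charge as quantity * rate * (1 + markup/100).

    The default rate and the 8-decimal rounding are assumptions of this sketch.
    """
    amount = Decimal(quantity) * Decimal(provider_rate) * (1 + Decimal(markup_pct) / 100)
    return amount.quantize(Decimal("0.00000001"))
```

With the sample response above, expected_charge("1.42") gives 0.00085200, which matches the charged field: 1.42 s at $0.0005/s is $0.00071, and the 20% markup brings it to $0.000852. Using Decimal rather than float avoids binary rounding noise in the comparison.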

GET /tts/voices?page=1&page_size=10

List your account's voices. Free. page_size must be between 10 and 1000 (enforced by Resemble).

GET /tts/voices/{voice_uuid}

Get one voice's metadata. Free.


Voices

All endpoints in this section require RESEMBLE_BUSINESS_PLAN_ACTIVE=true. Without it, requests return 503 immediately and never reach Resemble, which saves Resemble quota.

POST /voices

Create a new voice (cloning). The monthly subscription is billed on the first build.

| Field | Type | Required |
| --- | --- | --- |
| name | string (≤128) | yes |
| consent_text | string | no |
| description | string | no |

POST /voices/{voice_uuid}/build

Build (train) a voice from its enrolled recordings. Writes a voice_subscriptions usage record.

DELETE /voices/{voice_uuid}

Delete a voice. Free.


Recordings

Training audio nested under a voice. Free CRUD; the audio itself is proxied to Resemble, not stored on IndoxHub.

| Method | Path |
| --- | --- |
| GET | /voices/{voice_uuid}/recordings?page=&page_size= |
| POST | /voices/{voice_uuid}/recordings |
| GET | /voices/{voice_uuid}/recordings/{recording_uuid} |
| PUT | /voices/{voice_uuid}/recordings/{recording_uuid} |
| DELETE | /voices/{voice_uuid}/recordings/{recording_uuid} |

POST/PUT body: {"fields": { ... resemble-native fields ... }}


Voice Design

Business-plan gated (same RESEMBLE_BUSINESS_PLAN_ACTIVE flag).

POST /voice-design/candidates

Generate candidate voices from a text description. Bills 1 unit per generation.

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| description | string (≤1024) | yes | Plain-English description of the voice |
| sample_text | string | no | Sample to read |
| params | object | no | Resemble-native overrides |

POST /voice-design/{design_uuid}/promote

Promote one candidate into a real voice.

| Field | Required | Notes |
| --- | --- | --- |
| candidate_index | yes | 0-based |
| name | yes | ≤128 chars |
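The two voice-design request bodies can be validated client-side before sending, as in the sketch below. The function names are illustrative. The length limits and the 0-based index come from the tables above; rejecting an empty description or name is an extra assumption of this sketch.

```python
from typing import Optional


def candidates_body(description: str, sample_text: Optional[str] = None,
                    params: Optional[dict] = None) -> dict:
    """Body for POST /voice-design/candidates."""
    if not description or len(description) > 1024:
        raise ValueError("description is required and limited to 1024 chars")
    body: dict = {"description": description}
    if sample_text is not None:
        body["sample_text"] = sample_text
    if params is not None:
        body["params"] = params  # Resemble-native overrides, passed as given
    return body


def promote_body(candidate_index: int, name: str) -> dict:
    """Body for POST /voice-design/{design_uuid}/promote."""
    if candidate_index < 0:
        raise ValueError("candidate_index is 0-based and cannot be negative")
    if not name or len(name) > 128:
        raise ValueError("name is required and limited to 128 chars")
    return {"candidate_index": candidate_index, "name": name}
```

Checking the limits locally avoids a round trip that would otherwise end in a 400 from upstream.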

Speech-to-Text

POST /stt

Submit a transcription job. Async — returns a job_id.

| Field | Type | Required |
| --- | --- | --- |
| audio_url | string | yes |
| language | string | no |
| callback_uri | string | no |

GET /stt/{job_id}

Fetch transcript / status.

GET /stt?page=&page_size=

List jobs.


Audio Enhance & Edit

Both follow the same async-job shape: POST to submit, GET /{job_id} to retrieve.

| Capability | Submit | Retrieve | Billing |
| --- | --- | --- | --- |
| Enhance noisy audio | POST /enhance | GET /enhance/{job_id} | audio sec |
| Edit (insert / remove / replace via SSML-like ops) | POST /edit | GET /edit/{job_id} | audio sec |

Submit body shape: {"audio_url": "...", ...resemble fields}. The first GET /{job_id} after completion mirrors the result audio to Cloudflare R2 (lazy + idempotent) and augments the response with audio_url (presigned R2 URL, 7-day retention), expires_at, and resemble_url (upstream fallback). Subsequent GETs return the same R2 URL.
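The submit-then-retrieve pattern usually ends in a polling loop on GET /{job_id}. The sketch below is one way to write it. Because the job status field names are not documented here, the caller supplies both the fetch function and the completion predicate; wait_for_job and its parameters are illustrative names, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_job(fetch: Callable[[], dict],
                 is_done: Callable[[dict], bool],
                 interval_s: float = 2.0,
                 timeout_s: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> dict:
    """Call fetch() until is_done(job) is true or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        job = fetch()
        if is_done(job):
            return job
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_s)
```

For enhance and edit jobs, one plausible predicate is lambda job: "audio_url" in job, since the text above says audio_url is added on the first GET after completion. Supplying callback_uri where an endpoint supports it avoids polling altogether.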


Safety

Asynchronous deepfake detection, content intelligence, and audio source tracing. These endpoints use the same submit/retrieve pattern as audio jobs.

| Capability | Submit | Retrieve | Billing |
| --- | --- | --- | --- |
| Deepfake detection (audio / video / image) | POST /detect | GET /detect/{job_id} | audio sec / video sec / per image |
| Audio + video + image intelligence | POST /intelligence | GET /intelligence/{job_id} | audio sec / video sec / per image |
| Audio source tracing | POST /tracing | GET /tracing/{job_id} | TBD |

Watermark

Embed and detect inaudible provenance markers in audio. The markers survive compression and re-recording.

| Capability | Submit | Retrieve | Billing |
| --- | --- | --- | --- |
| Apply watermark | POST /watermark/apply | GET /watermark/apply/{job_id} | $0.0005/sec |
| Detect watermark | POST /watermark/detect | GET /watermark/detect/{job_id} | $0.0002/sec |

The first GET /watermark/apply/{job_id} after completion mirrors the watermarked audio to Cloudflare R2 and adds audio_url, expires_at (7-day retention), and resemble_url to the response. GET /watermark/detect/{job_id} returns a confidence score with no audio output, so nothing is mirrored to R2.

Use case: every TTS output can be auto-watermarked at sub-cent cost so it can later be detected as AI-generated.
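To make the sub-cent claim concrete, the per-second rates from the table can be turned into a small estimator. This is a sketch: watermark_cost is an illustrative name, and whether the listed rates already include the IndoxHub markup is not stated here, so no markup is applied.

```python
from decimal import Decimal

# Per-second rates copied from the watermark table above.
RATES = {
    "apply": Decimal("0.0005"),
    "detect": Decimal("0.0002"),
}


def watermark_cost(audio_seconds: str, operation: str) -> Decimal:
    """Estimated USD cost of one watermark operation on a clip."""
    if operation not in RATES:
        raise ValueError("operation must be 'apply' or 'detect'")
    return Decimal(audio_seconds) * RATES[operation]
```

For example, watermark_cost("10", "apply") is $0.005 for a 10-second clip, and applying a watermark stays under one cent for clips shorter than 20 seconds at the listed rate.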


Identity

Beta. Per-search and per-enrollment billing.

| Method | Path | Notes |
| --- | --- | --- |
| POST | /identity/search | Search known identities by audio sample. Bills 1 search. |
| POST | /identity/enroll | Enroll a new identity. Bills 1 enrollment. |
| GET | /identity?page=&page_size= | List enrolled identities. Free. |
| GET | /identity/{identity_id} | Get one. Free. |
| DELETE | /identity/{identity_id} | Delete. Free. |

Projects & Clips

Pure metadata proxying. No billing.

Projects: GET, POST, GET /{project_uuid}, PUT /{project_uuid}, DELETE /{project_uuid}.

Clips (nested):

| Method | Path |
| --- | --- |
| GET | /projects/{project_uuid}/clips?page=&page_size= |
| POST | /projects/{project_uuid}/clips |
| GET | /projects/{project_uuid}/clips/{clip_uuid} |
| DELETE | /projects/{project_uuid}/clips/{clip_uuid} |

Agents

Resemble's voice-agents platform. Free CRUD on configs; live agent dispatch (Twilio integration) is not exposed by IndoxHub.

| Resource | Prefix | Methods |
| --- | --- | --- |
| Agents | /agents | full CRUD + GET /agents/capabilities + GET /agents/system-tools |
| Agent tools | /agent-tools | full CRUD |
| Agent webhooks | /agent-webhooks | full CRUD |
| Knowledge base | /knowledge-base | full CRUD |

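CRUD shape (same for all four; paths are relative to the resource prefix):

| Method | Path | Body |
| --- | --- | --- |
| GET | ""?page=&page_size= | |
| POST | "" | {"fields": {...}} |
| GET | "/{uuid}" | |
| PUT | "/{uuid}" | {"fields": {...}} |
| DELETE | "/{uuid}" | |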

The fields envelope is forwarded to Resemble unchanged — consult Resemble's agent docs for valid keys.
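Because all four resources share one CRUD shape, a single request builder can cover them. The sketch below assumes nothing beyond the tables above. crud_request and the action names (list, create, get, update, delete) are illustrative, and the contents of fields are passed through untouched because valid keys are defined by Resemble.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://api.indoxhub.com/api/v1/resemble"
PREFIXES = ("/agents", "/agent-tools", "/agent-webhooks", "/knowledge-base")
ITEM_METHODS = {"get": "GET", "update": "PUT", "delete": "DELETE"}


def crud_request(api_key: str, prefix: str, action: str,
                 uuid: Optional[str] = None, fields: Optional[dict] = None,
                 page: int = 1, page_size: int = 10) -> urllib.request.Request:
    """Build one CRUD request for an agent-platform resource."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown resource prefix: {prefix}")
    if action == "list":
        method = "GET"
        path = "?" + urllib.parse.urlencode({"page": page, "page_size": page_size})
    elif action == "create":
        method, path = "POST", ""
    elif action in ITEM_METHODS:
        if not uuid:
            raise ValueError(f"action '{action}' needs a uuid")
        method = ITEM_METHODS[action]
        path = "/" + urllib.parse.quote(uuid, safe="")
    else:
        raise ValueError(f"unknown action: {action}")

    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if method in ("POST", "PUT"):
        # Writes wrap the Resemble-native keys in the "fields" envelope.
        data = json.dumps({"fields": fields or {}}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + prefix + path, data=data,
                                  headers=headers, method=method)
```

For instance, crud_request(key, "/agent-tools", "update", uuid="abc", fields={...}) yields a PUT to /agent-tools/abc with the envelope as its JSON body.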


Uploads

Single endpoint for uploading user-supplied media to Cloudflare R2 and getting back a presigned URL to pass into Resemble async jobs.

POST /uploads accepts multipart/form-data. Body fields:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| file | file | yes | Audio/video/image. Max 500 MiB. Allowed: mp3, wav, m4a, flac, ogg, webm, mp4, mov, mkv, png, jpg, jpeg, webp |
| purpose | string | no | Retention hint (see table below). Default: 30-day generic uploads |

| purpose value | R2 prefix | Retention |
| --- | --- | --- |
| voice_clone | voice-recordings/ | PERMANENT (identity asset) |
| stt_input | stt-input/ | 30 days |
| watermark_input | watermark/ | 7 days |
| audio_job_input | audio-jobs/ | 7 days |
| (unspecified) | uploads/ | 30 days |

Response:

{
  "url": "https://<account>.r2.cloudflarestorage.com/indoxhub-media/<prefix>/<user_id>/<YYYY>/<MM>/<DD>/<HHMMSS>-<id>.<ext>?X-Amz-...",
  "asset_id": 1234,
  "asset_class": "voice_recordings",
  "purpose": "voice_clone",
  "expires_at": null,
  "expires_in": 3600,
  "file_name": "voice.wav",
  "file_type": "audio",
  "extension": "wav",
  "size_bytes": 89211
}

expires_at is null for permanent assets (voice clones), otherwise an ISO 8601 timestamp. Use the returned url directly as the audio_url argument to STT, enhance, edit, detect, intelligence, watermark, and identity endpoints — Resemble can fetch from R2 for the 1-hour signing window.
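The upload constraints and retention rules above can be checked locally before sending a file, as in this sketch. check_upload is an illustrative name. Rejecting an unrecognized purpose value is an assumption of the sketch, since the server's behavior for unknown values is not documented here.

```python
from pathlib import PurePath
from typing import Optional

MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # 500 MiB
ALLOWED_EXTENSIONS = {
    "mp3", "wav", "m4a", "flac", "ogg", "webm", "mp4", "mov", "mkv",
    "png", "jpg", "jpeg", "webp",
}
# Retention in days per purpose; None means permanent.
RETENTION_DAYS = {
    "voice_clone": None,
    "stt_input": 30,
    "watermark_input": 7,
    "audio_job_input": 7,
}
DEFAULT_RETENTION_DAYS = 30  # applies when purpose is not specified


def check_upload(file_name: str, size_bytes: int,
                 purpose: Optional[str] = None) -> Optional[int]:
    """Validate an upload and return its retention in days (None = permanent)."""
    extension = PurePath(file_name).suffix.lower().lstrip(".")
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension not allowed: {extension!r}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 500 MiB limit")
    if purpose is None:
        return DEFAULT_RETENTION_DAYS
    if purpose not in RETENTION_DAYS:
        # Assumption: unknown purposes are treated as a client-side error.
        raise ValueError(f"unknown purpose: {purpose!r}")
    return RETENTION_DAYS[purpose]
```

A None return for voice_clone mirrors the null expires_at in the response for permanent assets.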

Voice clone training example:

curl https://api.indoxhub.com/api/v1/resemble/uploads \
  -H "Authorization: Bearer $INDOXHUB_API_KEY" \
  -F "file=@voice.wav" \
  -F "purpose=voice_clone"


Inbound Webhooks

Resemble notifies IndoxHub when async jobs complete and when voice training finishes.

POST /webhooks/resemble — IndoxHub-side endpoint. Verifies HMAC signature using RESEMBLE_WEBHOOK_SECRET, persists the event in provider_resemble_webhook_event, and updates the matching provider_resemble_job row.

Configure the webhook URL in your Resemble dashboard. Resemble sends the signature in the X-Resemble-Signature header as sha256=<hex>, computed over the raw request body.
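For readers who want to see what this kind of check involves, here is a minimal sketch of verifying a sha256=<hex> signature with HMAC-SHA256. It is not IndoxHub's actual handler. verify_signature is an illustrative name, and the choice of HMAC-SHA256 with a UTF-8 encoded secret is inferred from the header format rather than confirmed by the source.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check an X-Resemble-Signature style header against the raw body."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    provided = header_value[len(prefix):].strip().lower()
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode("ascii"), provided.encode("utf-8"))
```

The digest must be computed over the bytes exactly as received. Re-serializing parsed JSON can change whitespace or key order and break the comparison.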


Errors

All Resemble errors map to standard HTTP codes:

| Status | Meaning | Source |
| --- | --- | --- |
| 400 | Resemble rejected the request (bad params, validation) | upstream |
| 401 | IndoxHub API key missing / invalid | IndoxHub |
| 429 | Rate limit hit (per-user, per-capability) | IndoxHub or upstream |
| 502 | Resemble auth failed or Resemble 5xx | upstream |
| 503 | Resemble integration not configured, or capability gated by RESEMBLE_BUSINESS_PLAN_ACTIVE | IndoxHub |

Error body shape: {"detail": "<message>"}. Network failures, timeouts (120 s), and connection errors all surface as 502 with "network error contacting Resemble".
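A client can fold the status table and the {"detail": ...} body into one small structure, as sketched below. GatewayError and parse_error are illustrative names. The source column comes from the table above, while the retry hint is an assumption of this sketch: 502 also covers upstream auth failures, which a retry will not fix.

```python
import json
from dataclasses import dataclass

# status -> (source, retry_hint). The retry hint is this sketch's own
# heuristic, not documented gateway behavior.
_STATUS = {
    400: ("upstream", False),
    401: ("IndoxHub", False),
    429: ("IndoxHub or upstream", True),
    502: ("upstream", True),
    503: ("IndoxHub", False),
}


@dataclass
class GatewayError:
    status: int
    source: str
    detail: str
    retry_hint: bool


def parse_error(status: int, body: bytes) -> GatewayError:
    """Turn an error status and raw response body into a GatewayError."""
    source, retry_hint = _STATUS.get(status, ("unknown", False))
    try:
        payload = json.loads(body)
        detail = payload.get("detail", "") if isinstance(payload, dict) else ""
    except ValueError:  # covers invalid JSON and undecodable bytes
        detail = ""
    return GatewayError(status, source, str(detail), retry_hint)
```

With urllib, the status and body are available from the HTTPError raised by urlopen, via its code attribute and read() method.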


Billing & observability

Every successful request writes one row to provider_resemble_request (audit) and one to provider_resemble_usage (billing). Async jobs additionally write to provider_resemble_job. Generated audio URLs land in provider_resemble_asset.

A nightly Celery task (reconcile-resemble-billing) calls GET /account/billing_usage against Resemble and compares totals to provider_resemble_usage. Drift greater than 1% is logged at ERROR level.
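The drift comparison can be illustrated as follows. This sketch is not the reconcile-resemble-billing task itself. The function names are illustrative, and measuring drift relative to the provider total (rather than the local total) is an assumption.

```python
from decimal import Decimal


def drift_percent(local_total: Decimal, provider_total: Decimal) -> Decimal:
    """Absolute difference as a percentage of the provider-reported total."""
    if provider_total == 0:
        # No provider usage: any local usage counts as unbounded drift.
        return Decimal(0) if local_total == 0 else Decimal("Infinity")
    return abs(local_total - provider_total) / provider_total * 100


def drift_exceeds(local_total: Decimal, provider_total: Decimal,
                  threshold_pct: Decimal = Decimal(1)) -> bool:
    """True when drift is strictly greater than the threshold (1% by default)."""
    return drift_percent(local_total, provider_total) > threshold_pct
```

For example, a local total of 101.5 against a provider total of 100 is 1.5% drift and would be flagged, while 100.5 against 100 is 0.5% and would not.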

Per-user, per-capability rate limits live in app/utils/resemble_rate_limit.py::_LIMITS. Hitting a limit returns 429 with a Retry-After header. Limits fail open (requests are allowed through) if Redis is unavailable.
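On the client side, a 429 is typically handled by waiting for the advertised Retry-After interval. The sketch below assumes the header carries a plain number of seconds and otherwise falls back to capped exponential backoff. retry_delay and its defaults are illustrative, not part of IndoxHub.

```python
from typing import Optional


def retry_delay(retry_after: Optional[str], attempt: int,
                base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number of seconds; fall through to backoff
    return min(cap_s, base_s * (2 ** attempt))
```

A caller would read the Retry-After header from the 429 response, sleep for retry_delay(value, attempt), and give up after a fixed number of attempts.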


Configuration reference

| Env var | Default | Purpose |
| --- | --- | --- |
| RESEMBLE_API_KEY | (none) | Required. Your Resemble API key. |
| RESEMBLE_BASE_URL | https://app.resemble.ai/api/v2 | Main API. Bearer auth. |
| RESEMBLE_SYNTH_URL | https://f.cluster.resemble.ai | Synthesis cluster. Token auth (auto-handled). |
| RESEMBLE_MARKUP_PCT | 20.0 | Percent added on top of Resemble cost. |
| RESEMBLE_WEBHOOK_SECRET | (none) | HMAC secret. Webhooks are accepted unsigned if unset (dev only). |
| RESEMBLE_BUSINESS_PLAN_ACTIVE | false | Gate for voice cloning + voice design. |

Not exposed (by design)

  • WebSocket streaming TTS — Resemble Business-plan only.
  • Live agent dispatch (POST /agents/{uuid}/dispatch) — requires Twilio + real-time voice infrastructure outside IndoxHub's gateway scope.
  • Agent phone numbers (/agents/phone-numbers/*) — same reason.
  • OpenAI-compatible TTS shape (POST /audio/speech) for Resemble — see docs/usage/tts.md for the OpenAI-shaped TTS surface (currently OpenAI-only; Resemble adapter is a future task).

See docs/reference/resemble/decisions.md for why specific defaults were chosen (markup, storage policy, BYOK, rate-limit fairness, etc.) and docs/reference/resemble/business-plan.md for which routes require a Resemble Business plan upgrade.

Documentation last built on May 23, 2026