Resemble AI — Open-Question Decisions (implementation defaults)

Date: 2026-04-19. Context: Captures the defaults chosen for the 8 open questions raised during initial Resemble integration scoping. Each entry is marked REVISIT with the concrete trigger that would cause us to reopen the decision. The original integration / production-readiness planning docs have been removed now that the work is complete; this file is the durable record of "why we picked X".

Q1 — Markup % on top of Resemble's per-second cost

Default implemented: RESEMBLE_MARKUP_PCT = 20.0 (settable via env). Charge formula: charged_amount = provider_cost × (1 + markup/100).
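As a minimal sketch of the charge formula above, assuming costs are handled as Decimal values (the helper names load_markup_pct and charged_amount are illustrative, not taken from the codebase, and no rounding policy is implied):

```python
import os
from decimal import Decimal


def load_markup_pct(env=os.environ) -> Decimal:
    """Read RESEMBLE_MARKUP_PCT from the environment, defaulting to 20.0."""
    return Decimal(env.get("RESEMBLE_MARKUP_PCT", "20.0"))


def charged_amount(provider_cost: Decimal, markup_pct: Decimal) -> Decimal:
    """charged_amount = provider_cost * (1 + markup / 100)."""
    return provider_cost * (Decimal(1) + markup_pct / Decimal(100))


if __name__ == "__main__":
    # A provider cost of 0.50 USD at the default 20 % markup.
    print(charged_amount(Decimal("0.50"), load_markup_pct({})))
```

Using Decimal rather than float keeps the ledger free of binary rounding noise; where and how to round the final amount is left open here.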

Why 20 %: Middle of the range we use elsewhere (OpenAI reseller markup is 10–30 %). Covers reconciliation drift + S3 egress + support overhead.

REVISIT when: Finance signs off on a final rate card, or when we move a capability to Business-plan pricing.

Q2 — Billing schema migration

Default implemented: NEW dedicated tables (provider_resemble_usage, _request, _job, _asset, _webhook_event) rather than extending api_requests / billing_*.

Why: Resemble's units (audio_seconds, images_processed, searches, voice_subscriptions) don't map onto the token-centric api_requests schema. Cleaner to keep provider-specific ledgers in their own tables and roll them into the global totals at reporting time.

REVISIT when: We add a second non-token provider (ElevenLabs, AssemblyAI, etc.) — at that point we may want a common provider_usage table.

Q3 — Audio storage policy

Policy (revised 2026-04-27): All new media uploads are routed to Cloudflare R2 via app/utils/storage/. AWS S3 stays in place as a fallback backend (selectable via STORAGE_BACKEND=s3) and continues to serve any pre-existing S3 URLs. Per-asset retention is enforced by R2 lifecycle rules + DB provider_resemble_asset.expires_at based on which prefix the file lands under.

Per-asset retention table:

Asset class	Bucket prefix	TTL	Rationale
TTS generated audio	`tts-output/`	7 days	Regenerable — user can re-synthesize
STT input audio	`stt-input/`	30 days	Non-regenerable user upload
STT transcript	`stt-output/`	30 days	Tiny text files; matches input window
Audio enhance/edit results	`audio-jobs/`	7 days	Regenerable derivative
Watermark results	`watermark/`	7 days	Regenerable derivative
Voice-clone source recordings	`voice-recordings/`	PERMANENT	Identity asset; user investment
Built voice models metadata	`voice-models/`	PERMANENT	Identity asset
Voice design candidates	`voice-design/`	14 days	Long enough to evaluate + decide
Image generation output	`images/`	30 days	Small files; moderate window
Bedrock video output	`videos/`	14 days	Large files = real cost
Chat-uploaded images	`chat-images/`	90 days	Tied to conversation lifetime
Generic uploads	`uploads/`	30 days	Default when `purpose` unspecified

Key naming pattern: {prefix}{user_id}/{YYYY}/{MM}/{DD}/{HHMMSS}-{request_id}.{ext}
- Per-user GDPR delete: aws s3 rm s3://bucket/{prefix}{user_id}/ --recursive (run once per prefix; add --endpoint-url when targeting R2)
- Date-partitioned for lifecycle rules + reporting
- Collision-proof and self-documenting
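The key pattern can be sketched as a small builder. This is an illustration only: the function names are hypothetical, and using UTC for the date components is an assumption the source does not state.

```python
from datetime import datetime, timezone
from typing import Optional


def build_object_key(prefix: str, user_id: str, request_id: str, ext: str,
                     now: Optional[datetime] = None) -> str:
    """Build {prefix}{user_id}/{YYYY}/{MM}/{DD}/{HHMMSS}-{request_id}.{ext}."""
    now = now or datetime.now(timezone.utc)  # assumption: timestamps are UTC
    return f"{prefix}{user_id}/{now:%Y/%m/%d}/{now:%H%M%S}-{request_id}.{ext}"


def user_delete_prefix(prefix: str, user_id: str) -> str:
    """Prefix that covers every object of one user under one asset class."""
    # The trailing slash keeps user "12" from matching user "123".
    return f"{prefix}{user_id}/"


if __name__ == "__main__":
    ts = datetime(2026, 4, 27, 13, 5, 9, tzinfo=timezone.utc)
    print(build_object_key("tts-output/", "u123", "req-9", "wav", ts))
```

Because the asset-class prefix comes first, a full per-user delete has to iterate over every prefix in the retention table.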

Three retention principles:
1. Regenerable output → short TTL (7d).
2. User-supplied source → medium TTL (30d).
3. Identity assets → PERMANENT.
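One way the retention table could drive provider_resemble_asset.expires_at is sketched below. The mapping mirrors the table above; the RETENTION_DAYS name, the expires_at_for helper, and the choice to reject unknown prefixes are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# TTL in days per bucket prefix, copied from the retention table.
# None marks PERMANENT asset classes (no lifecycle rule, no expiry).
RETENTION_DAYS = {
    "tts-output/": 7,
    "stt-input/": 30,
    "stt-output/": 30,
    "audio-jobs/": 7,
    "watermark/": 7,
    "voice-recordings/": None,
    "voice-models/": None,
    "voice-design/": 14,
    "images/": 30,
    "videos/": 14,
    "chat-images/": 90,
    "uploads/": 30,
}


def expires_at_for(key: str, created_at: datetime) -> Optional[datetime]:
    """Return the expiry timestamp for an object key, or None if permanent."""
    for prefix, days in RETENTION_DAYS.items():
        if key.startswith(prefix):
            return None if days is None else created_at + timedelta(days=days)
    # Assumption: an unrecognized prefix is a bug, not a silent default.
    raise ValueError(f"no retention rule for key: {key!r}")


if __name__ == "__main__":
    created = datetime(2026, 4, 27, tzinfo=timezone.utc)
    print(expires_at_for("tts-output/u123/2026/04/27/130509-req-9.wav", created))
```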

Why R2 over S3:
- Zero egress fees (vs. S3's ~$0.09/GB): voice/audio playback hits the bucket from end-user browsers, so egress dominates the bill at scale.
- S3-compatible API: same boto3 client with a different endpoint/region.

Why keep S3 around:
- Legacy assets already there; URLs continue to resolve.
- Rollback path: STORAGE_BACKEND=s3 re-routes new uploads instantly with no code change.
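The backend switch might look roughly like the sketch below, which only builds the keyword arguments that would be passed to boto3.client(...). The environment variable names R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY and AWS_REGION, the "r2" default, and the function name are all assumptions; the actual code in app/utils/storage/ may be organized differently.

```python
from typing import Mapping


def s3_client_kwargs(env: Mapping[str, str]) -> dict:
    """Return kwargs for boto3.client(**kwargs) based on STORAGE_BACKEND."""
    backend = env.get("STORAGE_BACKEND", "r2").lower()  # assumed default
    if backend == "r2":
        return {
            "service_name": "s3",  # R2 speaks the S3 API
            "endpoint_url": f"https://{env['R2_ACCOUNT_ID']}.r2.cloudflarestorage.com",
            "region_name": "auto",  # R2 uses the placeholder region "auto"
            "aws_access_key_id": env["R2_ACCESS_KEY_ID"],
            "aws_secret_access_key": env["R2_SECRET_ACCESS_KEY"],
        }
    if backend == "s3":
        # Credentials come from the usual AWS credential chain.
        return {"service_name": "s3", "region_name": env.get("AWS_REGION", "us-east-1")}
    raise ValueError(f"unknown STORAGE_BACKEND: {backend!r}")


if __name__ == "__main__":
    demo_env = {
        "STORAGE_BACKEND": "r2",
        "R2_ACCOUNT_ID": "abc123",
        "R2_ACCESS_KEY_ID": "key",
        "R2_SECRET_ACCESS_KEY": "secret",
    }
    print(s3_client_kwargs(demo_env)["endpoint_url"])
```

Keeping the switch in configuration rather than code is what makes the rollback path a one-variable change.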

Implementation status (2026-04-27): Live in code on feat/resemble-phase-6-agents. R2-mirror tested end-to-end against local server for TTS synthesize, uploads (purpose=voice_clone, purpose=stt_input, default), audio enhance/edit GET, and watermark apply GET. Lifecycle rules need to be added in the Cloudflare dashboard per the table above (one rule per prefix, no rule on voice-recordings/ or voice-models/).

Soft-delete window (planned): Lifecycle moves expired objects to _deleted/<original-prefix>/ for 7 days before hard delete, so support can restore on user request. Not yet wired.

REVISIT when: R2 storage cost crosses 1% of Resemble revenue (rough break-even line) — if it ever does, look at tiered cold storage before shortening any TTL.

Q4 — API shape (OpenAI-compat vs Resemble-native vs both)

Default implemented: Resemble-native routes under /resemble/*. OpenAI-compat TTS (/audio/tts/generations) is NOT auto-wired for Resemble yet: Resemble's voice selection (UUID) doesn't map onto OpenAI's voice-name enum, and OpenAI's per-character cost model doesn't match Resemble's per-second billing.

Why: Shipping native first gets a working feature out fast; adding the compatible shape later is additive and can live behind a thin adapter.

REVISIT when: A user explicitly needs the OpenAI shape for a drop-in client migration — add the adapter then.

Q5 — Key management (single account key vs BYOK)

Default implemented: Single IndoxHub account key via settings.RESEMBLE_API_KEY. BYOK is NOT supported on Resemble routes yet.

Why: BYOK adds schema + UI work (storing user-provided keys encrypted) and defeats the rate-limit distribution work in #10. A single key is also how all other non-Bedrock providers work here today.

REVISIT when: A paying customer asks for BYOK, or when Resemble's per-key concurrency caps become the bottleneck despite the rate-limit fairness layer.

Q6 — Business-plan upgrade decision

Default implemented: Not upgraded. Voice cloning (/voices, /voices/{uuid}/build) and voice design (/voice-design/*) remain live in code but will return 403 from upstream until Resemble Business is purchased. The alternative, an admin toggle that gates these routes on our side, is documented in business-plan.md in this folder.

REVISIT when: We see ≥ N users explicitly asking for cloning or when the marketing case (named-voice features) justifies the monthly cost. Owner: product lead.

Q7 — SLA / failover behavior

Default implemented: NO failover. On Resemble 5xx / timeout the route returns 502 Bad Gateway ("Resemble upstream error"). No retry policy beyond what ResembleClient already does. No second provider wired in.

Why: Failover to ElevenLabs / Cartesia is a whole provider package's worth of work. Not shippable in this phase.

REVISIT when: Uptime SLO is formally defined and we can't hit it with Resemble alone. Candidate: introduce a circuit breaker (pybreaker) first, then a failover provider if the breaker trips often.

Q8 — Rate-limit fairness across users

Default implemented: Redis sliding-window counter per (user_id, capability). Caps are encoded in app/utils/resemble_rate_limit.py::_LIMITS. A failed check surfaces as a 429 with Retry-After. If Redis is unavailable, the check fails open (requests are allowed through).
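The sliding-window idea can be illustrated without Redis. The sketch below is an in-memory stand-in that keeps a timestamp log per (user_id, capability); it is not the Redis implementation in resemble_rate_limit.py, and the class name, the 60-second window, and the example caps are assumptions.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Dict, Tuple


class SlidingWindowLimiter:
    """In-memory sliding-window limiter keyed by (user_id, capability)."""

    def __init__(self, limits: Dict[str, int], window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._limits = limits          # capability -> max requests per window
        self._window = window_seconds
        self._clock = clock            # injectable for tests
        self._hits = defaultdict(deque)

    def check(self, user_id: str, capability: str) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds); records the hit if allowed."""
        now = self._clock()
        hits = self._hits[(user_id, capability)]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self._window:
            hits.popleft()
        if len(hits) >= self._limits[capability]:
            # The oldest hit leaving the window frees the next slot.
            return False, math.ceil(hits[0] + self._window - now)
        hits.append(now)
        return True, 0


if __name__ == "__main__":
    limiter = SlidingWindowLimiter({"tts": 2}, window_seconds=10.0)
    print([limiter.check("user-1", "tts") for _ in range(3)])
```

The second element of the tuple is what a route would place in the Retry-After header of the 429 response. Fail-open behavior is not modeled here, since this stand-in has no external dependency that can fail.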

Why: Simple, matches existing rate-limit infra, prevents one user from filling all concurrency slots on our shared key.

REVISIT when: We add tiered rate limits (free / paid) — extend the lookup to be a function of user_tier, not just capability.

Also implemented as defaults (not on the original list)

Markup currency: USD only (provider_resemble_usage.currency defaults to USD). Multi-currency will require FX lookup at write time.
Reconciliation window: 24 h rolling. Drift threshold: ±1 %. Alerts write to logger.error only — no paging integration yet.
Webhook signature scheme: HMAC-SHA256 over the raw request body, header X-Resemble-Signature, secret RESEMBLE_WEBHOOK_SECRET. If the secret is unset we accept unsigned payloads (dev convenience); set it in prod.
Asset expiry: Not written by default — provider_resemble_asset.expires_at is nullable. A cleanup Celery task can be added later.
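Two of the defaults above, the webhook signature check and the reconciliation drift threshold, can be sketched in a few lines. Assumptions, none of them confirmed by the source: the X-Resemble-Signature header carries a lowercase hex digest, drift is measured relative to the provider's total, and the function names are illustrative.

```python
import hashlib
import hmac
from typing import Optional


def verify_webhook(raw_body: bytes, signature_header: Optional[str],
                   secret: Optional[str]) -> bool:
    """Check an HMAC-SHA256 signature computed over the raw request body."""
    if not secret:
        # RESEMBLE_WEBHOOK_SECRET unset: accept unsigned payloads (dev only).
        return True
    if not signature_header:
        return False
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison; assumes the header value is a hex digest.
    return hmac.compare_digest(expected.encode(),
                               signature_header.strip().lower().encode())


def drift_exceeds_threshold(our_total: float, provider_total: float,
                            threshold_pct: float = 1.0) -> bool:
    """True when our ledger differs from the provider's by more than the threshold."""
    if provider_total == 0:
        return our_total != 0
    drift_pct = abs(our_total - provider_total) / abs(provider_total) * 100
    return drift_pct > threshold_pct


if __name__ == "__main__":
    body = b'{"id": 1}'
    sig = hmac.new(b"s3cret", body, hashlib.sha256).hexdigest()
    print(verify_webhook(body, sig, "s3cret"))
    print(drift_exceeds_threshold(102.0, 100.0))
```

The signature must be computed over the bytes exactly as received; re-serializing parsed JSON before hashing would change the digest.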