Speech-to-Text (STT)

Transcribe audio files to text, or translate audio to English.

Transcription

Endpoint: POST /api/v1/audio/stt/transcriptions
Auth: Required
Content-Type: multipart/form-data

Form Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | `file` | Yes | - | Audio file to transcribe |
| `model` | `string` | No | `whisper-1` | STT model ID |
| `provider` | `string` | No | `openai` | Provider name |
| `language` | `string` | No | - | Language code of the audio (e.g. `en`, `es`, `fr`) |
| `prompt` | `string` | No | - | Guide text for the model |
| `response_format` | `string` | No | `json` | One of `json`, `text`, `srt`, `verbose_json`, `vtt` |
| `temperature` | `float` | No | `0.0` | Sampling temperature (0.0–1.0) |
| `timestamp_granularities` | `string` | No | - | JSON-encoded array passed as a string, e.g. `["word", "segment"]` |
| `byok_api_key` | `string` | No | - | Your own provider API key |

Examples

Python

import requests

# Upload the audio file as multipart/form-data.
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://api.indoxhub.com/api/v1/audio/stt/transcriptions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": ("audio.mp3", f, "audio/mpeg")},
        data={
            "model": "openai/whisper-1",
            "language": "en",
            "response_format": "json",
        },
    )

response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json()["data"]["text"])

cURL

curl https://api.indoxhub.com/api/v1/audio/stt/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@audio.mp3 \
  -F model=openai/whisper-1 \
  -F language=en \
  -F response_format=json
Transcription Response

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "created_at": "2026-04-07T12:00:00Z",
  "duration_ms": 3200.0,
  "provider": "openai",
  "model": "whisper-1",
  "success": true,
  "data": {
    "text": "Hello, welcome to the IndoxHub platform.",
    "language": "en",
    "duration": 4.5,
    "words": null,
    "segments": null
  },
  "usage": {
    "type": "audio",
    "seconds": 4.5
  }
}
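
A client will usually want to check `success` before reading `data`, and to cope with `words` and `segments` being `null`. The following is a minimal sketch of such a check against the envelope shown above. The function name `extract_transcript` is hypothetical, and since the shape of an error response is not documented here, the sketch simply includes the whole payload in the exception message.

```python
def extract_transcript(payload):
    """Return (text, segments) from a transcription response envelope.

    Raises ValueError when `success` is false or missing.
    """
    if not payload.get("success"):
        raise ValueError(f"transcription failed: {payload!r}")
    data = payload["data"]
    # `segments` is null in the sample response; normalize it to a list.
    # Assumption: it is populated only when timestamps are requested.
    segments = data.get("segments") or []
    return data["text"], segments


sample = {
    "success": True,
    "data": {
        "text": "Hello, welcome to the IndoxHub platform.",
        "language": "en",
        "duration": 4.5,
        "words": None,
        "segments": None,
    },
    "usage": {"type": "audio", "seconds": 4.5},
}

text, segments = extract_transcript(sample)
print(text)      # Hello, welcome to the IndoxHub platform.
print(segments)  # []
```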

Translation

Translate audio from any language to English.

Endpoint: POST /api/v1/audio/stt/translations
Auth: Required
Content-Type: multipart/form-data

Note: Translation is currently supported only with OpenAI's `whisper-1` model.

Form Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | `file` | Yes | - | Audio file to translate |
| `model` | `string` | No | `whisper-1` | Model ID |
| `provider` | `string` | No | `openai` | Provider name |
| `prompt` | `string` | No | - | Style guide for the output |
| `response_format` | `string` | No | `json` | One of `json`, `text`, `srt`, `verbose_json`, `vtt` |
| `temperature` | `float` | No | `0.0` | Sampling temperature |
| `byok_api_key` | `string` | No | - | Your own provider API key |

Example

curl https://api.indoxhub.com/api/v1/audio/stt/translations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@spanish_audio.mp3 \
  -F model=openai/whisper-1 \
  -F response_format=json
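
For readers working in Python, the cURL call above might be mirrored roughly as follows. This is a sketch under assumptions: the helper names `translation_request` and `translate_audio` are hypothetical, and the code assumes the translation response uses the same `data.text` envelope as transcription, which this page does not show.

```python
import mimetypes
from pathlib import Path

API_BASE = "https://api.indoxhub.com/api/v1"


def translation_request(api_key, model="openai/whisper-1",
                        response_format="json"):
    """Return (url, headers, form) for a translation call."""
    url = f"{API_BASE}/audio/stt/translations"
    headers = {"Authorization": f"Bearer {api_key}"}
    form = {"model": model, "response_format": response_format}
    return url, headers, form


def translate_audio(path, api_key):
    """Upload an audio file and return the English text."""
    # Third-party dependency, imported here so the helper above
    # stays usable without it.
    import requests

    url, headers, form = translation_request(api_key)
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        response = requests.post(
            url,
            headers=headers,
            files={"file": (Path(path).name, f, content_type)},
            data=form,
            timeout=120,
        )
    response.raise_for_status()
    # Assumption: same envelope as the transcription response.
    return response.json()["data"]["text"]


url, headers, form = translation_request("YOUR_API_KEY")
print(url)  # https://api.indoxhub.com/api/v1/audio/stt/translations
```

Calling `translate_audio("spanish_audio.mp3", "YOUR_API_KEY")` would then send the same fields as the cURL example.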