Speech-to-Text (STT)
Transcribe audio files to text, or translate audio to English.
Transcription
Endpoint: POST /api/v1/audio/stt/transcriptions
Auth: Required
Content-Type: multipart/form-data
Form Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | Yes | — | Audio file to transcribe |
| model | string | No | whisper-1 | STT model ID |
| provider | string | No | openai | Provider name |
| language | string | No | — | Language code (e.g. en, es, fr) |
| prompt | string | No | — | Guide text for the model |
| response_format | string | No | json | json, text, srt, verbose_json, vtt |
| temperature | float | No | 0.0 | Sampling temperature (0.0–1.0) |
| timestamp_granularities | string | No | — | JSON string: ["word", "segment"] |
| byok_api_key | string | No | — | Your own provider API key |
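
Since `timestamp_granularities` is documented as a JSON string rather than a list, it can help to assemble the form fields in one place. The following is a minimal sketch, not part of any official SDK: `build_transcription_form` is a hypothetical helper, and its client-side checks only mirror the values listed in the table above (the server may validate differently).

```python
import json

# Allowed values copied from the Form Parameters table.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GRANULARITIES = {"word", "segment"}


def build_transcription_form(model="whisper-1", provider="openai", language=None,
                             prompt=None, response_format="json", temperature=0.0,
                             timestamp_granularities=None):
    """Return the non-file form fields for a transcription request.

    Hypothetical helper: the name and the checks are illustrative only.
    """
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    form = {
        "model": model,
        "provider": provider,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    # Optional fields are left out entirely when they are not set.
    if language is not None:
        form["language"] = language
    if prompt is not None:
        form["prompt"] = prompt
    if timestamp_granularities is not None:
        unknown = set(timestamp_granularities) - GRANULARITIES
        if unknown:
            raise ValueError(f"unknown granularities: {sorted(unknown)}")
        # Sent as a JSON string, e.g. '["word", "segment"]'.
        form["timestamp_granularities"] = json.dumps(list(timestamp_granularities))
    return form
```

The returned dict can be passed as the `data=` argument of `requests.post`, next to the `files=` argument that carries the audio file.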
Examples

    import requests

    # Upload the audio as multipart/form-data; the other parameters are plain form fields.
    with open("audio.mp3", "rb") as f:
        response = requests.post(
            "https://api.indoxhub.com/api/v1/audio/stt/transcriptions",
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            files={"file": ("audio.mp3", f, "audio/mpeg")},
            data={
                "model": "openai/whisper-1",
                "language": "en",
                "response_format": "json",
            },
        )

    # Raise on HTTP errors, then read the transcript from data.text.
    response.raise_for_status()
    print(response.json()["data"]["text"])
Transcription Response

    {
      "request_id": "550e8400-e29b-41d4-a716-446655440000",
      "created_at": "2026-04-07T12:00:00Z",
      "duration_ms": 3200.0,
      "provider": "openai",
      "model": "whisper-1",
      "success": true,
      "data": {
        "text": "Hello, welcome to the IndoxHub platform.",
        "language": "en",
        "duration": 4.5,
        "words": null,
        "segments": null
      },
      "usage": {
        "type": "audio",
        "seconds": 4.5
      }
    }
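
To make the response envelope concrete, here is a small sketch of reading it on the client side. `extract_transcript` is a hypothetical helper, and it assumes only the fields shown in the sample response. How the API reports failures is not described in this section, so a falsy `success` is simply turned into an exception.

```python
def extract_transcript(payload):
    """Return (text, billed_seconds) from a parsed transcription response.

    Assumes the envelope shape of the sample response; the failure format
    is not documented here, so this only checks the "success" flag.
    """
    if not payload.get("success"):
        raise RuntimeError(f"request {payload.get('request_id')} did not succeed")
    data = payload["data"]
    usage = payload.get("usage") or {}
    return data["text"], usage.get("seconds")


# Sample payload abbreviated from the response shown above.
sample = {
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "success": True,
    "data": {
        "text": "Hello, welcome to the IndoxHub platform.",
        "language": "en",
        "duration": 4.5,
        "words": None,
        "segments": None,
    },
    "usage": {"type": "audio", "seconds": 4.5},
}

text, seconds = extract_transcript(sample)
print(text, seconds)
```

In the sample, `words` and `segments` are null, so code that consumes timestamps should handle `None` for both fields.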
Translation
Translate audio from any language to English.
Endpoint: POST /api/v1/audio/stt/translations
Auth: Required
Content-Type: multipart/form-data
Note: Translation is currently only supported with OpenAI's whisper-1 model.
Form Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| file | file | Yes | — | Audio file to translate |
| model | string | No | whisper-1 | Model ID |
| provider | string | No | openai | Provider name |
| prompt | string | No | — | Style guide for the output |
| response_format | string | No | json | json, text, srt, verbose_json, vtt |
| temperature | float | No | 0.0 | Sampling temperature |
| byok_api_key | string | No | — | Your own provider API key |
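
A translation request can be sketched along the same lines as a transcription request. This is an illustration under stated assumptions, not a definitive client: `translation_fields` and `translate_audio` are hypothetical names, the host is assumed to be the same `api.indoxhub.com` used for transcription, and the 60-second timeout is an arbitrary choice. `requests` is imported inside the function so the field builder can be used without it.

```python
API_BASE = "https://api.indoxhub.com"  # assumed: same host as the transcription endpoint
TRANSLATIONS_PATH = "/api/v1/audio/stt/translations"


def translation_fields(prompt=None, response_format="json", temperature=0.0):
    """Build the non-file form fields for a translation request."""
    fields = {
        "model": "whisper-1",  # the only model the note above lists as supported
        "provider": "openai",
        "response_format": response_format,
        "temperature": str(temperature),
    }
    if prompt is not None:
        fields["prompt"] = prompt
    return fields


def translate_audio(path, api_key, **options):
    """POST an audio file to the translations endpoint and return the parsed JSON.

    Assumes the default "json" response_format; other formats may not
    return a JSON body.
    """
    import requests  # third-party dependency, imported lazily

    with open(path, "rb") as audio:
        response = requests.post(
            API_BASE + TRANSLATIONS_PATH,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": audio},
            data=translation_fields(**options),
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

A call such as `translate_audio("interview_es.mp3", "YOUR_API_KEY", prompt="Formal tone")` would then upload the file and return the parsed response; the file name here is only a placeholder.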