Streaming¶
Receive AI responses in real time using Server-Sent Events (SSE).
Enabling Streaming¶
Set stream: true in your request body for chat completions or text completions.
Chat Completions Streaming¶
import json
import requests

response = requests.post(
    "https://api.indoxhub.com/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Write a short poem"}],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if not line:
        continue
    text = line.decode("utf-8")
    if text.startswith("data: ") and text != "data: [DONE]":
        # Each data line is JSON; print only the token delta, if any
        choices = json.loads(text[6:]).get("choices") or [{}]
        print(choices[0].get("delta", {}).get("content") or "", end="", flush=True)
print()
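const response = await fetch("https://api.indoxhub.com/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Write a short poem" }],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters that span two chunks intact
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep the trailing partial line for the next chunk
  for (const rawLine of lines) {
    const line = rawLine.trimEnd();
    if (line.startsWith("data: ") && line !== "data: [DONE]") {
      // Each data line is JSON; write only the token delta, if any
      const delta = JSON.parse(line.slice(6)).choices?.[0]?.delta?.content;
      if (delta) process.stdout.write(delta);
    }
  }
}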
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.indoxhub.com/v1"
)

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True
)

for chunk in stream:
    # Some frames (for example usage events) may carry no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Supported providers¶
Streaming is supported for chat completions across the following nine providers:
| Provider | Example model |
|---|---|
| OpenAI | openai/gpt-4o-mini |
| Anthropic | anthropic/claude-haiku-4-5 |
| Google | google/gemini-2.5-flash |
| AWS Bedrock | bedrock/us.amazon.nova-micro-v1:0 |
| DeepSeek | deepseek/deepseek-chat |
| xAI | xai/grok-4-fast |
| Mistral | mistral/mistral-small-latest |
| Qwen | qwen/qwen-turbo |
| HuggingFace | huggingface/meta-llama/Llama-3.1-8B-Instruct |
SSE wire format¶
The response uses the W3C Server-Sent Events spec. Frames are separated by a blank line. There are two frame shapes:
Named events — the event: line names a specific event type IndoxHub injects:
event: usage_start
data: {"type":"usage_start","request_id":"req-1","provider":"openai","model":"gpt-4o-mini","input_tokens":15}
Data-only frames — token deltas, finish markers, and the final [DONE] terminator:
data: {"type":"content","data":"Hello","provider":"openai","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}

data: {"type":"finish","provider":"openai","finish_reason":"stop","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]
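The framing rules above can be sketched as a small parser. This is a minimal illustration rather than an official IndoxHub client: the function name `iter_sse_frames` is hypothetical, and it assumes the lines have already been decoded to text. It handles only the `event:` and `data:` fields shown in this section and ignores everything else.

```python
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_frames(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (event_name, data) pairs; event_name is None for data-only frames.

    Hypothetical helper: a blank line closes the current frame.
    """
    event: Optional[str] = None
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: emit the frame collected so far, then reset
            if data_parts:
                yield event, "\n".join(data_parts)
            event, data_parts = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE drops a single leading space after the colon
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields and comment lines are ignored in this sketch
    if data_parts:
        # Tolerate a final frame that is not followed by a blank line
        yield event, "\n".join(data_parts)


if __name__ == "__main__":
    sample = [
        "event: usage_start",
        'data: {"type":"usage_start","input_tokens":15}',
        "",
        "data: [DONE]",
        "",
    ]
    for name, data in iter_sse_frames(sample):
        print(name, data)
```

With `requests`, the input could come from `response.iter_lines(decode_unicode=True)`, which yields an empty string for each blank separator line.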
A typical stream emits this sequence:
1. event: usage_start (once). Sent immediately after the upstream provider acknowledges the request. Carries input_tokens so clients can show "thinking" indicators with accurate context cost.
2. data: {"type":"content",…} (many). One per token delta.
3. data: {"type":"finish",…} (once). Carries the upstream finish_reason (stop, length, content_filter, etc.).
4. event: usage_final (once). Carries the totals: input_tokens, output_tokens, cost_usd, latency_ms. No second API call is needed to get billing data.
5. data: {"type":"response.done",…}. OpenAI Responses-API envelope for clients that mirror that shape.
6. data: [DONE]. Terminator.
For the full per-event schema and a parser that captures cost_usd / latency_ms per stream, see SSE Events.
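As a rough illustration of the sequence described above, the sketch below folds a stream of already-parsed frames into the final text, the finish reason, and the usage totals. The function name `collect_stream` is hypothetical, frames are assumed to arrive as `(event_name, data)` pairs, and the `usage_final` payload is assumed to be a flat JSON object carrying the listed fields. The SSE Events page remains the authoritative schema.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple


def collect_stream(frames: Iterable[Tuple[Optional[str], str]]) -> Dict[str, Any]:
    """Accumulate one stream into text, finish_reason, and usage totals.

    Hypothetical helper; assumes usage_final carries a flat JSON object.
    """
    result: Dict[str, Any] = {"text": "", "finish_reason": None, "usage": {}, "done": False}
    for event, data in frames:
        if data == "[DONE]":
            result["done"] = True  # terminator seen, stop reading
            break
        payload = json.loads(data)
        if event == "usage_final":
            result["usage"] = payload
        elif payload.get("type") == "content":
            choices = payload.get("choices") or [{}]
            delta = choices[0].get("delta") or {}
            result["text"] += delta.get("content") or ""
        elif payload.get("type") == "finish":
            result["finish_reason"] = payload.get("finish_reason")
        # usage_start and response.done frames are ignored in this sketch
    return result
```

Checking `done` after the call tells the caller whether the stream reached its terminator or was cut short.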
Stopping a Stream¶
There are two ways to stop an in-flight stream. Both halt upstream token generation, and you are only billed for tokens produced up to the cancel point.
1. Client disconnect (recommended)¶
Closing the HTTP connection — e.g. AbortController.abort() in the browser, closing the requests response, or killing the process — now propagates to the upstream model within ~1 second. No extra API call needed.
const controller = new AbortController();

const response = await fetch("https://api.indoxhub.com/api/v1/chat/completions", {
  method: "POST",
  headers: { "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Write a long story" }],
    stream: true
  }),
  signal: controller.signal
});

// Abort after 2 seconds; billing stops at the abort point
setTimeout(() => controller.abort(), 2000);
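The same idea can be sketched in Python for the "closing the requests response" case mentioned above. The helper name `read_then_disconnect` and the `max_chars` cutoff are illustrative assumptions. The `response` argument is expected to behave like a `requests.Response` opened with `stream=True`, which provides `iter_lines()` and `close()`.

```python
import json


def read_then_disconnect(response, max_chars: int) -> str:
    """Collect streamed text, then close the connection once max_chars is reached.

    Hypothetical helper: closing the response is what signals the disconnect.
    """
    text = ""
    try:
        for raw in response.iter_lines():
            if not raw:
                continue  # blank separator between frames
            line = raw.decode("utf-8")
            if not line.startswith("data: ") or line == "data: [DONE]":
                continue
            payload = json.loads(line[len("data: "):])
            choices = payload.get("choices") or [{}]
            text += (choices[0].get("delta") or {}).get("content") or ""
            if len(text) >= max_chars:
                break  # stop reading; the finally block drops the connection
    finally:
        response.close()
    return text
```

A call might look like `read_then_disconnect(requests.post(url, headers=headers, json=body, stream=True), 200)`, where `url`, `headers`, and `body` match the streaming request shown earlier.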
2. Explicit stop endpoint¶
Useful when you want to stop a stream from a different connection than the one receiving it (e.g. a "Stop" button on another device).
Chat completions: POST /api/v1/chat/stop-stream/{stream_id}
Text completions: POST /api/v1/completions/stop-stream/{stream_id}
curl -X POST https://api.indoxhub.com/api/v1/chat/stop-stream/my-stream-id \
-H "Authorization: Bearer YOUR_API_KEY"
The stream_id is the value returned in the X-Request-ID response header of the original streaming request.
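A minimal Python sketch of the same call follows, using only the standard library. The helper name `build_stop_request` and its `kind` parameter are hypothetical. The URL shapes come from the two endpoints listed above, and `stream_id` is assumed to be the X-Request-ID value captured from the original streaming response.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.indoxhub.com/api/v1"


def build_stop_request(stream_id: str, api_key: str, kind: str = "chat") -> urllib.request.Request:
    """Build (but do not send) the POST request that stops a stream.

    Hypothetical helper; kind is "chat" or "completions".
    """
    if kind not in ("chat", "completions"):
        raise ValueError("kind must be 'chat' or 'completions'")
    # Percent-encode the id so it is safe as a single path segment
    url = f"{BASE_URL}/{kind}/stop-stream/{quote(stream_id, safe='')}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


if __name__ == "__main__":
    request = build_stop_request("my-stream-id", "YOUR_API_KEY")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to issue the stop call from a different process or device than the one reading the stream.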
Timeout¶
Streams have a default timeout of 30 seconds. If no data is received within this period, the connection is closed.
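One way a client might notice a stream that was cut off is to check whether the [DONE] terminator ever arrived. The sketch below assumes that a connection closed by the timeout simply ends without that terminator, which this page does not state explicitly. The names `StreamInterrupted` and `iter_data_until_done` are hypothetical.

```python
from typing import Iterable, Iterator


class StreamInterrupted(Exception):
    """Raised when the input ends before the [DONE] terminator (hypothetical)."""


def iter_data_until_done(lines: Iterable[str]) -> Iterator[str]:
    """Yield each data payload, and raise if the stream ends without [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip event: lines and blank separators
        data = line[len("data: "):]
        if data == "[DONE]":
            return  # clean end of stream
        yield data
    # Reached only when the input is exhausted without a terminator
    raise StreamInterrupted("stream ended before the [DONE] terminator")
```

Callers can catch `StreamInterrupted` to decide whether to retry the request or surface a partial result.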