
Streaming

Receive AI responses in real-time using Server-Sent Events (SSE).

Enabling Streaming

Set stream: true in your request body for chat completions or text completions.

Chat Completions Streaming

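Python (requests)

import json

import requests

response = requests.post(
    "https://api.indoxhub.com/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Write a short poem"}],
        "stream": True
    },
    stream=True
)
response.raise_for_status()

for line in response.iter_lines():
    text = line.decode("utf-8")
    # Payloads arrive on "data: " lines; skip blank separators and "event: " lines.
    if not text.startswith("data: "):
        continue
    data = text[len("data: "):]
    if data == "[DONE]":
        break
    # Token deltas are in choices[0].delta.content; usage and finish frames carry none.
    for choice in json.loads(data).get("choices") or []:
        content = (choice.get("delta") or {}).get("content")
        if content:
            print(content, end="", flush=True)
print()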
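JavaScript (fetch, Node.js 18+)

// Run as an ES module so top-level await is allowed.
const response = await fetch("https://api.indoxhub.com/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Write a short poem" }],
    stream: true
  })
});
if (!response.ok) throw new Error(`HTTP ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // { stream: true } keeps multi-byte characters intact across chunk boundaries.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  // The last element may be an incomplete line; keep it for the next read.
  buffer = lines.pop();

  for (const raw of lines) {
    const line = raw.trimEnd(); // tolerate \r\n line endings
    if (!line.startsWith("data: ")) continue; // skip blank and "event: " lines
    const data = line.slice(6);
    if (data === "[DONE]") continue;
    const content = JSON.parse(data).choices?.[0]?.delta?.content;
    if (content) process.stdout.write(content); // Node.js; update the DOM instead in a browser
  }
}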
cURL (-N disables output buffering so frames print as they arrive)

curl -N https://api.indoxhub.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Write a short poem"}], "stream": true}'
OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.indoxhub.com/v1"
)

stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True
)

for chunk in stream:
    # Some frames (for example usage events) carry no choices; skip them.
    choices = getattr(chunk, "choices", None)
    if choices and choices[0].delta.content:
        print(choices[0].delta.content, end="", flush=True)

Supported providers

Streaming is supported for chat completions across the following nine providers:

| Provider | Example model |
|---|---|
| OpenAI | openai/gpt-4o-mini |
| Anthropic | anthropic/claude-haiku-4-5 |
| Google | google/gemini-2.5-flash |
| AWS Bedrock | bedrock/us.amazon.nova-micro-v1:0 |
| DeepSeek | deepseek/deepseek-chat |
| xAI | xai/grok-4-fast |
| Mistral | mistral/mistral-small-latest |
| Qwen | qwen/qwen-turbo |
| HuggingFace | huggingface/meta-llama/Llama-3.1-8B-Instruct |
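Every example model id above has the form provider/model, and some upstream names (the HuggingFace one) contain slashes of their own. The sketch below assumes the provider is everything before the first "/" and the rest is the upstream model name, which matches the usage_start frame shown later ("provider":"openai","model":"gpt-4o-mini"). split_model_id, supports_streaming, and the prefix set are hypothetical helpers derived from this table, not part of any IndoxHub SDK.

```python
from typing import Tuple

# Hypothetical: provider prefixes taken from the streaming table above.
STREAMING_PROVIDERS = {
    "openai", "anthropic", "google", "bedrock", "deepseek",
    "xai", "mistral", "qwen", "huggingface",
}


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'provider/model' on the FIRST slash only.

    The remainder is kept whole because upstream names such as
    HuggingFace repo ids contain slashes themselves.
    """
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


def supports_streaming(model_id: str) -> bool:
    """True if the model's provider prefix appears in the table above."""
    provider, _ = split_model_id(model_id)
    return provider in STREAMING_PROVIDERS
```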

SSE wire format

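The response body follows the Server-Sent Events (SSE) format, originally a W3C specification and now maintained in the WHATWG HTML Living Standard. Frames are separated by a blank line, and there are two frame shapes: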

Named events — the event: line names a specific event type IndoxHub injects:

event: usage_start
data: {"type":"usage_start","request_id":"req-1","provider":"openai","model":"gpt-4o-mini","input_tokens":15}

Data-only frames — token deltas, finish markers, and the final [DONE] terminator:

data: {"type":"content","data":"Hello","provider":"openai","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}

data: {"type":"finish","provider":"openai","finish_reason":"stop","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]
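Network reads rarely line up with frame boundaries, so a client has to buffer text until the blank line that ends a frame arrives. Below is a minimal, standard-library-only sketch of such a parser. SSEFrame and parse_sse are hypothetical names; the parser handles only the event and data fields plus ":" comment lines (id and retry are ignored) and only \n or \r\n line endings, so it is a subset of the full SSE parsing rules.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class SSEFrame:
    event: Optional[str]  # None for data-only frames
    data: str


def parse_sse(chunks: Iterable[str]) -> Iterator[SSEFrame]:
    """Incrementally parse SSE text; `chunks` may split lines anywhere."""
    buffer = ""
    event: Optional[str] = None
    data_lines: List[str] = []
    for chunk in chunks:
        buffer += chunk
        # Handle every complete line; a partial trailing line stays buffered.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.rstrip("\r")  # tolerate CRLF line endings
            if not line:
                # Blank line ends a frame; frames without data are dropped.
                if data_lines:
                    yield SSEFrame(event, "\n".join(data_lines))
                event, data_lines = None, []
                continue
            if line.startswith(":"):
                continue  # comment / keep-alive line
            # Split on the first colon only, so JSON colons stay in the value.
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if name == "event":
                event = value
            elif name == "data":
                data_lines.append(value)  # multiple data lines join with "\n"
```

Feeding it the raw bytes decoded as UTF-8 (for example each chunk from requests' iter_content) yields one SSEFrame per frame, in order, ending with data "[DONE]".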

A typical stream emits this sequence:

  1. event: usage_start — once, immediately after the upstream provider acknowledges the request. Carries input_tokens so clients can show "thinking" indicators with accurate context cost.
  2. data: {"type":"content",…} — many, one per token delta.
  3. data: {"type":"finish",…} — once, with the upstream finish_reason (stop, length, content_filter, etc.).
  4. event: usage_final — once, with totals: input_tokens, output_tokens, cost_usd, latency_ms. No second API call needed to get billing data.
  5. data: {"type":"response.done",…} — OpenAI Responses-API envelope for clients that mirror that shape.
  6. data: [DONE] — terminator.
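The sequence above can be folded into one result object per stream. This sketch assumes the frames have already been split into (event, data) pairs, with event set to None for data-only frames, and that the usage_final payload exposes its totals as top-level keys; the authoritative layout is on the SSE Events page, so treat that field access as an assumption. StreamResult and consume_frames are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

# Totals documented for usage_final; assumed to be top-level payload keys.
USAGE_KEYS = ("input_tokens", "output_tokens", "cost_usd", "latency_ms")


@dataclass
class StreamResult:
    text: str = ""
    finish_reason: Optional[str] = None
    input_tokens: Optional[int] = None          # from usage_start
    usage: Dict[str, float] = field(default_factory=dict)  # from usage_final
    done: bool = False                          # saw the [DONE] terminator


def consume_frames(frames: Iterable[Tuple[Optional[str], str]]) -> StreamResult:
    """Fold (event, data) pairs from one stream into a StreamResult."""
    result = StreamResult()
    parts: List[str] = []
    for event, data in frames:
        if data == "[DONE]":
            result.done = True
            break
        payload = json.loads(data)
        if event == "usage_start":
            result.input_tokens = payload.get("input_tokens")
        elif event == "usage_final":
            result.usage = {k: payload[k] for k in USAGE_KEYS if k in payload}
        elif payload.get("type") == "content":
            # Keep only the first choice's token deltas.
            for choice in payload.get("choices") or []:
                if choice.get("index", 0) == 0:
                    parts.append((choice.get("delta") or {}).get("content") or "")
        elif payload.get("type") == "finish":
            result.finish_reason = payload.get("finish_reason")
        # "response.done" and unrecognized frames are ignored in this sketch.
    result.text = "".join(parts)
    return result
```

Because cost and latency arrive in the same stream, a caller can log result.usage right after the loop without a second request.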

For the full per-event schema and a parser that captures cost_usd / latency_ms per stream, see SSE Events.

Stopping a Stream

There are two ways to stop an in-flight stream. Both halt upstream token generation, and you are billed only for tokens produced up to the cancel point.

1. Closing the HTTP connection

Closing the connection (for example, calling AbortController.abort() in the browser, closing the requests response, or killing the process) propagates to the upstream model within about 1 second. No extra API call is needed.

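const controller = new AbortController();

// Abort after 2 seconds; billing stops at the abort point
const timer = setTimeout(() => controller.abort(), 2000);

try {
  const response = await fetch("https://api.indoxhub.com/api/v1/chat/completions", {
    method: "POST",
    headers: { "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "openai/gpt-4o-mini",
      messages: [{ role: "user", content: "Write a long story" }],
      stream: true
    }),
    signal: controller.signal
  });

  const reader = response.body.getReader();
  while (!(await reader.read()).done) {
    // Parse and render chunks here, as in the streaming example above
  }
} catch (err) {
  // fetch() and reader.read() reject with an AbortError once abort() is called
  if (err.name !== "AbortError") throw err;
} finally {
  clearTimeout(timer);
}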
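A Python counterpart: stop reading after a time budget, then close the requests response so the connection drops and upstream generation stops. read_with_deadline is a hypothetical helper, and the clock parameter exists only so the logic can be tested. Note the deadline is checked only when a line arrives, so a silent upstream can still block until the client's read timeout fires.

```python
import time
from typing import Callable, Iterable, List


def read_with_deadline(
    lines: Iterable[bytes],
    max_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[bytes]:
    """Collect non-empty SSE lines until `max_seconds` have elapsed.

    Breaking out of the loop does not cancel generation by itself; the
    caller must close the response afterwards so the connection drops.
    """
    start = clock()
    collected: List[bytes] = []
    for line in lines:
        if clock() - start >= max_seconds:
            break
        if line:
            collected.append(line)
    return collected


# Usage with requests (not executed here):
#   import requests
#   with requests.post(URL, headers=HEADERS, json=BODY, stream=True,
#                      timeout=(10, 35)) as resp:
#       lines = read_with_deadline(resp.iter_lines(), max_seconds=2.0)
#   # Leaving the `with` block closes the response and its connection.
```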

2. Explicit stop endpoint

Useful when you want to stop a stream from a different connection than the one receiving it (e.g. a "Stop" button on another device).

Chat completions: POST /api/v1/chat/stop-stream/{stream_id}
Text completions: POST /api/v1/completions/stop-stream/{stream_id}

curl -X POST https://api.indoxhub.com/api/v1/chat/stop-stream/my-stream-id \
  -H "Authorization: Bearer YOUR_API_KEY"

The stream_id is the value returned in the X-Request-ID response header of the original streaming request.
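A small helper for the stop endpoint, using only the standard library. It assumes the same base URL as the other examples on this page and percent-encodes the id so it always stays a single path segment; stop_stream_url and stop_stream are hypothetical names. With requests, the id of the original stream can be read as response.headers["X-Request-ID"], since requests' header lookup is case-insensitive.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.indoxhub.com/api/v1"  # as used elsewhere on this page

# Path prefixes for the two documented stop endpoints.
_STOP_PATHS = {
    "chat": "/chat/stop-stream/",
    "text": "/completions/stop-stream/",
}


def stop_stream_url(stream_id: str, kind: str = "chat", base_url: str = BASE_URL) -> str:
    """Build the stop URL for a chat ("chat") or text ("text") completion stream."""
    if kind not in _STOP_PATHS:
        raise ValueError(f"kind must be 'chat' or 'text', got {kind!r}")
    # safe="" also encodes "/", so the id cannot add extra path segments.
    return base_url + _STOP_PATHS[kind] + quote(stream_id, safe="")


def stop_stream(stream_id: str, api_key: str, kind: str = "chat") -> int:
    """POST to the stop endpoint and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = urllib.request.Request(
        stop_stream_url(stream_id, kind),
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```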

Timeout

Streams have a default timeout of 30 seconds. If no data is received within this period, the connection is closed.
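On the client side, an explicit read timeout makes a stalled stream fail predictably. With requests, timeout=(connect, read) applies the read value to each wait for data from the server rather than to the whole response, so a long stream that keeps producing tokens is not cut off. The sketch below picks a read timeout a few seconds above the documented 30-second default; open_stream and the 5-second margin are illustrative assumptions, and session can be anything with a requests-style post() method (for example requests.Session()).

```python
from typing import Any, Dict, Optional

SERVER_TIMEOUT_S = 30.0  # documented default stream timeout


def open_stream(
    session: Any,
    url: str,
    api_key: str,
    payload: Dict[str, Any],
    connect_timeout: float = 10.0,
    read_timeout: Optional[float] = None,
) -> Any:
    """Start a streaming request with explicit connect/read timeouts.

    By default the read timeout sits slightly above the server's own
    timeout (an illustrative choice), so the server's close is seen first.
    """
    if read_timeout is None:
        read_timeout = SERVER_TIMEOUT_S + 5.0
    return session.post(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        json={**payload, "stream": True},  # always request a stream
        stream=True,                       # do not buffer the whole body
        timeout=(connect_timeout, read_timeout),
    )
```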

Documentation last built on May 23, 2026