
SSE Events

This page is the wire-format reference for every event the IndoxHub streaming pipeline emits. For a higher-level walkthrough of how to make a streaming call, see Streaming.

Event-frame anatomy

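Every frame follows the Server-Sent Events (SSE) wire format, originally specified by the W3C and now maintained in the WHATWG HTML Living Standard. There are two flavours: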

Shape            Has event: line?              What it carries
Named event      yes, e.g. event: usage_start  IndoxHub-injected accounting / lifecycle signals
Data-only frame  no                            Token deltas, finish markers, error frames, and the OpenAI Responses-API envelope

Both shapes always include a data: line whose value is JSON (except the final data: [DONE] terminator, which is literal).
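The two shapes can be told apart with a small line-oriented parser. The following is a minimal sketch, not IndoxHub's own implementation: `parse_frames` and `_field_value` are hypothetical helper names, and only the `event:` and `data:` fields used on this page are handled.

```python
import json
from itertools import chain
from typing import Any, Iterable, Iterator, Optional, Tuple


def _field_value(line: str, name: str) -> str:
    """Return the text after 'name:', minus the single optional leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_frames(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], Any]]:
    """Yield (event_name, payload) pairs from already-decoded SSE lines.

    event_name is None for data-only frames. payload is the parsed JSON
    value, or the literal string "[DONE]" for the terminator.
    """
    event = None
    data_lines = []
    # The extra "" flushes a final frame that lacks its trailing blank line.
    for line in chain(lines, [""]):
        if line == "":
            if data_lines:
                raw = "\n".join(data_lines)
                yield (event, raw if raw == "[DONE]" else json.loads(raw))
            event, data_lines = None, []
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data_lines.append(_field_value(line, "data"))


sample = [
    'event: usage_start',
    'data: {"type":"usage_start","request_id":"req-1","provider":"openai","model":"gpt-4o-mini","input_tokens":15}',
    '',
    'data: {"type":"content","data":"Hello","provider":"openai","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}',
    '',
    'data: [DONE]',
    '',
]
frames = list(parse_frames(sample))
```

Here `frames` holds three pairs: one named `usage_start` frame, one data-only content frame, and the `[DONE]` terminator.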

usage_start — once, near the start

Emitted as soon as the upstream provider acknowledges the request. Useful for driving a "thinking…" UI state that shows the actual prompt-token count.

event: usage_start
data: {"type":"usage_start","request_id":"req-1","provider":"openai","model":"gpt-4o-mini","input_tokens":15}

Field         Type    Notes
request_id    string  Mirrors the X-Request-ID response header.
provider      string  One of: openai, anthropic, google, bedrock, deepseek, xai, mistral, qwen, huggingface.
model         string  The provider-side model id.
input_tokens  int     Prompt tokens, as reported by the provider's first SSE event. May be 0 for providers that report tokens only at end of stream.

Content frames — many

Standard token deltas. No event: line; just data:.

data: {"type":"content","data":"Hello","provider":"openai","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}

The choices shape mirrors OpenAI's chat.completion.chunk for SDK compatibility. The flatter type / data / provider fields are added by IndoxHub for provider-agnostic clients.
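Because each content frame carries the same text in two places, a client can read whichever shape it prefers. The sketch below assumes the frame layout shown above; `delta_text` is a hypothetical helper that tries the flat `data` field first and falls back to the OpenAI-style `choices[0].delta.content`.

```python
import json


def delta_text(payload: dict) -> str:
    """Return the text carried by a content frame, or "" if there is none."""
    # Flat IndoxHub fields: {"type": "content", "data": "..."}
    if payload.get("type") == "content" and isinstance(payload.get("data"), str):
        return payload["data"]
    # OpenAI-style chunk: {"choices": [{"delta": {"content": "..."}}]}
    choices = payload.get("choices") or []
    if choices:
        delta = choices[0].get("delta") or {}
        return delta.get("content") or ""
    return ""


frame = json.loads(
    '{"type":"content","data":"Hello","provider":"openai",'
    '"choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}'
)
flat = delta_text(frame)                                   # read via type / data
openai_style = delta_text({"choices": frame["choices"]})   # read via choices only
```

Both calls return the same string for the sample frame, so a provider-agnostic client and an OpenAI-SDK-shaped client see identical text.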

finish — once, before the terminal frames

Stream-level finish marker. Carries the upstream provider's finish_reason translated to a normalized vocabulary.

data: {"type":"finish","provider":"openai","finish_reason":"stop","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

Possible finish_reason values: stop, length, content_filter, tool_calls, function_call, error.
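How a client reacts to each value is application-specific. As one illustration, the sketch below maps the normalized vocabulary to coarse outcomes. The outcome labels and the `classify_finish` helper are assumptions made for this example, not part of the IndoxHub API.

```python
# Hypothetical outcome labels; only the keys come from the list above.
FINISH_OUTCOMES = {
    "stop": "complete",          # model ended its answer normally
    "length": "truncated",       # output hit a token limit
    "content_filter": "blocked",
    "tool_calls": "run_tools",
    "function_call": "run_tools",
    "error": "failed",
}


def classify_finish(payload: dict) -> str:
    """Map a finish frame to an outcome label, or "unknown" for unlisted values."""
    return FINISH_OUTCOMES.get(payload.get("finish_reason"), "unknown")


finish_frame = {
    "type": "finish",
    "provider": "openai",
    "finish_reason": "stop",
    "choices": [{"delta": {}, "index": 0, "finish_reason": "stop"}],
}
outcome = classify_finish(finish_frame)
```

Returning "unknown" instead of raising keeps the client tolerant if the vocabulary grows.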

usage_final — once, before [DONE]

The accounting event. Carries totals so clients never need a second API call to bill or log.

event: usage_final
data: {"type":"usage_final","request_id":"req-1","provider":"openai","model":"gpt-4o-mini","input_tokens":15,"output_tokens":1,"cost_usd":2.85e-06,"latency_ms":4693}

Field          Type   Notes
input_tokens   int    Final prompt-token count after the provider has confirmed it.
output_tokens  int    Generated tokens through end of stream (does not include [DONE]).
cost_usd       float  Computed against the IndoxHub pricing registry. 0.0 if pricing is missing; this never blocks the frame.
latency_ms     int    Wall-clock from request acceptance to terminal frame.

This event fires even if the upstream stream ends without a usage chunk — the injector tracks tokens locally as a fallback.
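For billing or logging, it can help to lift the usage_final payload into a typed record. This is a minimal sketch assuming the fields shown in the sample frame above; the `UsageFinal` class and its `priced` property are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageFinal:
    request_id: str
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: int

    @classmethod
    def from_payload(cls, payload: dict) -> "UsageFinal":
        return cls(
            request_id=payload["request_id"],
            provider=payload["provider"],
            model=payload["model"],
            input_tokens=int(payload["input_tokens"]),
            output_tokens=int(payload["output_tokens"]),
            cost_usd=float(payload.get("cost_usd", 0.0)),
            latency_ms=int(payload["latency_ms"]),
        )

    @property
    def priced(self) -> bool:
        """False when cost_usd is 0.0, which can mean pricing was missing."""
        return self.cost_usd > 0.0


usage = UsageFinal.from_payload(json.loads(
    '{"type":"usage_final","request_id":"req-1","provider":"openai",'
    '"model":"gpt-4o-mini","input_tokens":15,"output_tokens":1,'
    '"cost_usd":2.85e-06,"latency_ms":4693}'
))
```

Checking `usage.priced` before writing a billing row avoids recording a zero charge that only reflects a missing registry entry.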

response.done — once, after usage_final

OpenAI Responses-API envelope for clients that mirror that shape. Contains the same usage totals as usage_final for redundancy.

data: {"type":"response.done","response":{"id":"req-1","object":"response","status":"completed","usage":{"prompt_tokens":15,"completion_tokens":1,"total_tokens":16}}}
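Since the envelope repeats the usage_final totals, a client that receives both can cross-check them. The sketch below assumes that prompt_tokens corresponds to input_tokens and completion_tokens to output_tokens, as the sample frames on this page suggest; `usage_matches` is a hypothetical helper.

```python
import json


def usage_matches(usage_final: dict, response_done: dict) -> bool:
    """Return True when the envelope's usage agrees with the usage_final totals."""
    usage = response_done["response"]["usage"]
    expected_total = usage_final["input_tokens"] + usage_final["output_tokens"]
    return (
        usage["prompt_tokens"] == usage_final["input_tokens"]
        and usage["completion_tokens"] == usage_final["output_tokens"]
        and usage["total_tokens"] == expected_total
    )


usage_final = json.loads(
    '{"type":"usage_final","request_id":"req-1","provider":"openai",'
    '"model":"gpt-4o-mini","input_tokens":15,"output_tokens":1,'
    '"cost_usd":2.85e-06,"latency_ms":4693}'
)
response_done = json.loads(
    '{"type":"response.done","response":{"id":"req-1","object":"response",'
    '"status":"completed","usage":{"prompt_tokens":15,"completion_tokens":1,'
    '"total_tokens":16}}}'
)
consistent = usage_matches(usage_final, response_done)
```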

[DONE] — terminator

data: [DONE]

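A literal sentinel, following the convention of OpenAI's streaming API, that signals end-of-stream over a long-lived HTTP connection. It is not defined by the SSE specification itself, so clients must check for it before attempting to parse the data: value as JSON.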

error — fired in place of usage_final on upstream failure

data: {"type":"error","data":"Provider returned 502 Bad Gateway","provider":"openai"}

When this fires, no usage_final arrives. Clients should treat the stream as finished after the next [DONE].
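A client therefore needs to cope with a stream that ends with no accounting totals. The sketch below folds a sequence of already-parsed payloads into a single result, assuming the error frame shape shown above; `summarize_stream` and the result keys are hypothetical.

```python
def summarize_stream(payloads):
    """Fold parsed frame payloads (dicts, or the literal "[DONE]") into one result."""
    text_parts = []
    result = {"text": "", "error": None, "usage": None, "done": False}
    for payload in payloads:
        if payload == "[DONE]":
            result["done"] = True        # stream is finished, with or without usage
            break
        kind = payload.get("type")
        if kind == "content":
            text_parts.append(payload.get("data", ""))
        elif kind == "error":
            result["error"] = payload.get("data")
        elif kind == "usage_final":
            result["usage"] = payload
    result["text"] = "".join(text_parts)
    return result


failed = summarize_stream([
    {"type": "content", "data": "Hel", "provider": "openai"},
    {"type": "error", "data": "Provider returned 502 Bad Gateway", "provider": "openai"},
    "[DONE]",
])
```

After a failed stream, `failed["usage"]` stays `None` while `failed["error"]` holds the provider message, so callers can branch on the error instead of waiting for totals that will not arrive.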

Reserved (not emitted today)

The injector is wired to support these events, but no production codepath emits them yet. Documented so clients can ignore them safely:

Event               Future use
rate_limit_warning  Mid-stream warning when the user is within 10% of their per-minute limit.
cache_hit           Sent when a response was served from the prompt cache (not a stream cache; the first token came from a cached upstream prefix).
provider_fallback   Sent when the gateway transparently retried on a different provider.
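One way to stay forward-compatible is to dispatch named events through a handler table and skip anything unregistered. This is a sketch of that pattern only; `dispatch` and the handler table are hypothetical, and the reserved-event payload used here is invented purely to exercise the skip path.

```python
def dispatch(event, payload, handlers):
    """Call the handler registered for a named event.

    Returns False when no handler is registered, so unknown or reserved
    events are skipped instead of raising.
    """
    handler = handlers.get(event)
    if handler is None:
        return False
    handler(payload)
    return True


seen = []
handlers = {
    "usage_start": lambda p: seen.append(("start", p["input_tokens"])),
    "usage_final": lambda p: seen.append(("final", p["output_tokens"])),
}

handled_start = dispatch("usage_start", {"type": "usage_start", "input_tokens": 15}, handlers)
# Placeholder payload: the real shape of reserved events is not documented here.
handled_reserved = dispatch("cache_hit", {"type": "cache_hit"}, handlers)
```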

Complete parser

This Python example captures every event type, prints content as it arrives, and logs the usage_final totals at the end:

import json

import requests

response = requests.post(
    "https://api.indoxhub.com/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Write one short sentence."}],
        "stream": True,
    },
    stream=True,
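)
response.raise_for_status()  # fail fast on HTTP-level errors instead of parsing an error body
# SSE streams are UTF-8; without an explicit charset, requests may fall back to
# ISO-8859-1 for text/* content types and garble non-ASCII tokens.
response.encoding = "utf-8"

current_event, content = None, ""
for raw in response.iter_lines(decode_unicode=True):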
    if not raw:
        current_event = None  # blank line ends a frame
        continue
    if raw.startswith("event: "):
        current_event = raw[7:]
        continue
    if not raw.startswith("data: "):
        continue
    if raw == "data: [DONE]":
        break

    payload = json.loads(raw[6:])
    if current_event == "usage_start":
        print(f"[start] in_tokens={payload['input_tokens']}")
    elif current_event == "usage_final":
        print(f"[final] cost=${payload['cost_usd']:.6f} "
              f"in={payload['input_tokens']} out={payload['output_tokens']} "
              f"latency={payload['latency_ms']}ms")
    elif payload.get("type") == "content":
        chunk = payload.get("data", "")
        content += chunk
        print(chunk, end="", flush=True)
    elif payload.get("type") == "finish":
        print(f"\n[finish] reason={payload['finish_reason']}")
    elif payload.get("type") == "error":
        print(f"\n[error] {payload.get('data')}")

print(f"\nFinal text: {content!r}")

Sample output:

[start] in_tokens=15
The sun set over the horizon.
[finish] reason=stop
[final] cost=$0.000003 in=15 out=8 latency=924ms
Final text: 'The sun set over the horizon.'
Documentation last built on May 23, 2026