SSE Events
This page is the wire-format reference for every event the IndoxHub streaming pipeline emits. For a higher-level walkthrough of how to make a streaming call, see Streaming.
Event-frame anatomy
Every frame follows the W3C SSE spec. There are two flavours:

| Shape | Has event: line? | What it carries |
|---|---|---|
| Named event | yes, e.g. event: usage_start | IndoxHub-injected accounting / lifecycle signals |
| Data-only frame | no | Token deltas, finish markers, and the OpenAI Responses-API envelope |

Both shapes always include a data: line whose value is JSON (except the final data: [DONE] terminator, which is literal).
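Because every frame is a run of field lines closed by a blank line, a client can group lines into frames before decoding any JSON. The following is a minimal sketch, assuming the response body has already been split into lines with line endings removed (as requests' iter_lines does); iter_frames is a hypothetical helper, not part of any IndoxHub SDK. Following the SSE spec, it joins multiple data: lines with a newline and drops a trailing frame that never received its closing blank line.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_frames(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (event_name, data) for each complete SSE frame.

    event_name is None for data-only frames. Hypothetical helper.
    """
    event: Optional[str] = None
    data_lines: List[str] = []
    for line in lines:
        if line == "":
            # A blank line dispatches the frame, if it carried any data.
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = None, []
            continue
        if line.startswith(":"):
            continue  # SSE comment line, e.g. a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips a single leading space
        if field == "event":
            event = value
        elif field == "data":
            data_lines.append(value)
    # A frame still buffered here lacked its closing blank line; the spec discards it.


sample = [
    "event: usage_start",
    'data: {"type":"usage_start","input_tokens":15}',
    "",
    'data: {"type":"content","data":"Hello"}',
    "",
    "data: [DONE]",
    "",
]

for name, data in iter_frames(sample):
    if data == "[DONE]":  # the terminator is literal text, not JSON
        break
    print(name, json.loads(data)["type"])
```

Checking for [DONE] before calling json.loads matters because the terminator would otherwise raise a decode error.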
usage_start: once, near the start
Emitted as soon as the upstream provider acknowledges the request. Useful for a "thinking…" UI state that already shows the actual context cost.

```text
event: usage_start
data: {"type":"usage_start","request_id":"req-1","provider":"openai","model":"gpt-4o-mini","input_tokens":15}
```

| Field | Type | Notes |
|---|---|---|
| request_id | string | Mirrors the X-Request-ID response header. |
| provider | string | One of: openai, anthropic, google, bedrock, deepseek, xai, mistral, qwen, huggingface. |
| model | string | The provider-side model id. |
| input_tokens | int | Prompt tokens, as reported by the provider's first SSE event. May be 0 for providers that report tokens only at end of stream. |

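A client might decode this payload into a typed record before rendering it. The sketch below reflects an assumed client design, not an IndoxHub SDK type; UsageStart and status_label are hypothetical names. It treats a missing or zero input_tokens as "not reported yet" rather than as an empty prompt, per the note in the table.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageStart:
    """Typed view of a usage_start payload (hypothetical client-side type)."""

    request_id: str  # same value as the X-Request-ID response header
    provider: str
    model: str
    input_tokens: int  # 0 when the provider reports tokens only at end of stream

    @classmethod
    def from_payload(cls, payload: dict) -> "UsageStart":
        return cls(
            request_id=str(payload["request_id"]),
            provider=str(payload["provider"]),
            model=str(payload["model"]),
            input_tokens=int(payload.get("input_tokens", 0)),
        )

    def status_label(self) -> str:
        if self.input_tokens == 0:
            return f"Thinking... ({self.model}, prompt size pending)"
        return f"Thinking... ({self.model}, {self.input_tokens} input tokens)"


frame_data = (
    '{"type":"usage_start","request_id":"req-1","provider":"openai",'
    '"model":"gpt-4o-mini","input_tokens":15}'
)
start = UsageStart.from_payload(json.loads(frame_data))
print(start.status_label())  # Thinking... (gpt-4o-mini, 15 input tokens)
```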
Content frames: many
Standard token deltas. No event: line; just data:.

```text
data: {"type":"content","data":"Hello","provider":"openai","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}
```

The choices shape mirrors OpenAI's chat.completion.chunk for SDK compatibility. The flatter type / data / provider fields are added by IndoxHub for provider-agnostic clients.
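Since each content frame carries the same text twice, a client can read whichever field suits it. A minimal sketch, assuming a single choice at index 0 as in the sample frame; delta_text is a hypothetical helper that prefers the flat data field and falls back to the OpenAI-style choices path. Only the first frame below is taken from this page; the other two are made up to exercise both paths.

```python
import json


def delta_text(payload: dict) -> str:
    """Return the text of one content frame (hypothetical helper)."""
    # Provider-agnostic path: the flat field added by IndoxHub.
    flat = payload.get("data")
    if isinstance(flat, str):
        return flat
    # OpenAI-compatible path: choices[0].delta.content, which may be absent or null.
    choices = payload.get("choices") or []
    if choices:
        return (choices[0].get("delta") or {}).get("content") or ""
    return ""


frame_data = (
    '{"type":"content","data":"Hello","provider":"openai",'
    '"choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}'
)
frames = [
    json.loads(frame_data),
    {"type": "content", "data": ", world", "provider": "openai"},  # made-up frame
    {"choices": [{"delta": {"content": "!"}, "index": 0, "finish_reason": None}]},  # made-up frame
]
print("".join(delta_text(p) for p in frames))  # Hello, world!
```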
finish: once, before the terminal frames
Stream-level finish marker. Carries the upstream provider's finish_reason, translated into a normalized vocabulary.

```text
data: {"type":"finish","provider":"openai","finish_reason":"stop","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}
```

Possible finish_reason values: stop, length, content_filter, tool_calls, function_call, error.
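Because the vocabulary is closed, a client can validate it up front. A sketch under two assumptions: that IndoxHub never sends values outside the list above, and that length keeps its OpenAI meaning (generation stopped at the token limit). FinishReason, parse_finish and is_truncated are hypothetical names.

```python
import json
from enum import Enum


class FinishReason(str, Enum):
    """The normalized finish_reason values listed on this page."""

    STOP = "stop"
    LENGTH = "length"
    CONTENT_FILTER = "content_filter"
    TOOL_CALLS = "tool_calls"
    FUNCTION_CALL = "function_call"
    ERROR = "error"


def parse_finish(payload: dict) -> FinishReason:
    # Raises ValueError for a value outside the documented vocabulary.
    return FinishReason(payload["finish_reason"])


def is_truncated(reason: FinishReason) -> bool:
    # Assumption: "length" means output was cut off at the token limit, as in OpenAI's API.
    return reason is FinishReason.LENGTH


frame_data = (
    '{"type":"finish","provider":"openai","finish_reason":"stop",'
    '"choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}'
)
reason = parse_finish(json.loads(frame_data))
print(reason.value, is_truncated(reason))  # stop False
```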
usage_final: once, before [DONE]
The accounting event. Carries totals so clients never need a second API call to bill or log.

```text
event: usage_final
data: {"type":"usage_final","request_id":"req-1","provider":"openai","model":"gpt-4o-mini","input_tokens":15,"output_tokens":1,"cost_usd":2.85e-06,"latency_ms":4693}
```

| Field | Type | Notes |
|---|---|---|
| input_tokens | int | Final prompt-token count, after the provider has confirmed it. |
| output_tokens | int | Tokens generated through end of stream (the [DONE] terminator is not counted). |
| cost_usd | float | Computed against the IndoxHub pricing registry. 0.0 if pricing for the model is missing; missing pricing never blocks the frame. |
| latency_ms | int | Wall-clock time, in milliseconds, from request acceptance to the terminal frame. |

This event fires even if the upstream stream ends without a usage chunk; in that case the injector falls back to the token counts it tracks locally.
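The sample frame's cost_usd can be reproduced with per-token arithmetic. The sketch below assumes prices of $0.15 per million input tokens and $0.60 per million output tokens for gpt-4o-mini, chosen because they match the sample exactly: (15 × 0.15 + 1 × 0.60) / 1,000,000 = 2.85e-06. The real IndoxHub pricing registry may differ, and PRICES_PER_MILLION and estimate_cost_usd are hypothetical. Like the documented behaviour, it returns 0.0 when no price is known.

```python
# Hypothetical price table: USD per million tokens as (input, output).
# These values are assumptions chosen because they reproduce the sample frame.
PRICES_PER_MILLION = {
    ("openai", "gpt-4o-mini"): (0.15, 0.60),
}


def estimate_cost_usd(provider: str, model: str, input_tokens: int, output_tokens: int) -> float:
    prices = PRICES_PER_MILLION.get((provider, model))
    if prices is None:
        return 0.0  # unknown pricing yields 0.0 instead of failing
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


cost = estimate_cost_usd("openai", "gpt-4o-mini", input_tokens=15, output_tokens=1)
print(f"{cost:.2e}")  # 2.85e-06
```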
response.done: once, after usage_final
OpenAI Responses-API envelope for clients that mirror that shape. For redundancy, it repeats the usage_final totals under OpenAI-style names: prompt_tokens (input_tokens), completion_tokens (output_tokens), and total_tokens (their sum, 16 in the example).

```text
data: {"type":"response.done","response":{"id":"req-1","object":"response","status":"completed","usage":{"prompt_tokens":15,"completion_tokens":1,"total_tokens":16}}}
```
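A client that consumes both events can check that they agree. A minimal sketch, assuming the field mapping implied by the sample frames (prompt_tokens maps to input_tokens, completion_tokens to output_tokens, and total_tokens is their sum); usage_consistent is a hypothetical helper.

```python
import json


def usage_consistent(final: dict, done: dict) -> bool:
    """Return True if response.done repeats the usage_final token totals."""
    usage = done["response"]["usage"]
    return (
        usage["prompt_tokens"] == final["input_tokens"]
        and usage["completion_tokens"] == final["output_tokens"]
        and usage["total_tokens"] == final["input_tokens"] + final["output_tokens"]
    )


final = json.loads(
    '{"type":"usage_final","request_id":"req-1","provider":"openai","model":"gpt-4o-mini",'
    '"input_tokens":15,"output_tokens":1,"cost_usd":2.85e-06,"latency_ms":4693}'
)
done = json.loads(
    '{"type":"response.done","response":{"id":"req-1","object":"response",'
    '"status":"completed","usage":{"prompt_tokens":15,"completion_tokens":1,"total_tokens":16}}}'
)
print(usage_consistent(final, done))  # True
```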
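[DONE]: terminator
The literal frame data: [DONE] marks the end of the stream. It is not defined by the W3C SSE spec, which has no end-of-stream event (a stream simply ends when the connection closes); it is the convention OpenAI-compatible clients expect, and it lets a client stop reading without waiting for the long-lived HTTP connection to close.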
error: fired in place of usage_final on upstream failure
When this fires, no usage_final arrives. Clients should treat the stream as finished after the next [DONE].
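Taken together, the ordering rules on this page let a client decide whether a stream finished cleanly. A minimal sketch, assuming frames have already been decoded into (event_name, payload) pairs with the terminator represented by the string "[DONE]". The exact shape of the error frame is not shown on this page, so the sketch accepts either a named error event or a data-only frame whose type is error (the form the Complete parser on this page checks). stream_outcome is a hypothetical helper.

```python
from typing import Iterable, Optional, Tuple, Union

DONE = "[DONE]"
Frame = Tuple[Optional[str], Union[dict, str]]


def stream_outcome(frames: Iterable[Frame]) -> Tuple[str, Optional[dict]]:
    """Return ("completed", usage_final), ("error", None), or ("incomplete", None)."""
    usage: Optional[dict] = None
    errored = False
    for event, payload in frames:
        if payload == DONE:
            if errored:
                return "error", None  # usage_final never follows an error
            if usage is not None:
                return "completed", usage
            return "incomplete", None  # [DONE] without accounting is unexpected
        kind = event or payload.get("type")
        if kind == "usage_final":
            usage = payload
        elif kind == "error":
            errored = True
    return "incomplete", None  # connection closed before [DONE]


happy = [
    ("usage_start", {"type": "usage_start", "input_tokens": 15}),
    (None, {"type": "content", "data": "Hi"}),
    (None, {"type": "finish", "finish_reason": "stop"}),
    ("usage_final", {"type": "usage_final", "output_tokens": 1}),
    (None, {"type": "response.done"}),
    (None, DONE),
]
print(stream_outcome(happy)[0])  # completed
```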
Reserved (not emitted today)
The injector is wired to support these events, but no production codepath emits them yet. They are documented here so that clients can recognize and safely ignore them:

| Event | Future use |
|---|---|
| rate_limit_warning | Mid-stream warning when the user is within 10% of their per-minute limit. |
| cache_hit | Sent when a response was served from the prompt cache, meaning the first token came from a cached upstream prefix (this is not a stream cache). |
| provider_fallback | Sent when the gateway transparently retried on a different provider. |

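One way to stay forward-compatible is to route frames through a handler table and skip anything without a handler. The sketch below is an assumed client design; make_dispatcher and RESERVED_EVENTS are hypothetical names, and because these events are not emitted yet, the cache_hit payload used here is made up.

```python
from typing import Callable, Dict, Optional

RESERVED_EVENTS = {"rate_limit_warning", "cache_hit", "provider_fallback"}
Handler = Callable[[dict], None]


def make_dispatcher(handlers: Dict[str, Handler]) -> Callable[[Optional[str], dict], str]:
    def dispatch(event: Optional[str], payload: dict) -> str:
        """Route one decoded frame and report what happened to it."""
        kind = event or payload.get("type")
        handler = handlers.get(kind)
        if handler is not None:
            handler(payload)
            return "handled"
        # Reserved and unrecognized events are skipped rather than treated as errors.
        return "reserved" if kind in RESERVED_EVENTS else "ignored"

    return dispatch


chunks = []
dispatch = make_dispatcher({"content": lambda p: chunks.append(p["data"])})
print(dispatch(None, {"type": "content", "data": "Hi"}))  # handled
print(dispatch("cache_hit", {"type": "cache_hit"}))  # reserved
print(dispatch("some_new_event", {}))  # ignored
```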
Complete parser
This Python example handles every event type above except the redundant response.done envelope, prints content as it arrives, and logs the usage_final totals at the end:

```python
import json

import requests

response = requests.post(
    "https://api.indoxhub.com/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Write one short sentence."}],
        "stream": True,
    },
    stream=True,
)
response.raise_for_status()  # fail fast on 4xx/5xx instead of parsing an error body as SSE
response.encoding = "utf-8"  # SSE is UTF-8; requests may otherwise assume ISO-8859-1 for text/*

current_event, content = None, ""
for raw in response.iter_lines(decode_unicode=True):
    if not raw:
        current_event = None  # blank line ends a frame
        continue
    if raw.startswith("event: "):
        current_event = raw[7:]
        continue
    if not raw.startswith("data: "):
        continue
    if raw == "data: [DONE]":
        break  # literal terminator, not JSON
    payload = json.loads(raw[6:])
    if current_event == "usage_start":
        print(f"[start] in_tokens={payload['input_tokens']}")
    elif current_event == "usage_final":
        print(f"[final] cost=${payload['cost_usd']:.6f} "
              f"in={payload['input_tokens']} out={payload['output_tokens']} "
              f"latency={payload['latency_ms']}ms")
    elif payload.get("type") == "content":
        chunk = payload.get("data", "")
        content += chunk
        print(chunk, end="", flush=True)
    elif payload.get("type") == "finish":
        print(f"\n[finish] reason={payload['finish_reason']}")
    elif payload.get("type") == "error":
        print(f"\n[error] {payload.get('data')}")

print(f"\nFinal text: {content!r}")
```
Sample output: