SSE async pipeline live in production — 9 providers, in-band cost events
Phase 7B canary now serves real production traffic for all 9 supported providers (OpenAI, Anthropic, Google, AWS Bedrock, DeepSeek, xAI, Mistral, Qwen, HuggingFace). Every stream opens with a usage_start event (input token count) and closes with a usage_final event (input and output tokens, cost_usd, latency_ms). Billing data therefore travels in-band on the wire, and no second API call is needed. Bedrock joins as the 9th provider, nginx is tuned for SSE, a Locust load scenario has shipped, and the Python client `indoxhub` v0.2.2 is published to PyPI.
What's new since 2026-04-27
| Area | Change |
|---|---|
| Providers | Bedrock added — 9 / 9 covered. Live-validated on us.amazon.nova-micro-v1:0. |
| Route | Phase 7B canary wires all 9 providers through dispatch_stream(provider_id, …) behind SSE_ASYNC_STREAMING_ENABLED. Default-off in prod; flip the GitHub Variable + re-run PROD_setup → PROD_build_docker to activate. |
| nginx | Dedicated exact-match location = blocks for SSE routes, with proxy_buffering off, 600 s timeouts, and HTTP/1.1 keep-alive. |
| Load test | tests/load/locustfile_sse.py: a 50-stream scenario that records time to first chunk (TTFC), total stream duration, and max inter-chunk gap. |
| Docs | Streaming page rewritten and a new SSE Events reference page added. All 17 usage pages now carry icons. |
| Python client | pip install indoxhub==0.2.2, a gold-standard release with the R2 mirror surface and the full Resemble AI namespace. |
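The three load-test metrics can all be derived from per-chunk arrival timestamps. The locustfile itself is not reproduced here. What follows is a minimal sketch that assumes you record a monotonic timestamp when the request is sent and one for each chunk received. The names summarize_stream and StreamTimings are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamTimings:
    ttfc_s: float     # request sent -> first chunk received
    total_s: float    # request sent -> last chunk received
    max_gap_s: float  # largest gap between two consecutive chunks


def summarize_stream(request_sent_at: float, chunk_times: Sequence[float]) -> StreamTimings:
    """Compute TTFC, total duration and max inter-chunk gap (all in seconds).

    Assumes timestamps come from a monotonic clock (e.g. time.perf_counter())
    and that chunk_times is in arrival order.
    """
    if not chunk_times:
        raise ValueError("stream produced no chunks")
    ttfc = chunk_times[0] - request_sent_at
    total = chunk_times[-1] - request_sent_at
    # Pairwise differences between consecutive chunk arrivals.
    gaps = [b - a for a, b in zip(chunk_times, chunk_times[1:])]
    max_gap = max(gaps, default=0.0)  # a single-chunk stream has no gap
    return StreamTimings(ttfc, total, max_gap)
```

Max inter-chunk gap is reported separately from total duration because a stream can finish quickly overall yet still stall visibly mid-response. Proxy buffering tends to surface exactly this kind of stall.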
Live verification (today)
A single test stream against the canary on production:
```text
event: usage_start
data: {"type":"usage_start","request_id":"…","provider":"openai","model":"gpt-4o-mini","input_tokens":15}

data: {"type":"content","data":"pong","provider":"openai","choices":[…]}

event: usage_final
data: {"type":"usage_final","input_tokens":15,"output_tokens":1,"cost_usd":2.85e-06,"latency_ms":4693}

data: [DONE]
```
Total live validation spend across all 9 provider wrappers: under $0.001.
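The cost_usd value in the usage_final event of the sample can be reproduced by hand as a sanity check. This sketch assumes per-million-token rates of $0.15 input and $0.60 output, which match OpenAI's public gpt-4o-mini list price. They are not read from the gateway's own pricing table, and estimate_cost_usd is a hypothetical helper.

```python
# Assumed rates (USD per 1M tokens): OpenAI's public gpt-4o-mini list price,
# not necessarily the gateway's internal pricing table.
INPUT_USD_PER_M = 0.15
OUTPUT_USD_PER_M = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      in_rate: float = INPUT_USD_PER_M,
                      out_rate: float = OUTPUT_USD_PER_M) -> float:
    """Linear per-token cost: tokens x (rate per million) / 1,000,000."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


cost = estimate_cost_usd(15, 1)  # token counts from the sample stream
print(f"{cost:.2e}")  # 2.85e-06
```

That is 15 × $0.15/1M + 1 × $0.60/1M = $2.25e-06 + $6.0e-07 = $2.85e-06, matching the in-band value.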
Wire-format reference
The full per-event schema and a complete Python parser are documented on the new SSE Events page. The Streaming page now also carries the updated wire-format walkthrough.
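The parser on the SSE Events page is the documented one. What follows is a separate, minimal sketch of the same idea, not that parser. It assumes events are separated by blank lines as the SSE format requires. It also assumes content events carry their text in a data field, as in the sample above. The names iter_sse_events and collect_usage are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_name, data) pairs from decoded SSE lines.

    Fields accumulate until a blank line dispatches the event; multiple
    data: lines are joined with newlines; a missing event: field means
    "message". An unterminated event at end of input is discarded, as in
    the SSE specification.
    """
    event_name: Optional[str] = None
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield (event_name or "message", "\n".join(data_lines))
            event_name, data_lines = None, []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional leading space
        if field == "event":
            event_name = value
        elif field == "data":
            data_lines.append(value)


def collect_usage(lines: Iterable[str]) -> tuple[str, dict]:
    """Concatenate content text and capture usage_start / usage_final payloads."""
    text_parts: list[str] = []
    usage: dict = {}
    for _name, data in iter_sse_events(lines):
        if data == "[DONE]":
            break  # end-of-stream sentinel
        payload = json.loads(data)
        kind = payload.get("type")
        if kind in ("usage_start", "usage_final"):
            usage[kind] = payload
        elif kind == "content":
            text_parts.append(payload.get("data", ""))
    return "".join(text_parts), usage
```

Keying on the JSON type field rather than the SSE event name keeps the sketch working for the content event, which arrives without an event: line.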
Operational rollback
The canary is gated by a single repository-level GitHub Variable. To enable it, set SSE_ASYNC_STREAMING_ENABLED=true, then re-run PROD_setup → PROD_build_docker. Rollback is the same toggle in reverse, with no code change and no source redeploy.
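How the running service reads the flag is not spelled out here. A minimal sketch follows, assuming the build bakes the GitHub Variable into the container as an environment variable of the same name. The functions sse_async_enabled and choose_stream_path, and the "legacy_stream" label for the non-canary path, are all hypothetical.

```python
import os
from typing import Mapping, Optional

# Values treated as "on"; anything else (including unset) means off.
_TRUTHY = {"1", "true", "yes", "on"}


def sse_async_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when SSE_ASYNC_STREAMING_ENABLED is explicitly truthy.

    Assumption: the GitHub Variable reaches the container as an env var.
    """
    source = os.environ if env is None else env
    value = source.get("SSE_ASYNC_STREAMING_ENABLED", "")
    return value.strip().lower() in _TRUTHY


def choose_stream_path(env: Optional[Mapping[str, str]] = None) -> str:
    # Hypothetical labels: the canary path vs. the pre-existing path.
    return "dispatch_stream" if sse_async_enabled(env) else "legacy_stream"
```

Treating anything other than an explicit truthy value as off preserves the default-off behaviour, so a missing or mistyped variable fails safe to the existing path.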