Vision & Multimodal
Send images alongside text in chat completions for visual understanding.
Endpoint: POST /api/v1/chat/completions
Auth: Required
Multimodal Messages
Pass images as part of the content array in a message:
# Python (requests)
import requests

response = requests.post(
    "https://api.indoxhub.com/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "openai/gpt-4o",
        "messages": [
            {
                "role": "user",
                # One text part plus one image part in the same message
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://example.com/photo.jpg"
                        }
                    }
                ]
            }
        ],
        "max_tokens": 300
    }
)
print(response.json()["data"])
// JavaScript (fetch)
const response = await fetch("https://api.indoxhub.com/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "openai/gpt-4o",
    messages: [{
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } }
      ]
    }],
    max_tokens: 300
  })
});
const data = await response.json();
console.log(data.data);
# cURL
curl https://api.indoxhub.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }],
    "max_tokens": 300
  }'
# Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.indoxhub.com/v1"
)
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
    }],
    max_tokens=300
)
print(response.choices[0].message.content)
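When the same message shape is built in several places, or a request carries more than one image, a small helper can assemble the content array. The following is a minimal sketch: `build_vision_message` is a hypothetical name, not part of the API or any SDK, and it only builds the message dictionary shown in the examples above without sending a request. Whether a particular model accepts several images in one message is an assumption to check against that model's documentation.

```python
from typing import Any


def build_vision_message(text: str, image_urls: list[str]) -> dict[str, Any]:
    """Build a user message with one text part followed by image parts.

    Hypothetical helper: it mirrors the content-array shape used in the
    examples above and does not call the API.
    """
    content: list[dict[str, Any]] = [{"type": "text", "text": text}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


message = build_vision_message(
    "What's in this image?",
    ["https://example.com/photo.jpg"],
)
```

The resulting `message` can be placed directly in the `messages` list of any of the requests above.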
Base64 Images
To send a local image, base64-encode it and pass it as a data URL in the image_url field:
import base64

import requests

# Read the local file and encode its bytes as base64 text
with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://api.indoxhub.com/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image"},
                {
                    "type": "image_url",
                    "image_url": {
                        # Data URL: MIME type, then the base64 payload
                        "url": f"data:image/jpeg;base64,{b64}"
                    }
                }
            ]
        }]
    }
)
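The example above hardcodes `image/jpeg` in the data URL. For files of other types, the MIME type can be derived from the file extension instead. The sketch below uses only the Python standard library; `image_to_data_url` is a hypothetical helper name, and which image formats a given model accepts is not stated here, so treat that as something to verify.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for the image_url field.

    Hypothetical helper: the MIME type is guessed from the file extension,
    not from the file contents.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the `url` value in place of the f-string in the request above, for example `{"url": image_to_data_url("photo.jpg")}`.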
Supported Models
Vision is available on models whose input_modalities include image. Check the Models endpoint for the current list.
Common vision models:
openai/gpt-4o: Best quality vision
openai/gpt-4o-mini: Fast and affordable vision
anthropic/claude-opus-4-7: Strong visual reasoning (newest flagship)
anthropic/claude-opus-4-6: Strong visual reasoning
google/gemini-2.0-flash: Fast multimodal
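The input_modalities check described above can be sketched as a simple filter. This is a sketch under assumptions: the exact response shape of the Models endpoint is not shown here, so the code assumes each model is a dictionary with an `id` string and an `input_modalities` list. `vision_models` is a hypothetical helper, and the sample data is illustrative rather than a real API response.

```python
from typing import Any


def vision_models(models: list[dict[str, Any]]) -> list[str]:
    """Return the IDs of models that list "image" in input_modalities.

    Assumes each entry has an "id" key; entries without
    "input_modalities" are treated as text-only.
    """
    return [
        m["id"]
        for m in models
        if "image" in m.get("input_modalities", [])
    ]


# Illustrative sample data, not a real Models endpoint response.
sample = [
    {"id": "openai/gpt-4o", "input_modalities": ["text", "image"]},
    {"id": "example/text-only", "input_modalities": ["text"]},
]
print(vision_models(sample))
```

Filtering at runtime avoids relying on a hardcoded model list, which can go out of date as models are added or retired.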