Document Processing

Use IndoxHub for document summarization, data extraction, translation, and classification.

Document Summarization

import requests

API_KEY = "YOUR_API_KEY"

def summarize(text, max_length="2 paragraphs"):
    response = requests.post(
        "https://api.indoxhub.com/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "anthropic/claude-haiku-4.5",
            "messages": [
                {
                    "role": "system",
                    "content": f"Summarize the text in {max_length}. "
                               "Be concise and preserve key points."
                },
                {"role": "user", "content": text}
            ],
            "temperature": 0.3
        }
    )
    response.raise_for_status()  # fail loudly on HTTP errors (e.g. an invalid API key)
    return response.json()["data"]
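summarize sends the whole document in a single request, so a very long text may exceed the chosen model's context window. A common workaround (a general technique, not an IndoxHub feature) is to summarize the text in chunks and then summarize the partial summaries. The sketch below assumes that splitting on blank lines is acceptable for your documents; chunk_text, summarize_long, and the 8000-character default are hypothetical choices for illustration, not documented IndoxHub limits.

```python
def chunk_text(text, max_chars=8000):
    """Split text into chunks of at most max_chars characters.

    Breaks on blank lines (paragraph boundaries) where possible and
    hard-splits any single paragraph longer than max_chars. The 8000
    default is an illustrative assumption, not an IndoxHub limit.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Hard-split an oversized paragraph into max_chars slices.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        if not para.strip():
            continue  # skip empty paragraphs
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the current chunk
        else:
            chunks.append(current)  # current chunk is full; start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks


def summarize_long(text, summarize_fn, max_chars=8000):
    """Map-reduce summary: summarize each chunk, then summarize the results.

    summarize_fn is expected to behave like summarize() above, i.e. accept
    the text plus an optional max_length keyword.
    """
    chunks = chunk_text(text, max_chars)
    if len(chunks) <= 1:
        return summarize_fn(text)  # short enough for a single request
    partials = [summarize_fn(chunk, max_length="1 paragraph") for chunk in chunks]
    return summarize_fn("\n\n".join(partials))
```

With the summarize function above, a call would look like summary = summarize_long(long_text, summarize). Passing the summarizer in as a parameter keeps the chunking logic independent of the HTTP call, so it can be tested offline.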

Data Extraction

Extract structured data from unstructured text:

import json

def extract_entities(text):
    response = requests.post(
        "https://api.indoxhub.com/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "openai/gpt-4o-mini",
            "messages": [
                {
                    "role": "system",
                    "content": "Extract entities as a JSON object with the keys "
                               "names, dates, amounts, and locations (each a list). "
                               "Respond with only the JSON object and no other text."
                },
                {"role": "user", "content": text}
            ],
            "temperature": 0.1
        }
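    )
    response.raise_for_status()  # fail loudly on HTTP errors
    # json.loads assumes the reply is bare JSON with no surrounding prose
    return json.loads(response.json()["data"])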
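json.loads raises json.JSONDecodeError if the reply is not bare JSON, and chat models sometimes wrap JSON in a markdown code fence even when asked not to. One way to make extraction more tolerant is to strip such a fence before parsing. parse_json_reply below is a hypothetical helper, not part of the IndoxHub API, and it only handles the fenced case; a reply with extra prose around the JSON will still fail to parse.

```python
import json
import re

# Matches a reply wrapped in a fence of three backticks, optionally
# tagged "json", and captures the content between the fences.
_FENCED = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL | re.IGNORECASE)


def parse_json_reply(reply):
    """Parse a model reply as JSON, tolerating a surrounding code fence."""
    text = reply.strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1)  # keep only the content inside the fence
    return json.loads(text)  # still raises JSONDecodeError on invalid JSON
```

In extract_entities, you could return parse_json_reply(response.json()["data"]) in place of the direct json.loads call.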

Document Classification

def classify(text, categories):
    cats = ", ".join(categories)
    response = requests.post(
        "https://api.indoxhub.com/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "openai/gpt-4o-mini",
            "messages": [
                {
                    "role": "system",
                    "content": f"Classify the text into one of: {cats}. "
                               "Respond with only the category name."
                },
                {"role": "user", "content": text}
            ],
            "temperature": 0.0
        }
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()["data"].strip()

result = classify(
    "The Q3 earnings exceeded expectations with 15% YoY growth",
    ["finance", "technology", "healthcare", "sports"]
)
# Returns: "finance"
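A low temperature makes the output more consistent but does not guarantee that the reply is exactly one of your category names: it may differ in capitalization, carry trailing punctuation, or name a category you did not list. Checking the reply against the allowed list catches this. normalize_label is a hypothetical helper, and returning a default for unrecognized replies is one design choice among several (you could also retry or raise).

```python
def normalize_label(reply, categories, default=None):
    """Map a raw model reply onto one of `categories`.

    Comparison ignores surrounding whitespace, quotes, trailing periods,
    and letter case. Returns `default` when the reply matches no category.
    """
    cleaned = reply.strip().strip(".\"'").strip().lower()
    for category in categories:
        if cleaned == category.lower():
            return category  # return the caller's spelling of the category
    return default
```

For example: label = normalize_label(classify(text, categories), categories, default="other").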

Audio Transcription Pipeline

Transcribe audio files and then process the text:

def transcribe_and_summarize(audio_path):
    # Step 1: Transcribe
    with open(audio_path, "rb") as f:
        resp = requests.post(
            "https://api.indoxhub.com/api/v1/audio/stt/transcriptions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"model": "openai/whisper-1"}
        )
    resp.raise_for_status()  # fail loudly on HTTP errors (e.g. unsupported file)
    transcript = resp.json()["data"]["text"]

    # Step 2: Summarize
    summary = summarize(transcript, max_length="3 bullet points")
    return {"transcript": transcript, "summary": summary}

Tips

  • Use a low temperature (0.0–0.3) for extraction and classification tasks to keep outputs consistent
  • JSON mode — Ask the model to output JSON for structured extraction
  • Batch processing — Process multiple documents sequentially with error handling
  • Choose cost-effective models such as openai/gpt-4o-mini or deepseek/deepseek-chat for high-volume processing
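The batch-processing tip can be sketched as a loop that retries each document a few times and records failures instead of aborting the whole batch. This is a minimal sequential sketch: process_batch, the retry count, and the exponential backoff delays are illustrative assumptions, not IndoxHub recommendations, and it catches every exception for simplicity.

```python
import time


def process_batch(documents, process_fn, retries=2, delay=1.0):
    """Run process_fn over {doc_id: text} sequentially.

    Each document is attempted up to retries + 1 times, sleeping
    delay, 2*delay, 4*delay, ... seconds between attempts. Returns
    (results, errors): two dicts keyed by doc_id.
    """
    results, errors = {}, {}
    for doc_id, text in documents.items():
        for attempt in range(retries + 1):
            try:
                results[doc_id] = process_fn(text)
                break  # success: move on to the next document
            except Exception as exc:  # broad on purpose: network, HTTP, or parsing errors
                if attempt == retries:
                    errors[doc_id] = repr(exc)  # out of retries: record and continue
                else:
                    time.sleep(delay * 2 ** attempt)  # exponential backoff
    return results, errors
```

For example, results, errors = process_batch({"q3-report": report_text}, summarize) summarizes each document and leaves any failures in errors for later inspection. Retrying helps with transient failures such as timeouts; an error that recurs every time (for example, a reply that is never valid JSON) only costs extra requests.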
Documentation last built on May 23, 2026