
Using OpenAI Batch Mode: Use Cases, Implementation, and Pricing

Overview

OpenAI Batch mode is designed for workloads where volume and cost efficiency matter more than low latency. It enables asynchronous execution of a large number of independent requests, processed within a defined completion window. This makes it suitable for offline, backfill, or scheduled workloads that would otherwise be expensive or operationally complex using synchronous APIs.

This document focuses on practical use cases, explains why batch support is required, and provides implementation and pricing examples.

Scope and Applicability

Aspect | Description
Scope | Large-scale, asynchronous AI workloads
Applicability | General-purpose, multi-platform
Latency Sensitivity | Low
Request Dependency | None (independent requests only)
Cost Optimization | High priority

Use Case Breakdown

Use Case 1: Offline Document Classification

Problem Introduction:

Organizations often need to classify or tag millions of documents (PDFs, articles, logs) for search, analytics, or compliance. Running this synchronously creates high API cost and operational bottlenecks.

Why Batch Mode Is Needed:

  • Documents are independent

  • No real-time requirement

  • High request volume

  • Cost sensitivity is critical

Batch mode reduces per-token cost and avoids managing concurrency manually.
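As a sketch of this pattern, the snippet below builds a JSONL batch file for document classification. The label set, prompt wording, and `doc-*` id scheme are illustrative choices, not part of the Batch API; only the `custom_id`/`method`/`url`/`body` line format follows the batch input file structure shown later in this document.

```python
import json

def build_classification_requests(documents, model="gpt-4.1-mini"):
    """Turn (doc_id, text) pairs into batch request lines (one JSON object per line)."""
    lines = []
    for doc_id, text in documents:
        lines.append({
            "custom_id": f"doc-{doc_id}",  # used to match each result to its document
            "method": "POST",
            "url": "/v1/responses",
            "body": {
                "model": model,
                "input": f"Classify this document into one of: legal, finance, other.\n\n{text}",
            },
        })
    return lines

docs = [(1, "Quarterly revenue grew 12%."), (2, "The parties agree to arbitration.")]
with open("classify.jsonl", "w") as f:
    for line in build_classification_requests(docs):
        f.write(json.dumps(line) + "\n")
```

Because each line is independent, millions of documents can be serialized this way and submitted as one or more batch jobs.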

Use Case 2: Data Labeling for Machine Learning Pipelines

Problem Introduction:

ML teams frequently generate labeled datasets using LLMs (sentiment, intent, topic). This is often done periodically or as a backfill.

Why Batch Mode Is Needed:

  • Large datasets (100k–millions of samples)

  • Scheduled or offline execution

  • Deterministic inputs

Batch processing integrates cleanly with data pipelines (Airflow, Dagster).
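One property that matters in pipelines is idempotency: a rerun of the same labeling task should produce the same request ids so results can be joined and deduplicated. A minimal sketch, assuming the sentiment-labeling task above (the hashing scheme and prompt are illustrative):

```python
import hashlib

def labeling_request(text, model="gpt-4.1-mini"):
    # Deterministic custom_id: rerunning the pipeline on the same input
    # yields the same id, making reruns and downstream joins idempotent.
    digest = hashlib.sha256(text.encode()).hexdigest()[:16]
    return {
        "custom_id": f"label-{digest}",
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": model,
            "input": f"Label the sentiment (positive/negative/neutral): {text}",
        },
    }

req = labeling_request("Great battery life.")
```

An orchestrator task (e.g. in Airflow or Dagster) can then emit these lines, submit the batch, and poll it in a later task without any request-level state.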

Use Case 3: Content Backfill and Migration

Problem Introduction:

When migrating platforms, teams may need to rewrite, summarize, or translate existing content at scale.

Why Batch Mode Is Needed:

  • One-time or infrequent execution

  • Large historical datasets

  • No user-facing latency constraints

Batch mode avoids throttling and lowers migration costs.

Use Case 4: Evaluation and Benchmarking

Problem Introduction:

Engineering teams often evaluate prompts, models, or datasets by running thousands of test cases and scoring outputs.

Why Batch Mode Is Needed:

  • High-volume test matrices

  • Repeatable execution

  • Cost-efficient experimentation

Batch mode enables systematic evaluation without real-time overhead.
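For evaluation runs, it helps to encode the test-matrix coordinates directly in each `custom_id` so every output can be mapped back to its (model, prompt, case) cell. A sketch, with a hypothetical matrix:

```python
import itertools

models = ["gpt-4.1-mini", "gpt-4.1"]                     # hypothetical matrix
prompts = {"p1": "Summarize: {}", "p2": "Summarize in one sentence: {}"}
cases = ["Text A", "Text B"]

requests = []
for model, (pid, template), (i, case) in itertools.product(
        models, prompts.items(), enumerate(cases)):
    requests.append({
        # Matrix coordinates embedded in the id: model|prompt|case
        "custom_id": f"{model}|{pid}|case-{i}",
        "method": "POST",
        "url": "/v1/responses",
        "body": {"model": model, "input": template.format(case)},
    })

# 2 models x 2 prompts x 2 cases = 8 independent requests
```

Since every cell is an independent request, the full matrix runs in a single batch job and results can be pivoted back into a table by splitting the `custom_id`.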

Use Case 5: Log and Event Enrichment

Problem Introduction:

Security and observability systems may enrich logs with summaries, root-cause hints, or semantic tags.

Why Batch Mode Is Needed:

  • Logs arrive in bulk (hourly/daily)
  • Processing is delayed by design
  • Cost scales with log volume

Batch mode aligns naturally with log aggregation windows.

Implementation Example

Batch Processing Algorithm

Step 1: Serialize all requests into a JSONL file

Step 2: Upload the JSONL file to OpenAI file storage

Step 3: Create a batch job referencing the file

Step 4: Poll batch job status

Step 5: Download output and error files

Step 6: Post-process results

Termination occurs when the batch job reaches a terminal status: completed, failed, expired, or cancelled.

Python Implementation (httpx)

import httpx
import json
import os
import time

API_KEY = os.environ["OPENAI_API_KEY"]  # avoid hardcoding secrets
BASE_URL = "https://api.openai.com/v1"

headers = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: Serialize requests into a JSONL batch file
requests = [
    {
        "custom_id": "req-1",
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "gpt-4.1-mini",
            "input": "Classify the sentiment of this text: I love this product."
        }
    }
]

with open("batch.jsonl", "w") as f:
    for r in requests:
        f.write(json.dumps(r) + "\n")

with httpx.Client(headers=headers, timeout=60) as client:
    # Step 2: Upload the file with purpose "batch"
    with open("batch.jsonl", "rb") as f:
        upload = client.post(
            f"{BASE_URL}/files",
            files={"file": f},
            data={"purpose": "batch"},
        )
    file_id = upload.json()["id"]

    # Step 3: Create the batch job referencing the uploaded file
    batch = client.post(
        f"{BASE_URL}/batches",
        json={
            "input_file_id": file_id,
            "endpoint": "/v1/responses",
            "completion_window": "24h",
        },
    )
    batch_id = batch.json()["id"]

    # Step 4: Poll until the job reaches a terminal status
    while True:
        status = client.get(f"{BASE_URL}/batches/{batch_id}").json()
        if status["status"] in ("completed", "failed", "expired", "cancelled"):
            break
        time.sleep(60)

    # Step 5: Download the output file, if one was produced
    if status.get("output_file_id"):
        content = client.get(f"{BASE_URL}/files/{status['output_file_id']}/content")
        with open("output.jsonl", "wb") as f:
            f.write(content.content)

Pricing Model Example

Pricing Characteristics

Aspect | Description
Billing Unit | Input and output tokens
Pricing Mode | Discounted batch pricing
Discount Level | Typically ~50% vs on-demand
Billing Scope | Successfully processed requests only

Example Cost Comparison

Batch processing is priced at a discounted rate compared to standard on-demand requests.

Mode | Input Cost | Output Cost | Total Cost
On-demand | Standard rate | Standard rate | Higher
Batch mode | ~50% discount | ~50% discount | ~50% lower

Batch pricing directly reduces overall token costs, but it is intended for workloads that can tolerate delayed execution and do not require immediate responses.
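To make the discount concrete, here is an illustrative cost calculation. The per-million-token rates below are hypothetical placeholders, not actual OpenAI prices; only the ~50% batch discount reflects the pricing model described above.

```python
# Hypothetical rates (USD per 1M tokens) -- check current pricing for real values.
INPUT_RATE = 0.40
OUTPUT_RATE = 1.60
BATCH_DISCOUNT = 0.5  # batch mode is typically ~50% of on-demand

def job_cost(input_tokens, output_tokens, batch=False):
    cost = (input_tokens / 1e6) * INPUT_RATE + (output_tokens / 1e6) * OUTPUT_RATE
    return cost * BATCH_DISCOUNT if batch else cost

# 1M documents averaging 500 input / 50 output tokens each:
on_demand = job_cost(500e6, 50e6)             # 500M input, 50M output tokens
batch = job_cost(500e6, 50e6, batch=True)     # same volume at batch pricing
```

At these example rates the on-demand run costs $280 and the batch run $140, a saving that scales linearly with token volume.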

Edge Cases and Limitations

Note

Batch jobs may complete earlier than the maximum window, but early completion is not guaranteed.

Limitation | Description
Latency | Not suitable for interactive use
Partial failures | Some requests may fail independently
Debugging | Errors are available only after completion
Size limits | File size and request count limits apply
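Because partial failures are possible, post-processing should split per-request results into successes and failures rather than assuming the whole job succeeded. A sketch, assuming output lines carry a `custom_id` plus either a `response` or an `error` object (field names based on the Batch API output file format; the sample records are fabricated for illustration):

```python
import json

def split_results(output_lines):
    """Separate per-request successes and failures from batch output JSONL lines."""
    succeeded, failed = {}, {}
    for raw in output_lines:
        record = json.loads(raw)
        cid = record["custom_id"]
        if record.get("error"):
            failed[cid] = record["error"]
        elif record.get("response", {}).get("status_code") == 200:
            succeeded[cid] = record["response"]["body"]
        else:
            failed[cid] = record.get("response")
    return succeeded, failed

sample = [
    '{"custom_id": "req-1", "response": {"status_code": 200, "body": {"ok": true}}, "error": null}',
    '{"custom_id": "req-2", "response": null, "error": {"code": "rate_limit"}}',
]
ok, bad = split_results(sample)
```

Failed `custom_id`s can then be re-serialized into a follow-up batch file, retrying only the requests that did not succeed.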
