Large Scale Inference
SFC Large Scale Inference (LSI) provides large-scale asynchronous batch processing for multi-modal AI models. The API is designed to process trillions of tokens efficiently at a low price while maintaining high accuracy.
Key Concepts
Batch Processing
- Batch: A collection of inference requests packaged in a .tar file. The .tar file must be less than 5GB in size.
- Completion Window: The maximum time allowed for batch processing. User-configurable up to the S3 pre-signed URL expiry. We recommend setting longer completion windows.
File Handling
- To protect your privacy, SFC never stores your files: all data is read from and written to your choice of S3-compatible storage using pre-signed URLs (see the sketch after this list).
- Expected File Format: see the Input File Format and Output File Format sections below.
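For example, if your storage is S3, you can generate both pre-signed URLs with boto3. This is a minimal sketch with hypothetical bucket and key names; the 7-day expiry matches the recommendation below:

```python
import boto3

s3 = boto3.client("s3")
SEVEN_DAYS = 7 * 24 * 60 * 60  # seconds; the SigV4 pre-signed URL maximum

# URL that LSI reads your batch input from (bucket and keys are placeholders).
input_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "batches/batch-input.tar.gz"},
    ExpiresIn=SEVEN_DAYS,
)

# URL that LSI writes your results to.
output_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-bucket", "Key": "batches/batch-output.tar.gz"},
    ExpiresIn=SEVEN_DAYS,
)
```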
Completion Windows
When you submit a batch, you must provide a completion_window parameter, which determines the maximum time allowed for processing. We recommend a completion window of 7 days or longer, as batches that cannot complete within the window at the given price will expire.
Getting Started
To create and manage jobs with our REST API, you'll need to install the sf CLI and create an API token.
Install and Log In to the CLI
curl -fsSL https://sfcompute.com/cli/install | bash
Source your shell profile to add the sf command to your PATH
source ~/.bashrc # For Bash
source ~/.zshrc # For Zsh
Log in to the CLI
sf login
Create a Batch Job with the API
Create an API token from the command line:
sf tokens create
You can reuse this token for future requests.
Finally, create a new batch job by passing this token as a Bearer token in the Authorization header:
curl -X POST https://api.sfcompute.com/v1/inference/batches \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"input_file_uri": "https://input-file-uri.com",
"output_file_uri": "https://output-file-uri.com",
"endpoint": "/v1/chat/completions",
"model_id": "Qwen/Qwen2.5-VL-32B-Instruct",
"completion_window": "7d",
"store": "s3"
}'
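The same request in Python, for scripted workflows (the URIs are placeholders, and the token comes from sf tokens create):

```python
import requests

token = "<token>"  # from `sf tokens create`

resp = requests.post(
    "https://api.sfcompute.com/v1/inference/batches",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "input_file_uri": "https://input-file-uri.com",
        "output_file_uri": "https://output-file-uri.com",
        "endpoint": "/v1/chat/completions",
        "model_id": "Qwen/Qwen2.5-VL-32B-Instruct",
        "completion_window": "7d",
        "store": "s3",
    },
)
resp.raise_for_status()
print(resp.json())
```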
LSI expects and produces a specific file structure for batches. Make sure your pre-signed URLs are set up to follow this format when creating your batch job.
You can find comprehensive examples for each endpoint in our API Reference.
Requirements and Recommendations
Pre-signed URL duration: If your pre-signed URLs expire, your associated batch jobs will fail. We recommend setting your URL expiry to 7 days to help prevent this.
Test format compliance: Run a small test job with a sample of your jobs.jsonl to validate that it follows the OpenAI API format before submitting larger workloads; see the sketch after this list.
File compression: Use gzip compression for .tar files to minimize upload and download times while staying under the 5GB size limit.
Rate limits: Keep requests to the LSI API under 1 request per second. Concurrent batches are limited by compute availability; excess batches will expire.
Batch size: For optimal pricing and performance, we recommend submitting ~48 million tokens per batch.
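Before submitting, a lightweight pre-flight check can catch most format problems locally. This is a sketch, not an official validator; it checks the required request fields and the same-model rule described under Input File Format:

```python
import json

def check_jobs(path="jobs.jsonl"):
    """Sanity-check a jobs.jsonl file before submission."""
    models = set()
    with open(path) as f:
        for n, line in enumerate(f, 1):
            if not line.strip():
                continue
            job = json.loads(line)  # raises if the line is not valid JSON
            for field in ("custom_id", "method", "url", "body"):
                assert field in job, f"line {n}: missing {field!r}"
            assert job["method"] == "POST", f"line {n}: method must be POST"
            models.add(job["body"]["model"])
    assert len(models) == 1, f"batch mixes models: {models}"

check_jobs()
```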
Input File Format
Input files must be compressed tar files (.tar.gz or .tar) with the following directory structure:
batch-input/
├── jobs.jsonl # Required: JSONL file with requests
└── files/ # Optional: Media files directory
├── image1.png
├── image2.jpg
└── subfolder/
└── image3.png
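One way to package a batch is with Python's tarfile module. A minimal sketch, assuming jobs.jsonl and a files/ directory already exist in the current directory; both are added at the archive root so that file:files/... references resolve:

```python
import tarfile

# Write a gzip-compressed archive with the required layout:
# jobs.jsonl at the root and media under files/.
with tarfile.open("batch-input.tar.gz", "w:gz") as tar:
    tar.add("jobs.jsonl")
    tar.add("files")
```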
jobs.jsonl Format
Each line contains a complete OpenAI-compatible request:
{
"custom_id": "request-001",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": "Qwen/Qwen2.5-VL-32B-Instruct",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image"
},
{
"type": "image_url",
"image_url": {
"url": "file:files/image1.png"
}
}
]
}
],
"max_tokens": 1000
}
}
Key Requirements:
- File references: use paths relative to the .tar root, with no leading slash after file: (e.g., file:files/image1.png)
- All requests in a batch must use the same model
- Maximum 5GB tar file size
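For instance, here is a jobs.jsonl generator that emits one request per image under files/, using the relative file: paths described above (the glob pattern and prompt are placeholders):

```python
import json
from pathlib import Path

# One chat-completion request per image, with custom_id used later to
# match each response back to its input.
with open("jobs.jsonl", "w") as out:
    for i, image in enumerate(sorted(Path("files").rglob("*.png")), 1):
        job = {
            "custom_id": f"request-{i:03d}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "Qwen/Qwen2.5-VL-32B-Instruct",
                "messages": [{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Describe this image"},
                        # Relative to the .tar root, no leading slash:
                        {"type": "image_url",
                         "image_url": {"url": f"file:{image.as_posix()}"}},
                    ],
                }],
                "max_tokens": 1000,
            },
        }
        out.write(json.dumps(job) + "\n")
```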
Output File Format
Output files are compressed tar files with:
batch-output/
├── output.jsonl # Successful responses
└── error.jsonl # Failed requests (if any)
output.jsonl Format
{
"id": "batch_req_696ec8427763459fa409788746bda3e3",
"custom_id": "request-001",
"response": {
"status_code": 200,
"request_id": "request-001",
"body": {
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "This image shows...",
"role": "assistant"
}
}
],
"created": 1751329764,
"id": "a0eeea75457242c1b7ab5e07138e470c",
"model": "Qwen/Qwen2.5-VL-32B-Instruct",
"object": "chat.completion",
"usage": {
"completion_tokens": 100,
"prompt_tokens": 296,
"total_tokens": 396
}
}
},
"error": null
}
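Once you download the output archive, a sketch like the following unpacks it and matches responses back to their custom_id values (the archive name is a placeholder; error.jsonl records are kept whole since their exact shape isn't shown here):

```python
import json
import tarfile

results, errors = {}, {}
with tarfile.open("batch-output.tar.gz", "r:gz") as tar:
    for member in tar.getmembers():
        name = member.name.rsplit("/", 1)[-1]
        if name not in ("output.jsonl", "error.jsonl"):
            continue
        for line in tar.extractfile(member):
            rec = json.loads(line)
            (results if name == "output.jsonl" else errors)[rec["custom_id"]] = rec

# Example: the assistant's reply for the first request.
msg = results["request-001"]["response"]["body"]["choices"][0]["message"]
print(msg["content"])
```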
Processing Video
File Structure for Video Batch
batch-input/
├── jobs.jsonl
└── files/
└── sample_video.mp4
LSI can accept videos as input. Simply replace your image_url field with video_url inside your jobs.jsonl. Each video should be placed in the files/ folder inside your .tar.gz input archive.
Sample jobs.jsonl Entry for Video
{
"custom_id": "request-video-001",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": "Qwen/Qwen2.5-VL-32B-Instruct",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is happening in this video?"
},
{
"type": "video_url",
"video_url": {
"url": "file:files/sample_video.mp4"
}
}
]
}
],
"max_tokens": 1000
}
}
Other Use Cases and Models
LSI is designed for large-scale, mostly enterprise, use cases. That lets us be more hands-on than traditional, self-serve providers. If you are interested in features not documented here, please contact us. Between Modular's world-class engineering and SFC's dramatic price optimization, we'll work with you to get the best possible price and performance.