Large Scale Inference
SFC Large Scale Inference provides large-scale asynchronous batch processing for multi-modal AI models. The API is designed to process trillions of tokens efficiently at low cost while maintaining high accuracy.
Key Concepts
Batch Processing
- Batch: A collection of inference requests packaged in a .tar file. The .tar file must be less than 5GB in size.
- Completion Window: The maximum time allowed for batch processing. This is user-configurable up to the S3 pre-signed URL expiry. We recommend setting longer completion windows.
File Handling
- To protect your privacy, SFC never stores your files: all data is read from and written to your choice of S3-compatible storage using pre-signed URLs.
- Expected file format: see the Input File Format section below.
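For example, pre-signed URLs for the input and output objects can be generated with boto3. This is a minimal sketch, assuming S3-compatible storage; the bucket, key, and endpoint names are placeholders:

import boto3

# endpoint_url is only needed for non-AWS S3-compatible stores (placeholder shown).
s3 = boto3.client("s3", endpoint_url="https://storage.example.com")

SEVEN_DAYS = 7 * 24 * 3600  # matches the recommended 7-day URL expiry below

# URL that LSI reads the input batch from.
input_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-batches", "Key": "batch-input.tar.gz"},
    ExpiresIn=SEVEN_DAYS,
)

# URL that LSI writes the output batch to.
output_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-batches", "Key": "batch-output.tar.gz"},
    ExpiresIn=SEVEN_DAYS,
)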
Completion Windows
When you submit a batch, you must provide a completion_window parameter, which determines the maximum time allowed for processing. We recommend setting a completion window of 7 days or longer, as batches that cannot be completed within the window at the given price will expire.
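For illustration only, a submission might look like the following; the endpoint, authentication scheme, and field names here are hypothetical placeholders, so consult the API Reference for the actual contract:

import requests

resp = requests.post(
    "https://api.example.com/v1/batches",           # hypothetical endpoint
    headers={"Authorization": "Bearer <API_KEY>"},  # hypothetical auth scheme
    json={
        "input_url": "<pre-signed GET URL>",   # hypothetical field name
        "output_url": "<pre-signed PUT URL>",  # hypothetical field name
        "completion_window": "7d",             # hypothetical value format
    },
)
resp.raise_for_status()
print(resp.json())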
Getting Started
To learn how to access our API, read our API Quickstart. Our REST API endpoints are documented in our API Reference. LSI expects and produces a specific file structure for batches, described below.
Requirements and Recommendations
- Pre-signed URL duration: If your pre-signed URLs expire, your associated batch jobs will fail. We recommend setting your URL expiry to 7 days to help prevent this.
- Test format compliance: Run a small test job with a sample of your jobs.jsonl to validate that it follows the OpenAI API format before submitting larger workloads; see the sketch after this list.
- File compression: Use gzip compression for your tar files (.tar.gz) to minimize upload and download times while staying under the 5GB size limit.
- Rate limits: Keep requests to the LSI API under 1 request per second. Concurrent batches are limited by compute availability; excess batches will expire.
- Batch size: For optimal pricing and performance, we recommend submitting ~48 million tokens per batch.
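A minimal pre-flight format check along these lines, validating only the fields shown in the jobs.jsonl examples below:

import json

REQUIRED_KEYS = {"custom_id", "method", "url", "body"}

def validate_jobs(path):
    """Sanity-check a jobs.jsonl file before submission."""
    seen_ids, models = set(), set()
    with open(path) as f:
        for n, line in enumerate(f, 1):
            job = json.loads(line)  # raises on malformed JSON
            missing = REQUIRED_KEYS - job.keys()
            assert not missing, f"line {n}: missing keys {missing}"
            assert job["custom_id"] not in seen_ids, f"line {n}: duplicate custom_id"
            seen_ids.add(job["custom_id"])
            models.add(job["body"]["model"])
    # All requests in a batch must use the same model.
    assert len(models) == 1, f"multiple models in one batch: {models}"

validate_jobs("batch-input/jobs.jsonl")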
Input File Format
Input files must be tar archives, optionally gzip-compressed (.tar.gz or .tar), with the following directory structure:
batch-input/
├── jobs.jsonl # Required: JSONL file with requests
└── files/ # Optional: Media files directory
├── image1.png
├── image2.jpg
└── subfolder/
└── image3.png
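A sketch of packaging this layout with Python's tarfile module; jobs.jsonl and files/ sit at the tar root, and the local source paths are placeholders:

import tarfile

# Create a gzip-compressed archive matching the layout above,
# with jobs.jsonl and files/ at the root of the tar.
with tarfile.open("batch-input.tar.gz", "w:gz") as tar:
    tar.add("batch-input/jobs.jsonl", arcname="jobs.jsonl")
    tar.add("batch-input/files", arcname="files")  # added recursively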
jobs.jsonl Format
Each line contains a complete OpenAI-compatible request:
{
"custom_id": "request-001",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": "Qwen/Qwen2.5-VL-32B-Instruct",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image"
},
{
"type": "image_url",
"image_url": {
"url": "file:files/image1.png"
}
}
]
}
],
"max_tokens": 1000
}
}
Key Requirements:
- File references use the file: protocol with paths relative to the tar root (no leading slash after file:)
- All requests in a batch must use the same model
- Maximum 5GB tar file size
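For example, a compliant jobs.jsonl could be generated like this; a minimal sketch, with placeholder file names and prompt:

import json

MODEL = "Qwen/Qwen2.5-VL-32B-Instruct"             # one model per batch
images = ["files/image1.png", "files/image2.jpg"]  # paths relative to tar root

with open("jobs.jsonl", "w") as f:
    for i, path in enumerate(images, 1):
        job = {
            "custom_id": f"request-{i:03d}",  # unique per request
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": MODEL,
                "messages": [{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Describe this image"},
                        {"type": "image_url", "image_url": {"url": f"file:{path}"}},
                    ],
                }],
                "max_tokens": 1000,
            },
        }
        f.write(json.dumps(job) + "\n")  # one complete request per line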
Output File Format
Output files are compressed tar files with:
batch-output/
├── output.jsonl # Successful responses
└── error.jsonl # Failed requests (if any)
output.jsonl Format
{
"id": "batch_req_696ec8427763459fa409788746bda3e3",
"custom_id": "request-001",
"response": {
"status_code": 200,
"request_id": "request-001",
"body": {
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "This image shows...",
"role": "assistant"
}
}
],
"created": 1751329764,
"id": "a0eeea75457242c1b7ab5e07138e470c",
"model": "Qwen/Qwen2.5-VL-32B-Instruct",
"object": "chat.completion",
"usage": {
"completion_tokens": 100,
"prompt_tokens": 296,
"total_tokens": 396
}
}
},
"error": null
}
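Once a batch completes, results can be mapped back to requests by custom_id; a minimal sketch, assuming the output archive has been downloaded locally:

import json
import tarfile

results, errors = {}, {}
with tarfile.open("batch-output.tar.gz", "r:*") as tar:
    for member in tar.getmembers():
        if not member.name.endswith(".jsonl"):
            continue
        bucket = errors if member.name.endswith("error.jsonl") else results
        for line in tar.extractfile(member):
            record = json.loads(line)
            bucket[record["custom_id"]] = record

# Pull the assistant's reply for one request.
reply = results["request-001"]["response"]["body"]["choices"][0]["message"]["content"]
print(reply)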
Processing Video
File Structure for Video Batch
batch-input/
├── jobs.jsonl
└── files/
└── sample_video.mp4
LSI can accept videos as input. Simply replace your image_url field with video_url inside your jobs.jsonl. Each video should be placed in the files/ folder inside your .tar.gz input archive.
Sample jobs.jsonl Entry for Video
{
"custom_id": "request-video-001",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": "Qwen/Qwen2.5-VL-32B-Instruct",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is happening in this video?"
},
{
"type": "video_url",
"video_url": {
"url": "file:files/sample_video.mp4"
}
}
]
}
],
"max_tokens": 1000
}
}
Other Use Cases and Models
LSI is designed for large-scale, mostly enterprise, use cases. That lets us be more hands-on than traditional, self-serve providers. If you are interested in features not documented here, please contact us. Between Modular's world-class engineering and SFC's dramatic price optimization, we'll work with you to get the best possible price and performance.