This feature is in public preview.
A deployment maintains N instances on a capacity. A procurement buys and sells spot compute to keep them running. To scale, update the deployment’s target instance count. The procurement adjusts automatically.

Prerequisites

  • SF Compute CLI installed and authenticated (sf login)
  • Credits on your account (sf billing balance)
  • A choice of hardware to run on (browse the catalog with sf instance-skus list)
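If you want to check these prerequisites from a script, the sketch below wraps the commands listed above in a hypothetical `preflight` function. It confirms that `sf` is on your PATH, then prints your balance and the SKU catalog. It assumes `sf billing balance` exits non-zero when you are not authenticated; it does not run `sf login` for you.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper for the prerequisites above.
preflight() {
  # The sf CLI must be installed and on PATH.
  if ! command -v sf >/dev/null 2>&1; then
    echo "preflight: sf CLI not found on PATH; install it and run 'sf login'" >&2
    return 1
  fi
  # Show remaining credits, then the hardware catalog for picking an instance SKU.
  sf billing balance || return 1
  sf instance-skus list || return 1
}
```

For example, run `preflight && echo "ready"` before creating the capacity.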

Create a capacity

sf capacities create --name inference

Create an instance template

Define the image and startup script for your inference instances. Save the following script as startup.sh:
#!/bin/bash

mkdir -p /root/.ssh
cat >>/root/.ssh/authorized_keys <<"EOF"
ssh-ed25519 AAAA... you@example.com
EOF

# Start your inference server here

Then create the template from it:

sf instance-templates create \
  --name inference-worker \
  --image ubuntu-22.04.5-cuda-12.7 \
  --cloud-init ./startup.sh
See Instance templates for details on cloud-init and image configuration.
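The startup.sh shown earlier contains a placeholder key. A minimal sketch for generating it from your own public key follows. `write_startup` is a hypothetical helper, and ~/.ssh/id_ed25519.pub is only a default; pass whichever key you actually log in with.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write startup.sh with a real public key instead of the placeholder.
write_startup() {
  local pubkey_file="${1:-$HOME/.ssh/id_ed25519.pub}"
  local out="${2:-./startup.sh}"
  [ -r "$pubkey_file" ] || { echo "write_startup: cannot read $pubkey_file" >&2; return 1; }
  {
    printf '#!/bin/bash\n\n'
    printf 'mkdir -p /root/.ssh\n'
    printf 'cat >>/root/.ssh/authorized_keys <<"EOF"\n'
    # $(...) strips trailing newlines; printf adds exactly one, so EOF lands on its own line.
    printf '%s\n' "$(cat "$pubkey_file")"
    printf 'EOF\n\n'
    printf '# Start your inference server here\n'
  } > "$out"
}
```

For example, `write_startup ~/.ssh/id_ed25519.pub ./startup.sh` writes the file that `--cloud-init ./startup.sh` expects.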

Create a deployment

sf deployments create \
  --name inference \
  --capacity inference \
  --instance-template inference-worker \
  --target-instance-count 4
Four instances are created in the awaiting_allocation state. They start once the capacity has compute time.

Create a procurement

Setting --target node_count tells the procurement to match however many instances exist on the capacity. Pass --instance-sku <id> to pin the procurement to specific hardware; see instance SKUs for the catalog.
sf procurements create \
  --name inference \
  --capacity inference \
  --target node_count \
  --max-buy-price 20.00 \
  --min-sell-price 10.00 \
  --window 2h \
  --instance-sku isku_4UpxzQw7A8N
The procurement sees 4 waiting instances and places buy orders. Within minutes, your instances move to running.
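To wait for that transition from a script, you can poll `sf instances list`. This sketch assumes the listing prints one instance per line with its status as plain text (for example `awaiting_allocation`); the output format is not documented on this page, so adjust the match if it differs. `wait_for_allocation` and its timeout defaults are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical poller. ASSUMPTION: `sf instances list` prints one instance per
# line and includes the literal status awaiting_allocation for waiting instances.
wait_for_allocation() {
  local timeout_s="${1:-900}" interval_s="${2:-30}"
  local waited=0 out pending
  while :; do
    # Stop on CLI errors instead of mistaking empty output for "all allocated".
    out=$(sf instances list) || return 1
    pending=$(printf '%s\n' "$out" | grep -c 'awaiting_allocation' || true)
    if [ "$pending" -eq 0 ]; then
      echo "no instances awaiting allocation"
      return 0
    fi
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "still $pending instance(s) awaiting allocation after ${waited}s" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
}
```

For example, `wait_for_allocation 900 30` checks every 30 seconds for up to 15 minutes.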

Scale up

sf deployments set inference --target-instance-count 8
Four new instances are created, and the procurement buys compute to cover them.

Scale down

sf deployments set inference --target-instance-count 2
Excess instances are removed. The procurement sells unneeded compute.
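If you scale from scripts or scheduled jobs, a small wrapper can reject malformed counts before they reach the CLI. `scale_deployment` is a hypothetical helper around the `sf deployments set` command shown above; whether a count of 0 is accepted is up to the CLI.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around `sf deployments set ... --target-instance-count`.
scale_deployment() {
  local name="$1" count="$2"
  # Accept only non-negative integers (rejects "", "-1", "abc", "2.5").
  case "$count" in
    ''|*[!0-9]*)
      echo "scale_deployment: count must be a non-negative integer, got '$count'" >&2
      return 2 ;;
  esac
  sf deployments set "$name" --target-instance-count "$count"
}
```

For example, `scale_deployment inference 8` scales up and `scale_deployment inference 2` scales down.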

Handling interruptions

Spot compute is not guaranteed. Instances may shut down if the market price rises above your --max-buy-price or if other buyers place reservations that consume the available compute. Design workloads to tolerate instances being replaced.
  • Stateless workers. Download model weights on boot. Local disk does not persist between instances.
  • Load balancer health checks. Route traffic only to instances that are ready.
  • Longer --window. A higher value (e.g., 6h) reduces gaps but commits more spend. See Tuning the managed window.
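Your load balancer runs its own health checks, but the same readiness idea can be scripted on the instance, for example at the end of startup.sh after starting the server in the background. The URL, port, and `wait_until_ready` name below are placeholders, not part of SF Compute; use whatever health endpoint your inference server exposes.

```bash
#!/usr/bin/env bash
# Hypothetical readiness wait. The default URL is a placeholder.
wait_until_ready() {
  local url="${1:-http://127.0.0.1:8000/health}"
  local attempts="${2:-60}" delay_s="${3:-5}" i
  for ((i = 1; i <= attempts; i++)); do
    # -f fails on HTTP 4xx/5xx, -sS is quiet but still reports errors.
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay_s"
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}
```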

Monitoring

sf deployments get inference   # Deployment status and instance count
sf procurements get inference  # Procurement status and pricing
sf instances list              # Individual instance status
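To see all three views together during a scaling change, a simple loop works. `monitor` is a hypothetical helper; pass a round count to stop automatically, or leave it at 0 to run until you press Ctrl-C.

```bash
#!/usr/bin/env bash
# Hypothetical monitor: print deployment, procurement, and instance status on an interval.
monitor() {
  local name="${1:-inference}" interval_s="${2:-60}" rounds="${3:-0}" n=0
  # rounds=0 means run until interrupted.
  while [ "$rounds" -eq 0 ] || [ "$n" -lt "$rounds" ]; do
    echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
    sf deployments get "$name"
    sf procurements get "$name"
    sf instances list
    n=$((n + 1))
    # Skip the final sleep when a round limit is set.
    if [ "$rounds" -eq 0 ] || [ "$n" -lt "$rounds" ]; then sleep "$interval_s"; fi
  done
}
```

For example, `monitor inference 30` refreshes every 30 seconds.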

Next steps

  • Adjust --max-buy-price and --min-sell-price to control spend
  • Buy a reserved block of compute into this capacity when you need to guarantee availability:
    sf orders create --capacity inference --side buy --nodes 4 --start "in 5h" --duration 24h --max-rate 20.00
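Before raising a price ceiling, a window, or an order size, it helps to bound the worst case. The sketch below multiplies nodes, hours, and a price ceiling, assuming prices and rates are dollars per node-hour; that unit is an assumption, so check it in the CLI reference. For the reserved order above, `max_spend 4 24 20.00` gives 1920.00. Read as a rough bound on the compute a procurement holds ahead, 4 nodes at a --max-buy-price of 20.00 come to 160.00 with a 2h window and 480.00 with 6h.

```bash
#!/usr/bin/env bash
# Hypothetical cost bound: nodes x hours x price ceiling.
# ASSUMPTION: prices/rates are dollars per node-hour.
max_spend() {
  local nodes="$1" hours="$2" rate="$3"
  # awk handles decimal rates; bash arithmetic is integer-only.
  awk -v n="$nodes" -v h="$hours" -v r="$rate" 'BEGIN { printf "%.2f\n", n * h * r }'
}
```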