A deployment maintains N nodes on a capacity. A procurement buys and sells spot compute to keep them running. To scale, update the deployment’s target node count; the procurement adjusts automatically.
Prerequisites
- SF Compute CLI installed and authenticated (sf login)
- Credits on your account (sf billing balance)
- A zone with availability (sf zones ls)
Create a capacity
sf capacities create --zone richmond --name inference
Create a node template
Define the image and startup script for your inference nodes. Save the following script as startup.sh:
#!/bin/bash
mkdir -p /root/.ssh
cat >>/root/.ssh/authorized_keys <<"EOF"
ssh-ed25519 AAAA... you@example.com
EOF
# Start your inference server here

Then create the template, passing the script with --cloud-init:
sf node-templates create \
--name inference-worker \
--image ubuntu-22.04.5-cuda-12.7 \
--cloud-init ./startup.sh
See Node Templates for details on cloud-init and image configuration.
Create a deployment
sf deployments create \
--name inference \
--capacity inference \
--node-template inference-worker \
--target-node-count 4
Four nodes are created in the awaiting_allocation state. They start once the capacity has compute time.
Create a procurement
Setting --target to node_count tells the procurement to match however many nodes exist on the capacity.
sf procurements create \
--name inference \
--capacity inference \
--target node_count \
--max-buy-price 20.00 \
--min-sell-price 10.00 \
--window 2h
The procurement sees 4 waiting nodes and places buy orders. Within minutes, your nodes move to running.
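As a rough sanity check on the price and window settings, the sketch below estimates an upper bound on spend for one managed window. It assumes that --max-buy-price is a per-node, per-hour rate in dollars, which this page does not state; max_window_spend is a hypothetical helper, not part of the sf CLI.

```shell
#!/bin/bash
# Rough upper bound on spend for one managed window.
# ASSUMPTION: --max-buy-price is a per-node, per-hour dollar rate.
# max_window_spend is a hypothetical helper, not an sf command.
max_window_spend() {
  local nodes="$1" max_price="$2" window_hours="$3"
  # awk handles the decimal arithmetic that plain shell cannot
  awk -v n="$nodes" -v p="$max_price" -v h="$window_hours" \
    'BEGIN { printf "%.2f\n", n * p * h }'
}

# 4 nodes, up to 20.00 per node-hour, 2h window
max_window_spend 4 20.00 2   # prints 160.00
```

Actual spend is likely lower, since the procurement buys at market price up to the limit rather than at the limit itself.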
Scale up
sf deployments set inference --target-node-count 8
Four new nodes are created, and the procurement buys compute to cover them.
Scale down
sf deployments set inference --target-node-count 2
Excess nodes are removed. The procurement sells unneeded compute.
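If scaling is driven from a script (for example, from an autoscaling signal), it may help to validate the count before calling the CLI. The following is a minimal sketch: scale_deployment and the DRY_RUN variable are hypothetical names introduced here, and only the sf deployments set command itself comes from this guide.

```shell
#!/bin/bash
# Hypothetical wrapper around `sf deployments set`.
# Rejects anything that is not a non-negative integer.
# With DRY_RUN=1 it prints the command instead of running it.
scale_deployment() {
  local name="$1" count="$2"
  case "$count" in
    ''|*[!0-9]*)
      echo "node count must be a non-negative integer: $count" >&2
      return 1
      ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sf deployments set $name --target-node-count $count"
  else
    sf deployments set "$name" --target-node-count "$count"
  fi
}
```

For example, DRY_RUN=1 scale_deployment inference 8 prints the command it would run, which is a cheap way to check the script before letting it change a live deployment.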
Handling interruptions
Spot compute is not guaranteed. Nodes may shut down if the market price exceeds your buy limit (--max-buy-price) or if other buyers place reservations that consume the capacity. Design workloads to handle nodes being replaced.
- Stateless workers. Download model weights on boot. Local disk does not persist between nodes.
- Health checks on your load balancer. Route traffic only to nodes that are ready.
- Longer --window. A higher value (e.g., 6h) reduces gaps but commits more spend. See Tuning the managed window.
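One way to support the stateless-worker and health-check advice is for the startup script to wait until the inference server responds before the node is treated as ready. The sketch below is a generic retry loop; wait_until_ready is a hypothetical helper, and the URL and port in the usage comment are placeholders for whatever your server exposes.

```shell
#!/bin/bash
# Hypothetical helper: run a command until it succeeds,
# up to a fixed number of attempts, sleeping between tries.
# Usage: wait_until_ready ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until_ready() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Success on any attempt ends the wait
    "$@" && return 0
    # Sleep only if another attempt remains
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (placeholder URL): poll a local health endpoint for up to 5 minutes
# wait_until_ready 60 5 curl -fsS http://localhost:8000/health
```

The function returns non-zero when every attempt fails, so the calling script can log the failure or exit rather than leave a half-started node in rotation.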
Monitoring
sf deployments get inference # Deployment status and node count
sf procurements get inference # Procurement status and pricing
sf nodes ls # Individual node status
Next steps
- Adjust --max-buy-price and --min-sell-price to control spend
- Buy a reserved block of compute into this capacity when you need to guarantee availability:
sf orders create --capacity inference --side buy --nodes 4 --start "in 5h" --duration 24h --max-rate 20.00