Instance Types

Cluster Requirements

San Francisco Compute lists clusters from a number of data centers that have passed our audit and signed a contract with us providing a strong SLA, which we then pass on to you. Our audit typically (but not always) involves visiting the data center to inspect physical security, cabling, and cooling.

For example, here are some of the requirements for listing a training cluster, such as the h100i instance type. If you're interested in listing, please contact us for the full cluster requirements.

  • 1TB of RAM minimum

  • 2x CPUs from an approved list

  • Three types of networks

    • Primary in-band network for orchestration

    • Out-of-band IPMI management network, isolated from the primary network

    • High-bandwidth RDMA compute fabric, typically InfiniBand

  • At least 1Gbit/s of internet bandwidth per node, with a redundant bonded or failover uplink

  • Managed in-band Ethernet switches from an approved list of vendors

  • Air or water cooling (no immersion cooling)

  • UFM (NVIDIA Unified Fabric Manager) access

  • Proper burn-in, with 48-hour cluster acceptance criteria

  • At least 2TB of high-IOPS local NVMe storage per GPU compute node
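The measurable minimums in the list above can be encoded as a quick self-check on a node's spec sheet. This is a minimal Python sketch, not an SFC tool. `NodeSpec`, `check_node`, and the field names are hypothetical, and the sketch covers only the numeric and yes/no items. The approved CPU and switch vendor lists, the three-network layout, UFM access, and burn-in still have to be verified separately.

```python
from dataclasses import dataclass


# Hypothetical node description; field names are illustrative, not an SFC schema.
@dataclass
class NodeSpec:
    ram_tb: float           # installed RAM, in TB
    cpu_sockets: int        # number of physical CPUs
    internet_gbps: float    # internet bandwidth per node, in Gbit/s
    redundant_uplink: bool  # bonded or failover uplink present
    cooling: str            # "air", "water", or "immersion"
    local_nvme_tb: float    # local NVMe storage per node, in TB


def check_node(spec: NodeSpec) -> list[str]:
    """Return the listed minimums this node misses; an empty list means it passes these checks."""
    problems = []
    if spec.ram_tb < 1:
        problems.append("needs at least 1TB of RAM")
    if spec.cpu_sockets != 2:
        problems.append("expects 2 CPUs (from the approved list)")
    if spec.internet_gbps < 1:
        problems.append("needs at least 1Gbit/s of internet bandwidth")
    if not spec.redundant_uplink:
        problems.append("needs a redundant bonded or failover uplink")
    if spec.cooling not in ("air", "water"):
        problems.append("cooling must be air or water, not immersion")
    if spec.local_nvme_tb < 2:
        problems.append("needs at least 2TB of local NVMe storage")
    return problems
```

Returning a list of reasons rather than a single boolean shows every shortfall at once. For example, a node with 0.5TB of RAM, no redundant uplink, and immersion cooling would be reported on all three counts.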

Getting a test node

In rare cases, we can provide test nodes before you purchase from San Francisco Compute. However, one advantage of SFC is that you can buy whatever configuration you'd like for a short period to run your tests. We encourage you to try that first, knowing that you're covered by an SLA.

Instance types

Today, we support only one instance type: h100i. These are clusters of NVIDIA H100 GPUs with 3.2 Tb/s InfiniBand, fully interconnected on a single RDMA fabric.

You can purchase one by running:

sf buy -t h100i -n 1 -s 'tomorrow at 10am' -d '1d'
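If you want to script purchases, the same `sf buy` command can be assembled from Python. This is a minimal sketch that assumes the `sf` CLI is installed and on your PATH. `build_buy_command` and `buy` are hypothetical helpers. The flag meanings (-t instance type, -n quantity, -s start time, -d duration) are inferred from the example command, not taken from a CLI reference.

```python
import shlex
import subprocess
from typing import Optional


# Hypothetical helpers wrapping the `sf buy` example command.
# Flag meanings are inferred from that example, not from a CLI reference:
#   -t instance type, -n quantity, -s start time, -d duration.
def build_buy_command(instance_type: str, count: int, start: str, duration: str) -> list[str]:
    """Return the sf buy invocation as an argument list (no shell involved)."""
    return ["sf", "buy", "-t", instance_type, "-n", str(count), "-s", start, "-d", duration]


def buy(instance_type: str = "h100i", count: int = 1, start: str = "tomorrow at 10am",
        duration: str = "1d", dry_run: bool = True) -> Optional[subprocess.CompletedProcess]:
    cmd = build_buy_command(instance_type, count, start, duration)
    if dry_run:
        # Print a shell-quoted version for review instead of purchasing.
        print(shlex.join(cmd))
        return None
    # Assumes sf is on PATH; check=True raises CalledProcessError
    # if sf exits with a non-zero status.
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    buy()  # dry run: prints the command without buying anything
```

Passing the arguments as a list rather than one shell string keeps 'tomorrow at 10am' as a single argument without manual quoting. With dry_run=True, the helper only prints the command, so you can review it before making a real purchase.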

Hardware failures & refunds

Expect your cluster to break. Failure rates on large-scale GPU clusters are far higher than what you may be used to with web servers. Fear not! When (not if) a portion of your cluster breaks, we will attempt to provide a hot-swapped replacement node. If we can't, we'll refund the purchase.
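One reason failures are a "when" rather than an "if": even if each node is fairly reliable, the chance that some node fails grows quickly with node count. Assume, for illustration only, that nodes fail independently and each has probability p of failing during your reservation. Then the probability that at least one of n nodes fails is 1 - (1 - p)^n. The sketch below evaluates this formula. The 1% per-node figure is a made-up placeholder, not an SFC statistic.

```python
def prob_any_failure(per_node_prob: float, nodes: int) -> float:
    """Chance that at least one of `nodes` nodes fails, assuming independent failures."""
    if not 0.0 <= per_node_prob <= 1.0:
        raise ValueError("per_node_prob must be in [0, 1]")
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    # Complement of "every node survives", where each survives with probability 1 - p.
    return 1.0 - (1.0 - per_node_prob) ** nodes


# Illustrative per-node failure probability only; not a measured SFC figure.
for n in (1, 8, 64, 256):
    print(n, round(prob_any_failure(0.01, n), 3))
```

Under these assumptions, a 1% per-node risk, which is negligible for one machine, makes at least one failure more likely than not once a reservation spans a few hundred nodes.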

On a typical GPU cloud, that would be it: you simply have one fewer node. However, SFC is a market, which means that in many cases you can just buy another node with your refund. The price is not guaranteed to be the same as when you first bought, but we think this is a better experience than simply being out of luck.