Beamlit is a serverless cloud platform that enables AI teams to push and run any AI workload across multiple locations in a single click. This tutorial demonstrates how to deploy your first AI model on Beamlit.

Beamlit also lets you replicate a model deployment from your private Kubernetes cluster. To get started with this method, read this tutorial.

Install the Beamlit CLI

You can install the Beamlit CLI with Homebrew by running the following two commands successively in a terminal:

### Add the Beamlit tap to Homebrew
brew tap beamlit/beamlit

### Install the Beamlit CLI using Homebrew
brew install beamlit
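
To confirm the installation, you can print the CLI's help output. The --help flag below is an assumption (it is the convention most CLIs follow); check the CLI reference if it differs:

### Verify the installation (--help is an assumption)
bl --help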

Deploy a model

Make sure you have created an account on Beamlit and created your first workspace. Then, run the following command in a terminal to log in to Beamlit. Use the Device mode to authenticate via your browser.

bl login your-workspace-slugified-name

Let’s deploy a model from HuggingFace (a lightweight sentence-transformers embedding model). Run the following command in the terminal:

cat << 'EOF' | bl apply -f -
apiVersion: beamlit.com/v1alpha1
kind: Model
metadata:
  name: my-model
spec:
  deployments:
  - environment: production
    runtime:
      model: sentence-transformers/all-MiniLM-L6-v2
      type: huggingface
EOF
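
Deployment can take a few moments to propagate. Since the CLI follows a kubectl-style apply workflow, a matching get subcommand is a natural way to check on the model; bl get models below is an assumption, so consult the CLI reference for the exact command:

### List your models and their status (bl get models is an assumption)
bl get models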

That’s it! 🌎 🌏 🌍 Your model is now distributed and available across Beamlit’s entire global infrastructure! The Global Inference Network significantly speeds up inference by positioning workloads near your users and smartly routing requests based on your policies.

Make a first inference

Run a first inference on your model with the following command:

bl run my-model --data '{"inputs":"What is the encoding for this input?"}'

Your model is available behind a global endpoint. Read this guide to learn how to call it over HTTP.
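
As a sketch of what a direct HTTP request could look like: the endpoint URL pattern and the Authorization header below are illustrative assumptions, not the confirmed scheme; the guide linked above has the real URL and authentication details.

### Hypothetical HTTP call to the model's global endpoint
### (URL pattern and auth header are assumptions; see the querying guide)
curl -X POST "https://run.beamlit.com/your-workspace/models/my-model" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $BEAMLIT_API_KEY" \
  -d '{"inputs":"What is the encoding for this input?"}'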

Next steps

You are ready to run AI everywhere with the Beamlit platform! Check out the following guides, which may be useful to you:

Deploy models

Complete guide for deploying AI models directly on the Global Inference Network.

Offload traffic from a private model

Complete guide for making a minimal-footprint replica of your own model on Beamlit, for high availability.

Guide for querying models on the Global Inference Network

Complete guide for querying your deployments on the Global Inference Network.

Or check out the following hands-on examples:

Tutorial: deploy a custom model

Read our tutorial for deploying a custom fine-tuned model from HuggingFace.

Tutorial: offload requests from a self-hosted model

Read our tutorial for offloading burst traffic from a self-hosted model on your Kubernetes cluster.