Set up offloading metric
Offload traffic on Beamlit based on a metric.
The offloading metric is a customizable infrastructure metric used by the Beamlit controller to trigger model offloading when it hits a certain threshold.
Currently, Beamlit supports two ways to retrieve metrics:
- metrics from a self-managed Prometheus, evaluated through a Prometheus query (PromQL)
- metrics from a Kubernetes metrics-server
Overview
The offloading metric is configured via the following parameters in the ModelDeployment custom resource:
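A minimal sketch of such a resource is shown below. The field names (`offloadingConfig`, `behavior`, `metrics`) follow the parameter names used on this page, but the exact structure and `apiVersion` are assumptions, not verified against the actual CRD schema:

```yaml
# Hypothetical ModelDeployment sketch — field layout is an assumption
apiVersion: deployment.beamlit.com/v1alpha1   # placeholder apiVersion
kind: ModelDeployment
metadata:
  name: my-model
spec:
  offloadingConfig:
    behavior:
      percentage: 50    # share of requests to offload once the metric triggers
    metrics: []         # offloading metric(s), defined in the sections below
```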
where:
- behavior is the percentage of requests to offload to the remote backend when the offloading metric reaches its threshold
- metrics is the offloading metric, based on which the controller decides whether to trigger traffic offloading
Set up metric using Prometheus
Prerequisites
- A Prometheus server, either running in your Kubernetes cluster or reachable on your network via a URL without authentication.
- Set up the Beamlit controller to monitor your Prometheus service by adding the following configuration in the controller chart’s values.yaml:
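The configuration might look like the following. The key names here are assumptions about the chart's schema; check the chart's documented values for the authoritative structure:

```yaml
# Hypothetical values.yaml fragment — key names are assumptions,
# not verified against the Beamlit controller chart
config:
  metricInformer:
    prometheus:
      # in-cluster service URL of your Prometheus server (placeholder)
      address: http://prometheus.monitoring.svc.cluster.local:9090
```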
Metric overview
The Beamlit controller can use any metric stored in the Prometheus database, or any computation over such metrics expressed in PromQL.
The metric is specified as an External metric (using MetricSpec from Kubernetes). For example:
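As an illustration, an External metric in standard Kubernetes MetricSpec syntax might look like this; the metric name and target value are placeholders, not values prescribed by Beamlit:

```yaml
# External MetricSpec sketch — metric name and target are placeholders
metrics:
- type: External
  external:
    metric:
      name: inference_requests_per_second   # hypothetical PromQL-backed series
    target:
      type: Value
      value: "100"    # trigger offloading when the series exceeds 100
```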
Set up metric using Kubernetes metrics-server
Prerequisites
- Kubernetes metrics-server must be installed on your Kubernetes cluster.
Metric overview
The Beamlit controller can use any metric that’s compatible with the Kubernetes HorizontalPodAutoscaler (HPA) and accessible through metrics-server.
Metrics are defined using the HPA format for either Resource, Pod, Object, or External metrics. For example, with Resource metrics:
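A Resource metric targeting average CPU utilization, in standard HPA MetricSpec syntax, can be sketched as:

```yaml
# Standard HPA Resource MetricSpec: average CPU utilization across pods
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
```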
This will trigger offloading when the average CPU usage exceeds 50%.