The offloading metric is a customizable infrastructure metric used by the Beamlit controller to trigger model offloading when it hits a certain threshold.

Currently, Beamlit supports two ways to retrieve metrics:

Overview

Setting up the offloading metric is made via the following parameters in the ModelDeployment custom resource:

spec:
	...
	offloadingConfig:
    behavior:
      percentage: 50
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50

where:

  • behavior is the percentage of requests to offload to the remote backend when the offloading metric reaches its threshold
  • metrics is the offloading metric, based on which the controller will decide whether to trigger traffic offloading

Set up metric using Prometheus

Prerequisites

  • A Prometheus server that is either in your Kubernetes cluster, or accessible in your network via URL without authentication.
  • Set up the Beamlit controller to monitor your Prometheus service by adding the following configuration in the controller chart’s values.yaml:
config:
  metricInformer:
	  type: prometheus
	  prometheus:
		  address: my-local-prom:9090

Metric overview

The Beamlit controller can use any metric that is saved in the Prometheus database, or use any computation on such metrics using PromQL.

Metrics is specified by an External metric (using MetricSpec from Kubernetes). For example:

offloadingConfig:
    behavior:
      percentage: 50
    metrics:
      - type: External
			  external:
			    metric:
			      name: my_custom_metric_in_prom # Metric name, or PromQL query 
			      selector:
				      # Any label you want to match in your metric.
				      # This only works for a single metric (not a PromQL query).			        
				      matchLabels: 
			          my_label: "value"
			      target:
				      type: Value
				      value: 10

Set up metric using Kubernetes metrics-server

Prerequisites

  • Kubernetes metrics-server must be installed on your Kubernetes cluster.
Using metrics-server is the default mode of the Beamlit controller.

Metric overview

The Beamlit controller can use any metric that’s compatible with the Kubernetes HorizontalPodAutoscaler (HPA) and accessible through metrics-server.

Metrics are defined using the Horizontal Pod Autoscaler (HPA) format for either Resource, Pod, Object, or External metrics. For example, with Resource metrics:

  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

This will trigger offloading when the average CPU usage gets higher than 50%.

Read more about the available parameters for defining metrics and targets on the Kubernetes HPA documentation.