Query a model
Make inference requests on your models.
Model deployments on Beamlit expose an inference endpoint that external consumers can call to request an inference execution. Inference requests are then routed through the Global Inference Network based on the deployment policies associated with your model deployment.
Inference endpoints
Whenever you deploy a model on Beamlit, an inference endpoint is generated on the Global Inference Network.
The inference URL looks like this:
run.beamlit.dev/{your-workspace}/models/{your-model}
There is one distinct endpoint for each model deployment, i.e. for each combination of a model and an environment on which it is deployed.
For example, if you have one version of model “your-model” deployed on the production environment and one version deployed on the development environment:
- run.beamlit.dev/{your-workspace}/models/{your-model}?environment=production will call the production deployment
- run.beamlit.dev/{your-workspace}/models/{your-model}?environment=development will call the development deployment
If you do not specify an environment in the inference request, the production deployment is called by default. If the model is not deployed on the production environment, the request returns an error.
Specific API endpoints in your model
The URL above hosts your model and can be called directly in most cases. However, your model may implement additional endpoints; these sub-endpoints are served under the same base URL.
For example, if you deploy a text generation model that also implements the ChatCompletions API:
- calling run.beamlit.dev/your-workspace/models/your-model (the base endpoint) will generate text based on a prompt
- calling run.beamlit.dev/your-workspace/models/your-model/v1/chat/completions (the ChatCompletions API implementation) will generate a response based on a list of messages
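As an illustration, a request to the ChatCompletions sub-endpoint could look like the sketch below. The OpenAI-style messages payload and the YOUR_API_TOKEN placeholder are assumptions for the example; adapt them to your model and credentials.

```bash
# Sketch of a call to the ChatCompletions sub-endpoint of a deployed model.
# The request body follows the usual ChatCompletions format (a list of messages);
# the token placeholder is illustrative.
curl -X POST "https://run.beamlit.dev/your-workspace/models/your-model/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello! What can you do?"}]}'
```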
Endpoint authentication
By default, models deployed on Beamlit aren’t public: all inference requests must be authenticated with a bearer token.
Authentication and authorization for inference requests are evaluated by the Global Inference Network based on the access granted in your workspace.
Make an inference request
Beamlit API
Make a POST request to the inference endpoint for the model deployment you are requesting, making sure to fill in the authentication token:
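For example, a request to the base endpoint might look like the following sketch. The exact request body depends on the model you deployed; the inputs field and the YOUR_API_TOKEN placeholder are assumptions for illustration.

```bash
# Sketch of a POST request to a model's base inference endpoint.
# Use ?environment=development to target the development deployment instead of production.
curl -X POST "https://run.beamlit.dev/your-workspace/models/your-model?environment=production" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Write a short poem about the ocean."}'
```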
Read about the API parameters in the reference.
Beamlit CLI
The following command makes a POST request to the model’s base endpoint, targeting the production deployment by default. You can call another deployment with the --env option, and you can call specific API endpoints that your model implements with the --path option:
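As a sketch only: the command name, subcommand, and --data flag below are assumptions about the CLI syntax (check the CLI reference for the authoritative form); the --env and --path options are the ones described above.

```bash
# Hypothetical syntax: run an inference against the development deployment,
# calling the ChatCompletions sub-endpoint of the model.
bl run model your-model \
  --env development \
  --path /v1/chat/completions \
  --data '{"messages": [{"role": "user", "content": "Hello!"}]}'
```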
Read about the CLI parameters in the reference.
Beamlit console
Inference requests can also be made from the Beamlit console, on the model deployment’s workbench page.