GPU Compute / Recipe

Deploy a vLLM endpoint

badgr serve uses vLLM under the hood — the same inference engine used by most production LLM deployments. This recipe explains what happens when you run it.

What badgr serve does

  1. 1Searches available GPU capacity for the type you requested.
  2. 2Provisions an instance with the vllm/vllm-openai:latest container image.
  3. 3Sets MODEL environment variable to your model ID.
  4. 4Waits for the vLLM health check to return 200 OK.
  5. 5Prints an OpenAI-compatible endpoint URL.

Command

badgr serve mistralai/Mistral-7B-Instruct-v0.3 --gpu A100 --region EU

Bring your own vLLM image

Use badgr run with a custom image if you need a specific vLLM version or extra config:

badgr run \
  --image vllm/vllm-openai:v0.6.3 \
  --gpu A100 \
  --env MODEL=mistralai/Mistral-7B-Instruct-v0.3 \
  --env MAX_MODEL_LEN=8192

Check the endpoint

# List available models on the endpoint
curl https://dep-a1b2c3.api.badgr.ai/v1/models \
  -H "Authorization: Bearer $BADGR_API_KEY"

# Health check
curl https://dep-a1b2c3.api.badgr.ai/health

Stop billing

badgr down <deployment-id>