Deploy a vLLM endpoint
badgr serve runs vLLM under the hood. vLLM is an open-source inference engine widely used for production LLM serving. This recipe explains what happens when you run badgr serve.
What badgr serve does
1. Searches available GPU capacity for the GPU type you requested.
2. Provisions an instance with the vllm/vllm-openai:latest container image.
3. Sets the MODEL environment variable to your model ID.
4. Waits for the vLLM health check to return 200 OK.
5. Prints an OpenAI-compatible endpoint URL.
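Step 4 amounts to polling vLLM's /health route until it answers 200. If you start a server yourself with badgr run and want a script to block until it is ready, a minimal sketch looks like this. It assumes curl is installed and that you pass the /health path of the endpoint badgr printed; wait_for_health and its default timeout and interval are hypothetical and not part of the badgr CLI.

```shell
# Hypothetical helper (not part of the badgr CLI): poll a vLLM /health URL
# until it answers HTTP 200, roughly what step 4 does for you.
wait_for_health() {
  local url="$1"             # e.g. https://dep-a1b2c3.api.badgr.ai/health
  local timeout="${2:-600}"  # assumed default: give up after 600 s
  local interval="${3:-5}"   # assumed default: 5 s between attempts
  local waited=0
  local code=""

  while [ "$waited" -lt "$timeout" ]; do
    # Print only the status code; curl prints 000 if it cannot connect.
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url") || true
    if [ "$code" = "200" ]; then
      echo "ready after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done

  echo "timed out after ${timeout}s (last status: ${code:-none})" >&2
  return 1
}

# Usage:
# wait_for_health "https://dep-a1b2c3.api.badgr.ai/health" 900
```

Each attempt is capped with --max-time so a hung connection cannot stall the loop past its overall timeout by more than one request.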
Command
badgr serve mistralai/Mistral-7B-Instruct-v0.3 --gpu A100 --region EU
Bring your own vLLM image
Use badgr run with a custom image if you need a specific vLLM version or extra config:
badgr run \
  --image vllm/vllm-openai:v0.6.3 \
  --gpu A100 \
  --env MODEL=mistralai/Mistral-7B-Instruct-v0.3 \
  --env MAX_MODEL_LEN=8192
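This recipe does not say how an image consumes the MODEL and MAX_MODEL_LEN variables. If you build your own image and its entrypoint has to turn them into vLLM flags itself, a minimal sketch could look like this. run_vllm and DRY_RUN are hypothetical names; --model, --max-model-len, --host and --port are flags of vLLM's OpenAI-compatible server (python3 -m vllm.entrypoints.openai.api_server), and port 8000 is that server's default.

```shell
#!/bin/sh
# Hypothetical entrypoint for a custom image. Assumption: badgr passes the
# --env values through unchanged and the image must map them to vLLM flags.
run_vllm() {
  : "${MODEL:?MODEL must be set}"   # fail fast if no model ID was given
  set -- python3 -m vllm.entrypoints.openai.api_server \
    --model "$MODEL" --host 0.0.0.0 --port 8000
  # Only pass --max-model-len when MAX_MODEL_LEN is set and non-empty.
  if [ -n "${MAX_MODEL_LEN:-}" ]; then
    set -- "$@" --max-model-len "$MAX_MODEL_LEN"
  fi
  # DRY_RUN=1 prints the command (one argument per line) instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$@"
  else
    exec "$@"   # replace the shell so vLLM receives container stop signals
  fi
}

# As the image's entrypoint, end the script with:
# run_vllm
```

Setting DRY_RUN=1 prints the final command without starting the server, which is a quick way to check the flags before pushing the image.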
Check the endpoint
# List available models on the endpoint
curl https://dep-a1b2c3.api.badgr.ai/v1/models \
  -H "Authorization: Bearer $BADGR_API_KEY"

# Health check
curl https://dep-a1b2c3.api.badgr.ai/health
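Because the endpoint is OpenAI-compatible, a chat request is a POST to /v1/chat/completions with the same bearer token. A minimal sketch, assuming the example deployment URL and model ID from this recipe (chat, BASE_URL and MODEL_ID are hypothetical names). The model field must match an ID returned by /v1/models; by default vLLM reports the model ID the server was started with.

```shell
# Hypothetical helper: send one user message to the deployment's
# OpenAI-compatible /v1/chat/completions route and print the raw JSON reply.
# Defaults are this recipe's example values; override them with the URL
# badgr printed and an ID reported by /v1/models.
BASE_URL="${BASE_URL:-https://dep-a1b2c3.api.badgr.ai}"
MODEL_ID="${MODEL_ID:-mistralai/Mistral-7B-Instruct-v0.3}"

chat() {
  # No JSON escaping is done: keep double quotes and backslashes
  # out of the prompt ($1).
  curl -s "$BASE_URL/v1/chat/completions" \
    -H "Authorization: Bearer $BADGR_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$MODEL_ID\", \"messages\": [{\"role\": \"user\", \"content\": \"$1\"}], \"max_tokens\": 64}"
}

# Usage:
# chat "Say hello in one sentence."
```

For prompts that contain quotes or backslashes, build the request body with a JSON tool such as jq rather than string interpolation.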
Stop billing
badgr down <deployment-id>