Serve an open-source model
Launch a persistent OpenAI-compatible endpoint for any Hugging Face model. Badgr provisions the GPU, starts vLLM, and returns a URL you can use with any OpenAI SDK client.
Before you start: run npm install -g badgr-cli, then badgr login. See the Setup guide for details.

1. Start the endpoint
badgr serve meta-llama/Llama-3.1-8B-Instruct
The GPU is selected automatically based on model size, so no flags are needed. Badgr starts vLLM and prints the endpoint URL when the server is ready (usually 2–5 minutes). The deployment keeps running, and keeps billing, until you run badgr down.
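If a separate process (a CI job, for example) needs to confirm the endpoint is answering before it sends traffic, a small polling loop can help. The following is a minimal sketch, not part of the Badgr CLI. It assumes the deployment exposes vLLM's usual /v1/models route and accepts the same bearer token as the other requests in this recipe. The helper names models_endpoint_ok and wait_until_ready are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def models_endpoint_ok(base_url: str, api_key: str, timeout: float = 10.0) -> bool:
    """Return True if GET {base_url}/models answers with HTTP 200.

    Assumption: the server exposes the OpenAI-style /models route under
    the /v1 base URL and authenticates with a bearer token.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection failures, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both derive from OSError).
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    max_wait: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() until it returns True or max_wait seconds have elapsed."""
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A call such as wait_until_ready(lambda: models_endpoint_ok(url, key)) would then block until the server responds or ten minutes pass. The sleep and clock parameters exist only so the loop can be exercised without real delays.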
2. Call the endpoint
Point any OpenAI SDK client at the URL printed by badgr serve:
Python
from openai import OpenAI

client = OpenAI(
    api_key="your-badgr-api-key",
    base_url="https://dep-a1b2c3.api.badgr.ai/v1",  # printed by badgr serve
)
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.BADGR_API_KEY,
  baseURL: "https://dep-a1b2c3.api.badgr.ai/v1",
});
const resp = await client.chat.completions.create({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

curl
curl https://dep-a1b2c3.api.badgr.ai/v1/chat/completions \
  -H "Authorization: Bearer $BADGR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

3. Stop billing
badgr down <deployment-id>
This terminates the GPU instance. Run badgr receipts afterwards to see what the deployment cost.
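Because the deployment bills until it is stopped, scripted workloads may want teardown to happen even when the job fails. One way to sketch that in Python is a context manager that runs badgr down on exit. This is an illustration under assumptions, not a Badgr feature: the stop_on_exit helper is hypothetical, and the caller must supply the deployment ID in whatever form badgr down expects.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence


def _run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


@contextmanager
def stop_on_exit(
    deployment_id: str,
    run: Callable[[Sequence[str]], object] = _run_checked,
) -> Iterator[str]:
    """Yield the deployment ID, then run `badgr down <deployment-id>` on exit."""
    try:
        yield deployment_id
    finally:
        # Executes on normal exit and when the body raises, so the GPU
        # instance is released in both cases.
        run(["badgr", "down", deployment_id])
```

Wrapping a batch job in "with stop_on_exit(deployment_id):" would then issue the teardown command whether the job finishes or raises. The run parameter is injectable so the behaviour can be checked without invoking the real CLI.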
Options
--gpu <type>        Override the GPU type (RTX_4090, L40S, A6000, A100, H100). Auto-selected if omitted.
--region <region>   Optional region preference. Badgr chooses the best available capacity if omitted.
--max-price 2.00    Hard price cap in USD/hr.
--count 2           Number of GPUs (1–8).
--dry-run           Preview routing without provisioning.
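When badgr serve is launched from a script, it can be convenient to assemble the argument list in one place and catch obviously invalid values before anything is provisioned. The sketch below uses only the flags listed above. The build_serve_command helper is hypothetical, the local validation is an assumption about what the CLI would reject, and placing flags after the model name is also assumed.

```python
from typing import List, Optional

# GPU types listed for --gpu in this recipe.
GPU_TYPES = ("RTX_4090", "L40S", "A6000", "A100", "H100")


def build_serve_command(
    model: str,
    gpu: Optional[str] = None,
    region: Optional[str] = None,
    max_price: Optional[float] = None,
    count: Optional[int] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble a `badgr serve` argument list, validating values locally."""
    cmd = ["badgr", "serve", model]
    if gpu is not None:
        if gpu not in GPU_TYPES:
            raise ValueError(f"unknown GPU type: {gpu!r}")
        cmd += ["--gpu", gpu]
    if region is not None:
        cmd += ["--region", region]
    if max_price is not None:
        if max_price <= 0:
            raise ValueError("max_price must be positive (USD/hr)")
        # Two decimals, matching the 2.00 form shown in the options.
        cmd += ["--max-price", f"{max_price:.2f}"]
    if count is not None:
        if not 1 <= count <= 8:
            raise ValueError("count must be between 1 and 8")
        cmd += ["--count", str(count)]
    if dry_run:
        cmd.append("--dry-run")
    return cmd
```

Passing dry_run=True first is a cheap way to preview routing for a given set of options before committing to a paid deployment; the resulting list can be handed to subprocess.run.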