Serve an open-source model

Launch a persistent, OpenAI-compatible endpoint for a Hugging Face model that vLLM supports. Badgr provisions the GPU, starts vLLM, and gives you a URL that works with any OpenAI SDK client.

Prerequisites: install the CLI with npm install -g badgr-cli, then run badgr login. See the setup guide for details.

1. Start the endpoint

badgr serve meta-llama/Llama-3.1-8B-Instruct

The GPU is selected automatically based on model size, so no flags are needed (see Options to override it). Badgr starts vLLM and prints the endpoint URL once it is ready, usually after 2–5 minutes. The deployment stays running, and billed, until you run badgr down.
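badgr serve already waits and prints the URL once the endpoint is ready. If you script deployments and want to confirm the URL is answering before sending traffic, a minimal polling sketch follows. It assumes the Badgr endpoint passes through vLLM's OpenAI-compatible GET /v1/models route and accepts the API key as a Bearer token, as in the curl example; http_probe and wait_until_ready are hypothetical helpers, not part of badgr-cli.

```python
import time
import urllib.request
from typing import Callable


def http_probe(base_url: str, api_key: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/models answers with HTTP 200.

    Assumption: the Badgr endpoint exposes vLLM's OpenAI-compatible
    /v1/models route and accepts the API key as a Bearer token.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (e.g. a 503 while the model loads) and
        # socket timeouts are all OSError subclasses: treat as "not ready".
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True, or give up after timeout_s of waiting.

    `sleep` is injectable so the loop can be tested without real delays.
    """
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

For example, wait_until_ready(lambda: http_probe("https://dep-a1b2c3.api.badgr.ai/v1", os.environ["BADGR_API_KEY"])) checks every 10 seconds for up to 10 minutes, which covers the typical 2–5 minute startup.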

2. Call the endpoint

Point any OpenAI SDK client at the URL printed by badgr serve:

Python

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BADGR_API_KEY"],  # your Badgr API key
    base_url="https://dep-a1b2c3.api.badgr.ai/v1",  # printed by badgr serve
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.BADGR_API_KEY,
  baseURL: "https://dep-a1b2c3.api.badgr.ai/v1", // printed by badgr serve
});

const resp = await client.chat.completions.create({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

curl

curl https://dep-a1b2c3.api.badgr.ai/v1/chat/completions \
  -H "Authorization: Bearer $BADGR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
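vLLM's OpenAI-compatible server also supports streaming, so the Python client can print tokens as they arrive by passing stream=True. A sketch, assuming the Badgr endpoint forwards streamed responses unchanged; iter_text and stream_reply are illustrative helper names. Each streamed chunk carries a fragment of the reply in choices[0].delta.content, which is None for role-only and final chunks.

```python
from typing import Any, Iterable, Iterator


def iter_text(chunks: Iterable[Any]) -> Iterator[str]:
    """Yield the text fragments from a stream of chat-completion chunks."""
    for chunk in chunks:
        # A chunk can arrive with no choices (e.g. a usage-only final chunk).
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        # Role-only and finish chunks carry content=None; skip them.
        if piece:
            yield piece


def stream_reply(client: Any, model: str, prompt: str) -> str:
    """Print a streamed reply as it arrives and return the full text.

    `client` is an openai.OpenAI instance pointed at the Badgr base_url.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for piece in iter_text(stream):
        print(piece, end="", flush=True)
        parts.append(piece)
    print()  # finish the line after the last fragment
    return "".join(parts)
```

With the client from the Python example, stream_reply(client, "meta-llama/Llama-3.1-8B-Instruct", "Hello") prints the reply incrementally and returns it as one string.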

3. Stop billing

badgr down <deployment-id>

This terminates the GPU instance and stops billing. Afterward, run badgr receipts to see what the deployment cost.
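Because a deployment bills for as long as it runs, it can be useful to bound what a forgotten endpoint might cost. A minimal sketch, assuming billing accrues continuously and the --max-price cap (see Options) applies to the whole deployment: the worst case is the cap multiplied by the hours elapsed. worst_case_cost is a hypothetical helper; badgr receipts remains the source of truth for actual cost.

```python
from datetime import datetime, timedelta
from decimal import ROUND_HALF_UP, Decimal


def worst_case_cost(max_price_per_hr: str, started: datetime, stopped: datetime) -> Decimal:
    """Upper bound on spend in USD for a deployment that ran from `started` to `stopped`.

    Assumes billing accrues continuously and never exceeds the --max-price rate.
    Pass the price as a string (e.g. "2.00") so Decimal keeps it exact.
    """
    if stopped < started:
        raise ValueError("stopped must not be earlier than started")
    # Whole seconds elapsed, as an exact Decimal.
    seconds = Decimal((stopped - started) // timedelta(seconds=1))
    hours = seconds / Decimal(3600)
    cost = Decimal(max_price_per_hr) * hours
    # Round to cents for display.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, at a 2.00 USD/hr cap, an endpoint started Friday at 18:00 and stopped Monday at 09:00 (63 hours) costs at most 126.00 USD under these assumptions.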

Options

--gpu <type>         Override the GPU type (RTX_4090, L40S, A6000, A100, H100). Auto-selected if omitted.
--region <region>    Optional region preference. If omitted, Badgr chooses the best available capacity.
--max-price 2.00     Hard price cap in USD/hr.
--count 2            Number of GPUs (1–8).
--dry-run            Preview routing without provisioning.
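If you launch deployments from a script, you can assemble the badgr serve command from these options and reject out-of-range values before calling the CLI. This sketch uses only the flags listed above; serve_command is a hypothetical helper, and it assumes flags may follow the model argument and that the GPU list above is complete (the CLI may accept more types).

```python
from typing import List, Optional

# GPU types listed in the Options table; the real CLI may accept others.
KNOWN_GPUS = {"RTX_4090", "L40S", "A6000", "A100", "H100"}


def serve_command(
    model: str,
    gpu: Optional[str] = None,
    region: Optional[str] = None,
    max_price: Optional[float] = None,
    count: int = 1,
    dry_run: bool = False,
) -> List[str]:
    """Build a `badgr serve` argv list, checking values against the documented ranges."""
    argv = ["badgr", "serve", model]
    if gpu is not None:
        if gpu not in KNOWN_GPUS:
            raise ValueError(f"unknown GPU type: {gpu}")
        argv += ["--gpu", gpu]
    if region is not None:
        argv += ["--region", region]
    if max_price is not None:
        if max_price <= 0:
            raise ValueError("max_price must be positive")
        argv += ["--max-price", f"{max_price:.2f}"]  # USD/hr
    if not 1 <= count <= 8:
        raise ValueError("count must be between 1 and 8")
    if count != 1:
        argv += ["--count", str(count)]
    if dry_run:
        argv.append("--dry-run")  # preview routing only
    return argv
```

Pass the result to subprocess.run(..., check=True) to launch it. Running it once with dry_run=True first previews routing without provisioning a GPU.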