Embeddings endpoint
Launch a persistent OpenAI-compatible embeddings API. Badgr provisions the GPU and starts vLLM. Use --task embed to explicitly enable embedding mode; omit it for models vLLM auto-detects.
Prerequisites: run npm install -g badgr-cli, then badgr login. See the Setup guide for details.
1. Start the endpoint
badgr serve BAAI/bge-large-en-v1.5 --task embed
The --task embed flag tells vLLM to expose the /v1/embeddings endpoint. Badgr prints the base URL when the server is ready (usually 1–3 minutes).
Other embedding models that work the same way: BAAI/bge-m3, intfloat/e5-mistral-7b-instruct, nomic-ai/nomic-embed-text-v1.5.
Pass a Hugging Face token if your model is gated:
badgr serve BAAI/bge-large-en-v1.5 --task embed --env HUGGING_FACE_HUB_TOKEN=$HF_TOKEN
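By default badgr serve waits for the health check itself, so the following is only a rough sketch for the case where you start the server with --no-wait and want to poll for readiness from your own code. It assumes the deployment forwards vLLM's standard GET /v1/models route and accepts the same bearer token as the embeddings route; neither is confirmed by this page. The helper names probe_models and wait_until_ready are hypothetical.

```python
import time
import urllib.request


def probe_models(base_url: str, api_key: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/models answers with HTTP 200.

    Assumption: the Badgr deployment exposes vLLM's /v1/models route.
    """
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError/HTTPError, timeouts, and refused connections
        # while the server is still starting.
        return False


def wait_until_ready(probe, timeout_s=300.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic) -> bool:
    """Call probe() repeatedly until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A call might look like wait_until_ready(lambda: probe_models("https://dep-a1b2c3.api.badgr.ai/v1", api_key)), using the base URL printed by badgr serve. The 300-second default is chosen to sit comfortably above the usual 1–3 minute startup time.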
2. Call the endpoint
Use any OpenAI embeddings client and point it at the URL printed by badgr serve:
Python
from openai import OpenAI

client = OpenAI(
    api_key="your-badgr-api-key",
    base_url="https://dep-a1b2c3.api.badgr.ai/v1",  # printed by badgr serve
)

resp = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",
    input=["Badgr makes GPU compute simple", "Another sentence to embed"],
)

# One vector per input string, in the same order as the inputs
vectors = [d.embedding for d in resp.data]
print(f"Got {len(vectors)} embeddings of dim {len(vectors[0])}")
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.BADGR_API_KEY,
  baseURL: "https://dep-a1b2c3.api.badgr.ai/v1", // printed by badgr serve
});

const resp = await client.embeddings.create({
  model: "BAAI/bge-large-en-v1.5",
  input: ["Badgr makes GPU compute simple", "Another sentence to embed"],
});

// One vector per input string, in the same order as the inputs
const vectors = resp.data.map((d) => d.embedding);
console.log(`Got ${vectors.length} embeddings of dim ${vectors[0].length}`);
curl
curl https://dep-a1b2c3.api.badgr.ai/v1/embeddings \
  -H "Authorization: Bearer $BADGR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-large-en-v1.5",
    "input": ["Badgr makes GPU compute simple"]
  }'
3. Stop billing
badgr down <deployment-id>
This terminates the GPU instance and stops billing. Run badgr receipts afterward to see what the deployment cost.
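To get a feel for what a deployment might cost before the receipt arrives, the arithmetic can be sketched as below. This assumes billing is simply the hourly price multiplied by runtime, which this page does not confirm; real billing granularity may differ, and badgr receipts remains the authoritative figure. The function names are hypothetical, and the sample numbers are the --max-price 2.00 and --max-cost 5.00 values from the options list.

```python
def estimated_cost(price_per_hour_usd: float, runtime_minutes: float) -> float:
    """Estimated spend, assuming linear per-hour billing (an assumption)."""
    return price_per_hour_usd * runtime_minutes / 60


def hours_until_cap(max_cost_usd: float, price_per_hour_usd: float) -> float:
    """Runtime in hours before a total-spend cap is reached at a given price."""
    if price_per_hour_usd <= 0:
        raise ValueError("price_per_hour_usd must be positive")
    return max_cost_usd / price_per_hour_usd


# 45 minutes at $2.00/hr
print(estimated_cost(2.00, 45))     # 1.5
# A $5.00 spend cap at $2.00/hr
print(hours_until_cap(5.00, 2.00))  # 2.5
```

Because --max-price is an upper bound on the hourly rate, the second figure is a lower bound on runtime under this model: a cheaper instance would run longer before reaching the cap.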
Options
--task embed       Enable the /v1/embeddings endpoint (vLLM task override)
--env KEY=VALUE    Set environment variables (repeatable)
--gpu <type>       Override GPU type (RTX_4090, L40S, A6000, A100, H100)
--count <n>        Number of GPUs for higher throughput
--max-price 2.00   Hard price cap in USD/hr
--max-cost 5.00    Auto-stop when total spend reaches this amount
--no-wait          Return immediately without waiting for the health check
Next steps