Transcription API

Launch a persistent Whisper endpoint for audio-to-text workloads using badgr serve --image. The container stays running so every request gets a warm model with no cold-start.

Prerequisites: run npm install -g badgr-cli, then badgr login. See the setup guide for details.

1. Start the endpoint

Serve a Whisper API container. Pass model size and any other config via --env:

badgr serve \
  --image ghcr.io/my-org/whisper-api:latest \
  --gpu RTX_4090 \
  --env MODEL=large-v3 \
  --env LANGUAGE=en

The RTX 4090 has 24 GB of VRAM, enough to hold Whisper large-v3, and is the most cost-effective option. Badgr prints the endpoint URL once the model has loaded, usually in under 2 minutes.

2. Transcribe audio

If your container follows the OpenAI audio transcriptions API shape, any OpenAI client works:

Python

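import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BADGR_API_KEY"],  # same variable the Node.js and curl examples read
    base_url="https://dep-a1b2c3.api.badgr.ai/v1",  # printed by badgr serve
)

# Open the file in binary mode; the client uploads it as multipart form data.
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",  # required by the client; how it is interpreted depends on your container
        file=f,
        response_format="text",  # the call returns the transcript as a plain string
    )
print(transcript)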
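
Because the endpoint stays warm, it is natural to send many files in one run. The following is a minimal sketch of concurrent batch transcription, not a Badgr-provided helper: make_transcriber and transcribe_many are hypothetical names, the max_workers default is illustrative, and the code assumes your container accepts the same OpenAI-style request as the single-file example.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable


def make_transcriber(base_url: str, api_key: str) -> Callable[[Path], str]:
    """Return a function that sends one audio file to the endpoint."""
    # Imported lazily so transcribe_many below has no hard dependency on openai.
    from openai import OpenAI

    client = OpenAI(api_key=api_key, base_url=base_url)

    def transcribe(path: Path) -> str:
        with path.open("rb") as f:
            return client.audio.transcriptions.create(
                model="whisper-1",
                file=f,
                response_format="text",
            )

    return transcribe


def transcribe_many(
    paths: Iterable[Path],
    transcribe: Callable[[Path], str],
    max_workers: int = 4,  # illustrative default; tune to what your deployment can handle
) -> Dict[str, str]:
    """Transcribe files concurrently and return {filename: transcript}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input paths.
        texts = list(pool.map(transcribe, paths))
    return {p.name: t for p, t in zip(paths, texts)}
```

To use it, build a transcriber with the URL printed by badgr serve and your API key, then call transcribe_many(Path("recordings").glob("*.mp3"), transcriber). Threads suit this workload because each call spends most of its time waiting on the network.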

Node.js

import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: process.env.BADGR_API_KEY,
  baseURL: "https://dep-a1b2c3.api.badgr.ai/v1",
});

const transcript = await client.audio.transcriptions.create({
  model: "whisper-1",
  file: fs.createReadStream("audio.mp3"),
  response_format: "text",
});
console.log(transcript);

curl

curl https://dep-a1b2c3.api.badgr.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $BADGR_API_KEY" \
  -F model=whisper-1 \
  -F file=@audio.mp3 \
  -F response_format=text

3. Stop billing

badgr down <deployment-id>

This terminates the GPU instance. Run badgr receipts afterward to see what the deployment cost.
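
If a script drives the transcription job, it helps to guarantee that badgr down runs even when the job fails partway. Here is a minimal sketch using a context manager. The name deployment_guard is hypothetical, the deployment ID is a value you supply yourself, and the only CLI call used is the badgr down <deployment-id> command shown above.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence


def _run_cli(cmd: Sequence[str]) -> None:
    """Run a CLI command and raise if it exits with a non-zero status."""
    subprocess.run(list(cmd), check=True)


@contextmanager
def deployment_guard(
    deployment_id: str,
    run: Callable[[Sequence[str]], object] = _run_cli,
) -> Iterator[str]:
    """Run `badgr down <deployment-id>` when the block exits, even on error."""
    try:
        yield deployment_id
    finally:
        # Executes on normal exit and on exceptions alike.
        run(["badgr", "down", deployment_id])
```

Wrap the work in with deployment_guard("<deployment-id>"): so that an exception in the job still stops the instance. The run parameter exists only so the teardown call can be swapped out in tests.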

Options

--image <ref>: Your Docker image with faster-whisper or whisper.cpp

--gpu <type>: RTX_4090 is the cost-effective choice for Whisper large-v3

--env KEY=VALUE: Set environment variables (MODEL, LANGUAGE, etc.); repeatable

--count <n>: Multiple instances for higher concurrency

--max-price 2.00: Hard price cap in USD/hr

--max-cost 5.00: Auto-stop when total spend reaches this amount

--no-wait: Return immediately; use badgr logs to confirm startup
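
When launches are scripted, the options above can be assembled programmatically. The sketch below builds the argument list for badgr serve using only the flags listed on this page. The function build_serve_command is a hypothetical helper, not part of the Badgr CLI, and the two-decimal price formatting is an assumption based on the 2.00 and 5.00 examples.

```python
import shlex
from typing import Dict, List, Optional


def build_serve_command(
    image: str,
    gpu: str,
    env: Optional[Dict[str, str]] = None,
    count: Optional[int] = None,
    max_price: Optional[float] = None,
    max_cost: Optional[float] = None,
    no_wait: bool = False,
) -> List[str]:
    """Build a `badgr serve` argument list from the documented options."""
    cmd = ["badgr", "serve", "--image", image, "--gpu", gpu]
    for key, value in (env or {}).items():
        cmd += ["--env", f"{key}={value}"]  # --env is repeatable
    if count is not None:
        cmd += ["--count", str(count)]
    if max_price is not None:
        cmd += ["--max-price", f"{max_price:.2f}"]  # USD per hour
    if max_cost is not None:
        cmd += ["--max-cost", f"{max_cost:.2f}"]  # total USD before auto-stop
    if no_wait:
        cmd.append("--no-wait")
    return cmd


cmd = build_serve_command(
    image="ghcr.io/my-org/whisper-api:latest",
    gpu="RTX_4090",
    env={"MODEL": "large-v3", "LANGUAGE": "en"},
    max_price=2.00,
    max_cost=5.00,
)
print(shlex.join(cmd))  # a shell-ready string; pass `cmd` itself to subprocess.run
```

Passing the list to subprocess.run avoids shell quoting problems. For a single instance, the two caps in this example also bound the runtime: at no more than $2.00 per hour, a $5.00 budget lasts at least 5.00 / 2.00 = 2.5 hours before the auto-stop triggers.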

Next steps