Whisper API

Complete, copy-paste-ready examples for integrating with the Whisper API. Replace YOUR_API_KEY with a real token, which you can generate by running python -m app.cli create.


File Upload

Transcribe a local audio file by sending raw bytes:

cURL

curl -X POST 'http://localhost:7860/v1/listen?model=tiny.en' \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav

Python

import httpx

API_URL = "http://localhost:7860/v1/listen"
API_KEY = "YOUR_API_KEY"

with open("audio.wav", "rb") as f:
    response = httpx.post(
        API_URL,
        params={"model": "tiny.en"},
        headers={
            "Authorization": f"Token {API_KEY}",
            "Content-Type": "audio/wav",
        },
        content=f.read(),
        timeout=60.0,
    )

response.raise_for_status()  # Raise on 4xx/5xx instead of failing later with a KeyError
data = response.json()
transcript = data["results"]["channels"][0]["alternatives"][0]["transcript"]
print(f"Transcript: {transcript}")

JavaScript

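const fs = require('fs');

const API_URL = 'http://localhost:7860/v1/listen';
const API_KEY = 'YOUR_API_KEY';

// Wrapped in an async function because top-level await is not available
// in CommonJS modules, which is what require() implies.
async function main() {
  const audioBuffer = fs.readFileSync('audio.wav');

  const response = await fetch(`${API_URL}?model=tiny.en`, {
    method: 'POST',
    headers: {
      'Authorization': `Token ${API_KEY}`,
      'Content-Type': 'audio/wav',
    },
    body: audioBuffer,
  });

  // Surface HTTP errors before trying to read the transcript
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const data = await response.json();
  const transcript = data.results.channels[0].alternatives[0].transcript;
  console.log(`Transcript: ${transcript}`);
}

main().catch(console.error);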
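
Both examples above index straight into results.channels[0].alternatives[0].transcript, which raises an exception if the server returns an error payload or an empty result. The following is a minimal Python sketch of a more defensive accessor. The helper name extract_transcript and the sample payloads are hypothetical; the only thing assumed about the response is the nesting already shown in the examples above.

```python
from typing import Any


def extract_transcript(data: Any, channel: int = 0) -> str:
    """Return the first alternative's transcript for a channel, or "" if absent."""
    try:
        alternatives = data["results"]["channels"][channel]["alternatives"]
        return alternatives[0]["transcript"]
    except (KeyError, IndexError, TypeError):
        # Error payloads or empty results do not carry this structure.
        return ""


# Hypothetical payload shaped like the response indexed in the examples above.
sample = {
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "hello world"}]}
        ]
    }
}

print(extract_transcript(sample))                             # hello world
print(repr(extract_transcript({"error": "unauthorized"})))    # ''
```

Returning an empty string keeps calling code simple; raising a custom exception instead would be a reasonable alternative if a missing transcript should stop processing.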

URL-Based Transcription

Transcribe an audio file hosted at a remote URL:

cURL

curl -X POST 'http://localhost:7860/v1/listen?model=tiny.en' \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/audio.mp3"}'

Python

import httpx

response = httpx.post(
    "http://localhost:7860/v1/listen",
    params={"model": "tiny.en"},
    headers={"Authorization": "Token YOUR_API_KEY"},
    json={"url": "https://example.com/audio.mp3"},
    timeout=60.0,
)

response.raise_for_status()  # Raise on 4xx/5xx before reading the body
print(response.json())

JavaScript

const response = await fetch(
  'http://localhost:7860/v1/listen?model=tiny.en',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Token YOUR_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url: 'https://example.com/audio.mp3',
    }),
  }
);

console.log(await response.json());

Advanced Controls

Audio Cropping

Process only a specific segment of the audio. The example below skips the first 2 seconds and transcribes the following 5 seconds; start and duration are given in milliseconds:

curl -X POST 'http://localhost:7860/v1/listen?start=2000&duration=5000' \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav
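
The start and duration values in the request above appear to be milliseconds (2000 for 2 seconds, 5000 for 5 seconds). As a small sketch under that assumption, a hypothetical helper named crop_url can convert from seconds and build the query string with the standard library:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:7860/v1/listen"


def crop_url(start_s: float, duration_s: float, base_url: str = BASE_URL) -> str:
    """Build a /v1/listen URL that crops the audio, converting seconds to ms."""
    params = {
        "start": round(start_s * 1000),       # assumed unit: milliseconds
        "duration": round(duration_s * 1000),  # assumed unit: milliseconds
    }
    return f"{base_url}?{urlencode(params)}"


print(crop_url(2, 5))
# http://localhost:7860/v1/listen?start=2000&duration=5000
```

The resulting URL can be passed to curl or to httpx.post in place of the hard-coded one.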

Custom Vocabulary Prompting

Guide the model to correctly recognize domain-specific terms:

curl -X POST 'http://localhost:7860/v1/listen?prompt=MUTTON,TURNIPS,KUBERNETES' \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav
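
When vocabulary terms contain spaces or special characters, they need to be URL-encoded before they go into the query string. The sketch below assumes the prompt parameter takes a comma-separated list, as in the request above; prompt_query is a hypothetical helper, and the second term list is made up for illustration.

```python
from typing import Iterable
from urllib.parse import urlencode


def prompt_query(terms: Iterable[str]) -> str:
    """Join vocabulary terms with commas and URL-encode them as ?prompt=..."""
    cleaned = [term.strip() for term in terms if term.strip()]
    return urlencode({"prompt": ",".join(cleaned)})


print(prompt_query(["MUTTON", "TURNIPS", "KUBERNETES"]))
# prompt=MUTTON%2CTURNIPS%2CKUBERNETES

print(prompt_query(["Grafana Loki", "C++"]))
# prompt=Grafana+Loki%2CC%2B%2B
```

The commas are percent-encoded as %2C here; a standard query-string parser decodes them back to the same value the curl example sends.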

Subtitle Export

SRT Format

curl -X POST 'http://localhost:7860/v1/listen?response_format=srt' \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav > subtitles.srt

VTT Format

curl -X POST 'http://localhost:7860/v1/listen?response_format=vtt' \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav > subtitles.vtt
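
A Python counterpart to the two curl commands above might look like the sketch below, which uses only the standard library. It assumes, as the shell redirection suggests, that the response body is the subtitle text itself. The function names are hypothetical, and the helper deliberately accepts only the two formats shown in this section.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:7860/v1/listen"


def build_subtitle_request(audio: bytes, api_key: str, fmt: str = "srt") -> urllib.request.Request:
    """Prepare (but do not send) a POST request asking for SRT or VTT output."""
    if fmt not in ("srt", "vtt"):
        raise ValueError(f"unsupported subtitle format: {fmt!r}")
    url = f"{BASE_URL}?{urlencode({'response_format': fmt})}"
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
    )


def save_subtitles(audio_path: str, api_key: str, fmt: str = "srt") -> str:
    """Send the audio file and write the response body to subtitles.<fmt>."""
    with open(audio_path, "rb") as f:
        request = build_subtitle_request(f.read(), api_key, fmt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = response.read()
    out_path = f"subtitles.{fmt}"
    with open(out_path, "wb") as out:
        out.write(body)
    return out_path
```

With the server running, save_subtitles("audio.wav", "YOUR_API_KEY", "vtt") would write subtitles.vtt to the current directory.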

Live Streaming

WebSocket with Python

import asyncio
import websockets
import json

async def stream_file(file_path, token, model="tiny.en"):
    uri = f"ws://localhost:7860/v1/listen?token={token}&model={model}"

    async with websockets.connect(uri) as ws:
        # Read initial metadata
        metadata = json.loads(await ws.recv())
        print(f"Connected: {metadata['request_id']}")

        # Stream audio in chunks
        with open(file_path, "rb") as f:
            while chunk := f.read(8000):
                await ws.send(chunk)
                await asyncio.sleep(0.25)  # Simulate real-time

        # Signal end
        await ws.send(json.dumps({"type": "CloseStream"}))

        # Collect remaining results
        async for msg in ws:
            result = json.loads(msg)
            if result.get("type") == "Results":
                text = result["channel"]["alternatives"][0]["transcript"]
                if text.strip():
                    print(f"[{result['start']:.1f}s] {text}")

asyncio.run(stream_file("audio.wav", "YOUR_API_KEY"))
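
The chunk size and sleep interval in the script above work out to 32,000 bytes per second (8000 bytes every 0.25 s), which matches 16 kHz, 16-bit, mono PCM. That audio format is an assumption inferred from the numbers, not something stated here. For files in another format, the chunk size for real-time pacing can be derived as sketched below; both helper names are hypothetical.

```python
import wave


def chunk_size_bytes(sample_rate_hz: int, bytes_per_sample: int,
                     channels: int, interval_s: float) -> int:
    """Bytes of PCM audio that cover interval_s seconds of playback."""
    return int(sample_rate_hz * bytes_per_sample * channels * interval_s)


def chunk_size_for_wav(path: str, interval_s: float = 0.25) -> int:
    """Read a WAV header and compute the matching chunk size."""
    with wave.open(path, "rb") as wav:
        return chunk_size_bytes(
            wav.getframerate(),   # samples per second
            wav.getsampwidth(),   # bytes per sample
            wav.getnchannels(),   # 1 = mono, 2 = stereo
            interval_s,
        )


# 16 kHz, 16-bit, mono audio sent every 0.25 s
print(chunk_size_bytes(16000, 2, 1, 0.25))  # 8000
```

Passing the computed value to f.read() in place of the hard-coded 8000 keeps the simulated stream at real-time speed for other sample rates or channel counts.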

Using the Included Example Scripts

# File-based streaming
python examples/test_streaming.py \
  --token YOUR_API_KEY \
  --audio audio/jfk.wav \
  --model tiny.en

# Live microphone
python examples/mic_transcription.py \
  --token YOUR_API_KEY \
  --model tiny.en

# List audio devices
python examples/mic_transcription.py --list-devices

WebSocket with wscat

# Install wscat
npm install -g wscat

# Connect
wscat -c "ws://localhost:7860/v1/listen?token=YOUR_API_KEY&model=tiny.en"

Diagnostics

Check Available Models

curl -X GET 'http://localhost:7860/v1/models' \
  -H "Authorization: Token YOUR_API_KEY"

Health Check

curl http://localhost:7860/ping

API Documentation

Open in your browser:

http://localhost:7860/docs    # Swagger UI
http://localhost:7860/redoc   # ReDoc