Whisper API

The Whisper API supports real-time, low-latency transcription via WebSockets. Stream audio from microphones, files, or any source and receive transcription results as they’re processed.


Connection

Endpoint

WS /v1/listen

Query Parameters

| Parameter   | Type    | Default  | Description                    |
|-------------|---------|----------|--------------------------------|
| token       | string  | —        | Required. Your API key         |
| model       | string  | tiny.en  | Model to use for transcription |
| language    | string  | en       | BCP-47 language code           |
| encoding    | string  | linear16 | Audio encoding format          |
| sample_rate | integer | 16000    | PCM sample rate in Hz          |
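The query string can be assembled from these parameters with a small helper. A minimal sketch in Python; the host and port match the examples below, and `build_listen_url` is a hypothetical helper name, not part of this API:

```python
from urllib.parse import urlencode

def build_listen_url(token, model="tiny.en", language="en",
                     encoding="linear16", sample_rate=16000,
                     base="ws://localhost:7860/v1/listen"):
    """Build the /v1/listen WebSocket URL from its query parameters."""
    params = {
        "token": token,
        "model": model,
        "language": language,
        "encoding": encoding,
        "sample_rate": sample_rate,
    }
    return f"{base}?{urlencode(params)}"

print(build_listen_url("YOUR_API_KEY", encoding="webm"))
```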

Supported Encodings

| Encoding         | Description                              |
|------------------|------------------------------------------|
| linear16 / pcm16 | Raw 16-bit PCM (default, lowest latency) |
| wav              | WAV container                            |
| webm             | WebM container (browser MediaRecorder)   |
| ogg / opus       | OGG/Opus container                       |
| mp3              | MP3 compressed                           |
| flac             | FLAC lossless                            |
| mp4 / m4a        | MP4/M4A container                        |
| auto             | Server auto-detects the format           |

Connecting

wscat

wscat -c "ws://localhost:7860/v1/listen?token=YOUR_API_KEY&model=tiny.en&language=en"

Python

import asyncio
import websockets

async def stream():
    uri = "ws://localhost:7860/v1/listen?token=YOUR_API_KEY&model=tiny.en"
    async with websockets.connect(uri) as ws:
        async def receive():
            # Print metadata and transcription results as they arrive
            async for message in ws:
                print(message)

        receiver = asyncio.create_task(receive())

        # Read and send audio in ~250 ms chunks to simulate real-time input
        with open("audio.wav", "rb") as f:
            while chunk := f.read(8000):
                await ws.send(chunk)
                await asyncio.sleep(0.25)

        # Signal end of stream; the server flushes remaining audio and closes
        await ws.send('{"type": "CloseStream"}')
        await receiver

asyncio.run(stream())

JavaScript

const ws = new WebSocket(
  'ws://localhost:7860/v1/listen?token=YOUR_API_KEY&model=tiny.en'
);

ws.onopen = () => {
  console.log('Connected');
  // Send audio chunks via ws.send(audioBuffer)
};

ws.onmessage = (event) => {
  const result = JSON.parse(event.data);
  if (result.type === 'Results') {
    console.log(result.channel.alternatives[0].transcript);
  }
};

ws.onclose = () => console.log('Disconnected');

Sending Audio Data

PCM Format Specifications

For encoding=linear16 (default, best latency):

| Property               | Value                   |
|------------------------|-------------------------|
| Sample Rate            | 16,000 Hz               |
| Bit Depth              | 16-bit                  |
| Channels               | 1 (Mono)                |
| Endianness             | Little-Endian           |
| Recommended Chunk Size | 8,000 bytes (~250 ms)   |
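The recommended chunk size follows directly from the PCM properties: bytes per chunk = sample rate × bytes per sample × channels × chunk duration. A quick check:

```python
sample_rate = 16000      # Hz
bytes_per_sample = 2     # 16-bit
channels = 1             # mono
chunk_ms = 250           # ~250 ms of audio per chunk

chunk_bytes = sample_rate * bytes_per_sample * channels * chunk_ms // 1000
print(chunk_bytes)  # 8000, matching the recommended chunk size above
```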

Compressed Formats

For encoding=webm, ogg, mp3, etc., send raw container bytes. The server handles decoding internally using FFmpeg.
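If you prefer to decode client-side and stream linear16 instead, one common approach is to pipe the file through ffmpeg. This is a sketch assuming ffmpeg is on your PATH; the flags are standard ffmpeg options and the function names are illustrative, not part of this API:

```python
import subprocess

def ffmpeg_pcm16_cmd(path):
    """ffmpeg command decoding any container to 16 kHz mono s16le on stdout."""
    return ["ffmpeg", "-i", path,
            "-f", "s16le",    # raw little-endian 16-bit PCM
            "-ar", "16000",   # 16 kHz sample rate
            "-ac", "1",       # mono
            "-"]              # write to stdout

def pcm16_chunks(path, chunk_bytes=8000):
    """Yield raw PCM chunks ready to send over the WebSocket."""
    proc = subprocess.Popen(ffmpeg_pcm16_cmd(path),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    while chunk := proc.stdout.read(chunk_bytes):
        yield chunk
    proc.wait()
```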


Server Responses

1. Initial Metadata

On connection, the server immediately sends a metadata message:

{
  "type": "Metadata",
  "request_id": "c4937a39-3482-414b-be42-2750043044f2",
  "model_info": {
    "tiny.en": {
      "name": "whisper-tiny.en",
      "version": "ggml-v1",
      "arch": "whisper"
    }
  },
  "channels": 1,
  "created": "2026-03-30T00:03:18.621907Z"
}

2. Transcription Results

As audio is buffered and processed, the server streams JSON results:

{
  "type": "Results",
  "channel_index": [0, 1],
  "duration": 2.05,
  "start": 0.0,
  "is_final": true,
  "speech_final": false,
  "channel": {
    "alternatives": [
      {
        "transcript": "Hello world",
        "confidence": 0.98,
        "words": [
          { "word": "hello", "start": 0.0, "end": 0.5, "confidence": 0.97 },
          { "word": "world", "start": 0.5, "end": 1.0, "confidence": 0.99 }
        ]
      }
    ],
    "detected_language": "en"
  },
  "metadata": {
    "request_id": "c4937a39-3482-414b-be42-2750043044f2",
    "model_info": {
      "tiny.en": {
        "name": "whisper-tiny.en",
        "version": "ggml-v1",
        "arch": "whisper"
      }
    }
  },
  "from_finalize": false
}
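A client handler only needs a few fields from this payload. A minimal sketch that pulls the top transcript and word timings out of a parsed Results message:

```python
import json

def extract_transcript(message: str):
    """Return (transcript, words) from a Results message, else None."""
    result = json.loads(message)
    if result.get("type") != "Results":
        return None  # e.g. the initial Metadata message
    alt = result["channel"]["alternatives"][0]
    return alt["transcript"], alt.get("words", [])

sample = ('{"type": "Results", "channel": {"alternatives": '
          '[{"transcript": "Hello world", "words": []}]}}')
print(extract_transcript(sample))  # ('Hello world', [])
```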

Control Messages

Clients can send JSON-formatted control messages at any time:

Keep Alive

Prevents the connection from timing out during silence:

{ "type": "KeepAlive" }

Close Stream

Signals the server to process any remaining buffered audio and gracefully close the session:

{ "type": "CloseStream" }
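During long silences a client can schedule KeepAlive messages on a timer. A minimal asyncio sketch; the 5-second interval is an assumption, not a documented server timeout:

```python
import asyncio
import json

async def keepalive(ws, interval=5.0):
    """Send a KeepAlive control message every `interval` seconds."""
    try:
        while True:
            await asyncio.sleep(interval)
            await ws.send(json.dumps({"type": "KeepAlive"}))
    except asyncio.CancelledError:
        pass  # cancelled when the session ends
```

Start it with `task = asyncio.create_task(keepalive(ws))` alongside the send loop, and cancel it before sending CloseStream.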

Processing Pipeline

┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Client     │────▶│   Buffer     │────▶│   Decode     │────▶│  Transcribe  │
│  (Audio Src) │     │  (Accumulate │     │  (FFmpeg if  │     │  (whisper-   │
│              │     │   chunks)    │     │  compressed) │     │   cli)       │
└──────────────┘     └──────────────┘     └──────────────┘     └──────┬───────┘
                                                                      │
                     ┌──────────────┐                                 │
                     │   Client     │◀────────────────────────────────┘
                     │  (JSON Resp) │         Results (Deepgram format)
                     └──────────────┘

The buffer accumulates audio for the configured STREAM_CHUNK_DURATION_MS (default: 5000ms) before processing each segment.


Example Scripts

File-Based Streaming Test

Stream a pre-recorded file to simulate real-time input:

python examples/test_streaming.py \
  --token YOUR_API_KEY \
  --audio audio/jfk.wav \
  --model tiny.en

Live Microphone Transcription

Stream audio directly from your microphone:

python examples/mic_transcription.py \
  --token YOUR_API_KEY \
  --model tiny.en \
  --device 3

List Audio Devices

Find the correct microphone device index:

python examples/mic_transcription.py --list-devices

Troubleshooting

| Issue                | Solution                                                                        |
|----------------------|---------------------------------------------------------------------------------|
| Empty transcriptions | Ensure audio is 16 kHz mono PCM and the encoding parameter matches the actual format. |
| Connection rejected  | Verify your API token is valid and passed as the ?token=... query parameter.    |
| High latency         | Use linear16 encoding and a smaller STREAM_CHUNK_DURATION_MS (e.g., 2000).      |
| Garbled output       | Check that the sample rate matches. Use encoding=auto for non-PCM formats.      |