Whisper API

What is Whisper API?

Whisper API wraps the whisper.cpp engine in a Deepgram-compatible REST and WebSocket interface. Deploy it on your own infrastructure so that your audio never leaves your servers.

It handles everything from simple file uploads to real-time microphone streaming, with built-in subtitle export and speaker diarization.


Core Capabilities

Deepgram-Compatible API

Drop-in replacement for /v1/listen endpoints. Migrate existing Deepgram integrations with minimal code changes.
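As a rough illustration of what a client call might look like, the sketch below builds a POST request for /v1/listen using only the Python standard library. The host, the `Token` authorization scheme, the `model` query parameter, and the response shape are assumptions borrowed from Deepgram's conventions, not details confirmed by this page; the helper names are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode

# Port 7860 comes from the architecture diagram; the host is an assumption.
BASE_URL = "http://localhost:7860"


def build_listen_request(audio: bytes, api_key: str,
                         content_type: str = "audio/wav",
                         **params) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    query = urlencode(params)
    url = f"{BASE_URL}/v1/listen" + (f"?{query}" if query else "")
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={
            # Assumed: Deepgram-style "Token <key>" authorization header.
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


def extract_transcript(response: dict) -> str:
    """Read the first transcript, assuming a Deepgram-shaped JSON response."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]
```

To send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load`; whether every Deepgram query parameter is honored would need to be checked against the server itself.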

Real-Time Streaming

WebSocket endpoint for live transcription. Accepts PCM, WebM, OGG, and FLAC input, with automatic format detection.
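A streaming client typically has two preparatory steps: building the WebSocket URL and slicing raw audio into small frames to send as binary messages. The sketch below shows both with the standard library only. The `encoding` and `sample_rate` query parameters follow Deepgram's naming and are assumptions here, as are the host, the 100 ms frame size, and the helper names.

```python
from typing import Iterator
from urllib.parse import urlencode


def build_stream_url(host: str = "localhost", port: int = 7860, **params) -> str:
    """Build the WebSocket URL for /v1/listen with optional query parameters."""
    query = urlencode(params)
    return f"ws://{host}:{port}/v1/listen" + (f"?{query}" if query else "")


def pcm_frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 100,
               bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield fixed-duration slices of mono PCM; the last one may be shorter."""
    frame_bytes = sample_rate * frame_ms // 1000 * bytes_per_sample
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

Each yielded frame would then be sent as one binary WebSocket message using whichever client library you prefer; the transport itself is left out so the sketch has no third-party dependencies.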

Local CPU Transcription

Run transcription directly on your own machine's CPU, with no external cloud dependency.

Offline Key Management

Generate and revoke API keys via CLI. No external auth services required.
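To make the idea of offline key management concrete, here is a minimal sketch of generating, validating, and revoking keys against a local SQLite file. It is not the project's actual CLI or schema: the table name, column names, and function names are all hypothetical, and it uses the standard-library `sqlite3` module rather than SQLAlchemy to stay self-contained.

```python
import hashlib
import secrets
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the key store and create the (hypothetical) api_keys table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS api_keys ("
        "key_hash TEXT PRIMARY KEY, revoked INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def _hash(key: str) -> str:
    # Store only a digest so the database never holds usable keys.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def generate_key(conn: sqlite3.Connection) -> str:
    """Create a random key, record its hash, and return the plaintext once."""
    key = secrets.token_urlsafe(32)
    conn.execute("INSERT INTO api_keys (key_hash) VALUES (?)", (_hash(key),))
    conn.commit()
    return key


def revoke_key(conn: sqlite3.Connection, key: str) -> bool:
    """Mark a key as revoked; return False if the key is unknown."""
    cur = conn.execute(
        "UPDATE api_keys SET revoked = 1 WHERE key_hash = ?", (_hash(key),)
    )
    conn.commit()
    return cur.rowcount == 1


def is_valid(conn: sqlite3.Connection, key: str) -> bool:
    """A key is valid if it exists and has not been revoked."""
    row = conn.execute(
        "SELECT revoked FROM api_keys WHERE key_hash = ?", (_hash(key),)
    ).fetchone()
    return row is not None and row[0] == 0
```

Because everything lives in a local database file, no network call is needed to issue or check a key, which is the property the feature description emphasizes.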


Architecture

The server sits between your clients and the whisper.cpp binary, handling audio conversion, authentication, and response formatting.

Client (cURL / Python / JS / Deepgram SDK)

  ▼  HTTP POST or WebSocket
FastAPI Server (:7860)
  ├── /v1/listen     POST   → Transcribe uploaded file or URL
  ├── /v1/listen     WS     → Real-time streaming transcription
  ├── /v1/models     GET    → List available GGML models
  └── /v1/auth       POST   → Test token generation (dev only)

  ▼  subprocess
whisper-cli (compiled from whisper.cpp)

  ▼  reads
models/ggml-*.bin (GGML quantized model files)
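The lower half of the diagram can be sketched in a few lines: discover the model files, convert the upload to the 16 kHz mono WAV that whisper.cpp expects, and run the binary as a subprocess. This is an illustrative outline and not the server's actual code; the function names are hypothetical, and it assumes `ffmpeg` and `whisper-cli` are on the PATH. Only the basic `-m` (model) and `-f` (input file) flags of whisper-cli are used.

```python
import subprocess
from pathlib import Path


def list_models(models_dir: str = "models") -> list[str]:
    """Return the sorted file names matching models/ggml-*.bin."""
    return sorted(p.name for p in Path(models_dir).glob("ggml-*.bin"))


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Convert any input to 16 kHz, mono, 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src,
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]


def build_whisper_command(model_path: str, wav_path: str,
                          binary: str = "whisper-cli") -> list[str]:
    """Build the whisper-cli invocation for one WAV file."""
    return [binary, "-m", model_path, "-f", wav_path]


def transcribe(src: str, model_path: str, wav_path: str = "converted.wav") -> str:
    """Run conversion and transcription, returning whisper-cli's stdout."""
    subprocess.run(build_ffmpeg_command(src, wav_path),
                   check=True, capture_output=True)
    result = subprocess.run(build_whisper_command(model_path, wav_path),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the command as a list rather than a shell string avoids quoting problems with user-supplied file names, which matters for a server that accepts uploads.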


Stack

Layer          Technology
Runtime        Python 3.10+ / FastAPI
Transcription  whisper.cpp (C++ binary via subprocess)
Models         GGML quantized .bin files
Database       SQLite via SQLAlchemy
Streaming      Native FastAPI WebSocket
Container      Docker (Debian Bookworm)
License        MIT