W
Whisper API
Navigation

The Whisper API provides a Deepgram-compatible REST interface for pre-recorded audio transcription and model management.


Endpoints Overview

MethodEndpointDescription
POST/v1/listenTranscribe audio (file upload or URL)
GET/v1/modelsList available models
GET/pingHealth check
POST/v1/auth/test-tokenGenerate test token (dev only)
WS/v1/listenLive streaming (see Streaming docs)

POST /v1/listen — Transcribe Audio

The primary transcription endpoint. Accepts audio either as a binary file upload or as a URL in a JSON body.

Request Parameters (Query String)

ParameterTypeDefaultDescription
modelstringtiny.enModel to use. See GET /v1/models for options.
languagestringenBCP-47 language code for transcription.
promptstringnullContext/vocabulary prompt to guide the model (e.g., "TURNIPS, MUTTON").
startinteger0Offset in milliseconds — skip audio before this point.
durationintegernullMaximum duration in milliseconds to process.
response_formatstringjsonResponse format: json, srt, or vtt.
diarizebooleanfalseEnable speaker separation (best with stereo audio).
utterancesbooleanfalseReturn speech interval metadata.

Request Headers

HeaderValueRequired
AuthorizationToken <your_api_key>Yes
Content-Typeaudio/wav, audio/mpeg, audio/mp4, etc.For file upload
Content-Typeapplication/jsonFor URL-based transcription

File Upload

Send raw audio bytes as the request body:

curl -X POST 'http://localhost:7860/v1/listen?model=tiny.en&language=en' \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav

URL-Based Transcription

Send a JSON body with the audio file URL:

curl -X POST 'http://localhost:7860/v1/listen?model=tiny.en' \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/audio.mp3"}'

JSON Response Schema

When response_format=json (default), the response follows the Deepgram format:

{
  "metadata": {
    "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "created": "2026-03-29T10:30:00.000000Z",
    "duration": 10.43,
    "channels": 1,
    "sha256": "abc123def456..."
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "And so my fellow Americans ask not what your country can do for you ask what you can do for your country",
            "confidence": 0.98,
            "words": [
              {
                "word": "and",
                "start": 0.0,
                "end": 0.32,
                "confidence": 0.97
              },
              {
                "word": "so",
                "start": 0.32,
                "end": 0.56,
                "confidence": 0.99
              }
            ]
          }
        ]
      }
    ]
  }
}

SRT Response

When response_format=srt, raw subtitle text is returned:

1
00:00:00,000 --> 00:00:03,200
And so my fellow Americans

2
00:00:03,200 --> 00:00:06,800
ask not what your country can do for you

3
00:00:06,800 --> 00:00:10,430
ask what you can do for your country

VTT Response

When response_format=vtt:

WEBVTT

00:00:00.000 --> 00:00:03.200
And so my fellow Americans

00:00:03.200 --> 00:00:06.800
ask not what your country can do for you

00:00:06.800 --> 00:00:10.430
ask what you can do for your country

GET /v1/models — List Models

Returns all .bin model files available in the models/ directory.

curl -X GET 'http://localhost:7860/v1/models' \
  -H "Authorization: Token YOUR_API_KEY"

Response:

{
  "models": [
    "tiny.en",
    "base.en",
    "small.en"
  ]
}

GET /ping — Health Check

A simple health check endpoint (no authentication required):

curl http://localhost:7860/ping
{
  "ping": "pong",
  "status": "healthy"
}

Error Responses

Status CodeDescription
400Bad request — invalid parameters or unsupported format
401Unauthorized — missing or invalid API key
413Payload too large — audio file exceeds size limit
415Unsupported media type — unrecognized content type
422Validation error — malformed request body
500Internal server error — transcription engine failure
503Service unavailable — max concurrent transcriptions reached

Error Response Format

{
  "detail": "Invalid or missing API key"
}