Whisper API

The Whisper API uses GGML-format model files (.bin) compatible with whisper.cpp. Models live in the models/ directory and are auto-discovered by the server on startup.


Available Model Sizes

| Model | Parameters | Disk Size | Relative Speed | Best For |
|-------|------------|-----------|----------------|----------|
| tiny | 39M | ~75 MB | Fastest | Quick prototyping, low-resource devices |
| tiny.en | 39M | ~75 MB | Fastest | English-only, highest speed |
| base | 74M | ~142 MB | Fast | Good balance for English |
| base.en | 74M | ~142 MB | Fast | English-only with better accuracy |
| small | 244M | ~466 MB | Medium | Multi-language, good accuracy |
| small.en | 244M | ~466 MB | Medium | English-only, recommended for production |
| medium | 769M | ~1.5 GB | Slow | High accuracy, multi-language |
| medium.en | 769M | ~1.5 GB | Slow | English-only, near-best accuracy |
| large-v2 | 1550M | ~3.1 GB | Slowest | Maximum accuracy, all languages |
| large-v3 | 1550M | ~3.1 GB | Slowest | Latest, best multilingual |

Quantization Formats

GGML models support multiple quantization levels to trade accuracy for speed and memory:

| Format | Bits | Size Reduction | Quality | Recommended |
|--------|------|----------------|---------|-------------|
| f32 | 32-bit | 1x (baseline) | Highest | No (too large) |
| f16 | 16-bit | ~0.5x | Very High | Research only |
| q8_0 | 8-bit | ~0.25x | High | Yes, if memory allows |
| q5_1 | 5-bit | ~0.16x | Good | Best overall |
| q5_0 | 5-bit | ~0.16x | Good | Alternative to q5_1 |
| q4_1 | 4-bit | ~0.125x | Fair | Max speed priority |
| q4_0 | 4-bit | ~0.125x | Fair | Smallest size |
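
If you already have a full-precision model on disk, whisper.cpp ships a quantize example tool that converts it to any of the formats above. A sketch, assuming a local whisper.cpp checkout that has been built; the binary's location varies by build system, and the input/output paths here are illustrative:

```shell
# Convert a full-precision GGML model to q5_1 with whisper.cpp's quantize
# tool. Run from a whisper.cpp checkout after building; adjust paths to
# match your layout.
./quantize models/ggml-base.en.bin models/ggml-base.en-q5_1.bin q5_1
```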

Downloading Models

From Hugging Face

Official GGML models are available on the whisper.cpp Hugging Face repository:

```shell
# Download tiny.en (q5_1 quantized)
curl -L -o models/ggml-tiny.en-q5_1.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en-q5_1.bin"

# Download small.en (q5_1 quantized)
curl -L -o models/ggml-small.en-q5_1.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.en-q5_1.bin"

# Download base.en (full precision)
curl -L -o models/ggml-base.en.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin"
```

From the Setup Script

The setup_whisper.sh script automatically downloads the tiny.en model if none exists:

```shell
./setup_whisper.sh
```
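
The script's exact contents aren't reproduced here; a minimal sketch of its download-if-missing behavior (an assumption about what the real setup_whisper.sh does, which may differ):

```shell
# Sketch: fetch ggml-tiny.en.bin only when the models directory holds
# no .bin file yet (assumed behavior of setup_whisper.sh).
MODELS_DIR="${MODELS_DIR:-./models}"
mkdir -p "$MODELS_DIR"
if ! ls "$MODELS_DIR"/*.bin >/dev/null 2>&1; then
  curl -L -o "$MODELS_DIR/ggml-tiny.en.bin" \
    "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin"
fi
```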

Adding a New Model

1. Download the .bin file from Hugging Face or another source.

2. Move it to the models/ directory:

   ```shell
   mv ggml-small.en-q5_1.bin models/
   ```

3. Restart the API server:

   ```shell
   # Stop the running server (Ctrl+C)
   uvicorn app.main:app --host 0.0.0.0 --port 7860
   ```

4. Verify the model is available:

   ```shell
   curl -X GET 'http://localhost:7860/v1/models' \
     -H "Authorization: Token YOUR_API_KEY"
   ```

Model File Naming

The API uses the model filename (without the ggml- prefix and the .bin suffix) as the model identifier in API requests:

| File Name | API model Parameter |
|-----------|---------------------|
| ggml-tiny.en.bin | tiny.en |
| ggml-model-whisper-small.en-q5_1.bin | model-whisper-small.en-q5_1 |
| ggml-base.en.bin | base.en |
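
The mapping can be reproduced with plain shell parameter expansion; this is a sketch of the naming rule, not the server's actual code:

```shell
# Derive the API model identifier from a GGML file name:
# strip the leading "ggml-" and the trailing ".bin".
file="ggml-small.en-q5_1.bin"
model_id="${file#ggml-}"
model_id="${model_id%.bin}"
echo "$model_id"   # small.en-q5_1
```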

Model Directory Configuration

The model directory is configured via the MODELS_DIR environment variable:

```shell
# .env
MODELS_DIR=./models
```

The server scans this directory on startup and makes all .bin files available for transcription.
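
The discovery step can be approximated with a short shell loop; a sketch of the observable behavior, not the server's implementation:

```shell
# List every model the server would discover in MODELS_DIR,
# printing the identifier used in API requests.
MODELS_DIR="${MODELS_DIR:-./models}"
for f in "$MODELS_DIR"/*.bin; do
  [ -e "$f" ] || continue          # skip when the glob matches nothing
  name="$(basename "$f" .bin)"     # drop the .bin suffix
  echo "${name#ggml-}"             # drop the ggml- prefix
done
```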