Installation & Setup | Whisper API Docs

This guide walks you through setting up the Whisper API from scratch on your local machine.

Prerequisites

Before you begin, make sure you have the following installed:

| Requirement | Version | Purpose |
| --- | --- | --- |
| Python | 3.10+ | Runtime for the FastAPI server |
| pip | Latest | Python package manager |
| FFmpeg | Any | Audio transcoding to 16 kHz WAV |
| Git | Any | Clone the repository |
| CMake | 3.14+ | Build whisper.cpp from source |
| C++ Compiler | g++ or clang++ | Compile the whisper binary |
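
To confirm the tools in the table are reachable before you start, a small script along these lines may help. It is a minimal sketch: the `REQUIRED_TOOLS` mapping and the `missing_prerequisites` helper are illustrative names and not part of the repository. It only checks that each executable is on `PATH` and that the running Python is new enough. It does not verify the CMake version or pip.

```python
import shutil
import sys

# Executables named in the prerequisites table. The C++ compiler entry accepts
# either g++ or clang++, so each entry is a tuple of acceptable alternatives.
# (Hypothetical helper data, not taken from the project.)
REQUIRED_TOOLS = {
    "FFmpeg": ("ffmpeg",),
    "Git": ("git",),
    "CMake": ("cmake",),
    "C++ compiler": ("g++", "clang++"),
}


def missing_prerequisites(min_python=(3, 10)):
    """Return human-readable names of prerequisites that were not found."""
    missing = []
    if sys.version_info[:2] < min_python:
        missing.append(f"Python {min_python[0]}.{min_python[1]}+")
    for name, candidates in REQUIRED_TOOLS.items():
        # shutil.which returns None when the executable is not on PATH.
        if not any(shutil.which(exe) for exe in candidates):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All prerequisites found on PATH.")
```

An empty result means every listed tool was found. Anything else names what still needs installing.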

Installation

Clone the Repository

git clone https://github.com/innovatorved/whisper.api.git
cd whisper.api

Create a Virtual Environment (recommended)

Using venv:

python -m venv .venv
source .venv/bin/activate

Using conda:

conda create -n whisper python=3.10
conda activate whisper
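
Before installing dependencies, it can be worth confirming that an environment is actually active. The check below is a best-effort sketch. The function name is illustrative. The conda branch relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation, which is also set for the base environment.

```python
import os
import sys


def in_isolated_environment():
    """Best-effort check for an active venv or conda environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base interpreter.
    in_venv = sys.prefix != sys.base_prefix
    # conda sets CONDA_DEFAULT_ENV on activation (also for the base env).
    in_conda = bool(os.environ.get("CONDA_DEFAULT_ENV"))
    return in_venv or in_conda


print("Isolated environment active:", in_isolated_environment())
```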

Install Python Dependencies
```
pip install -r requirements.txt
```

Configure Environment Variables

Copy the example file and customize:

cp .env.example .env
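
Note that `cp` overwrites an existing `.env` without asking. If you rerun the setup on a machine that already has a customized file, a guarded copy may be safer. The helper below is a sketch with an illustrative name and is not part of the project.

```python
import shutil
from pathlib import Path


def copy_env_example(project_dir="."):
    """Copy .env.example to .env unless .env already exists.

    Returns True if a new .env was created, False if one was already there.
    """
    root = Path(project_dir)
    source = root / ".env.example"
    target = root / ".env"
    if target.exists():
        return False  # keep existing customizations untouched
    if not source.exists():
        raise FileNotFoundError(f"{source} not found")
    shutil.copyfile(source, target)
    return True
```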

Key variables:

# Security — change this in production!
SECRET_KEY=your-secret-key-here

# Server
SERVER_HOST=http://localhost:7860

# Database
DATABASE_URL=sqlite:///./whisper.db

# Whisper binary & models
WHISPER_BINARY_PATH=./binary/whisper-cli
MODELS_DIR=./models

# Concurrency
MAX_CONCURRENT_TRANSCRIPTIONS=2

# Streaming chunk size (ms) — wall-clock buffer before each WS transcribe pass
STREAM_CHUNK_DURATION_MS=2000

# Limits & safety (optional; defaults are sensible for production)
# MAX_AUDIO_UPLOAD_BYTES=52428800
# MAX_AUDIO_DOWNLOAD_BYTES=52428800
# AUDIO_URL_FOLLOW_REDIRECTS=false
# WHISPER_CLI_TIMEOUT_SEC=3600
# FFMPEG_TIMEOUT_SEC=600
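
The placeholder `SECRET_KEY` should be replaced with a random value. One way to produce one is Python's standard `secrets` module, as sketched below. The helper names and the 32-byte length are assumptions. The project does not document a required key format.

```python
import secrets

PLACEHOLDER = "your-secret-key-here"  # value shown in the example above


def generate_secret_key(num_bytes=32):
    """Return a URL-safe random string that could serve as SECRET_KEY."""
    return secrets.token_urlsafe(num_bytes)


def is_placeholder(value):
    """True if the key is empty or still the example placeholder."""
    return value.strip() in ("", PLACEHOLDER)


# Paste the printed line into .env in place of the placeholder.
print(f"SECRET_KEY={generate_secret_key()}")
```

A check like `is_placeholder` could run at startup to refuse the example value. Whether the server already does this is not stated here.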

Build the whisper.cpp Binary

The setup script clones, builds, and installs the whisper-cli binary:
```
chmod +x setup_whisper.sh
./setup_whisper.sh
```
The script automatically detects your platform (macOS/Linux), enables CUDA if an NVIDIA GPU is found, and downloads a default tiny.en model.
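
After the script finishes, you may want to confirm that its outputs are where the configuration expects them. The sketch below assumes the default `WHISPER_BINARY_PATH` and `MODELS_DIR` values from the environment example. The function name is illustrative.

```python
import os
from pathlib import Path


def check_whisper_install(binary_path="./binary/whisper-cli", models_dir="./models"):
    """Return a list of problems; an empty list means the build looks usable."""
    problems = []
    binary = Path(binary_path)
    if not binary.is_file():
        problems.append(f"binary not found: {binary}")
    elif not os.access(binary, os.X_OK):
        problems.append(f"binary is not executable: {binary}")
    models = Path(models_dir)
    # Model files use the .bin extension.
    if not models.is_dir() or not any(models.glob("*.bin")):
        problems.append(f"no .bin model files in: {models}")
    return problems


if __name__ == "__main__":
    for problem in check_whisper_install():
        print("Problem:", problem)
```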

Initialize the Database & Create an API Key
```
python -m app.cli init
python -m app.cli create --name "MyFirstKey"
```
Save the generated token — you’ll need it for all API requests.

Start the Server

uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload

Verify Your Installation

Once the server is running, verify it works:

Health Check

curl http://localhost:7860/ping

Expected response:

{ "ping": "pong", "status": "healthy" }
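
The same check can be scripted, for example in a deployment smoke test. This sketch uses only the standard library and compares the two fields shown in the expected response. The function names are illustrative, and it assumes that any extra fields in the payload should be ignored.

```python
import json
import urllib.request

EXPECTED = {"ping": "pong", "status": "healthy"}


def is_healthy(payload):
    """True if the payload contains the fields shown in the expected response."""
    return all(payload.get(key) == value for key, value in EXPECTED.items())


def check_ping(base_url="http://localhost:7860", timeout=5.0):
    """GET /ping and return True if the server reports healthy."""
    with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
        return is_healthy(json.load(resp))
```

Calling `check_ping()` raises `urllib.error.URLError` if the server is not reachable, so wrap it in a `try` block when polling during startup.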

Swagger UI

Open http://localhost:7860/docs in your browser to access the interactive API documentation.

First Transcription

curl -X POST 'http://localhost:7860/v1/listen' \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio/jfk.wav
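
For callers who prefer Python to curl, the request above could be sketched with the standard library as follows. The helper names are illustrative. The response is assumed to be JSON, but its exact shape is not documented in this guide.

```python
import json
import urllib.request
from pathlib import Path


def build_listen_request(audio_bytes, api_key, base_url="http://localhost:7860"):
    """Build the same POST request as the curl command, without sending it."""
    return urllib.request.Request(
        url=f"{base_url}/v1/listen",
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
    )


def transcribe(path, api_key, base_url="http://localhost:7860"):
    """Send a WAV file and return the decoded response (assumed to be JSON)."""
    request = build_listen_request(Path(path).read_bytes(), api_key, base_url)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With the server running, `transcribe("audio/jfk.wav", "YOUR_API_KEY")` sends the same file as the curl example.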

Environment Variables Reference

| Variable | Default | Description |
| --- | --- | --- |
| `SECRET_KEY` | (none) | Required. Secret used for token hashing |
| `SERVER_HOST` | `http://localhost:7860` | Public server URL |
| `DATABASE_URL` | `sqlite:///./whisper.db` | SQLAlchemy database connection string |
| `WHISPER_BINARY_PATH` | `./binary/whisper-cli` | Path to the compiled whisper-cli binary |
| `MODELS_DIR` | `./models` | Directory containing `.bin` model files |
| `MAX_CONCURRENT_TRANSCRIPTIONS` | `2` | Max parallel transcription processes |
| `STREAM_CHUNK_DURATION_MS` | `2000` | WebSocket: target wall-clock buffer length per chunk (ms); PCM sizing uses the client's `sample_rate` |
| `MAX_AUDIO_UPLOAD_BYTES` | `52428800` (50 MiB) | Max size for raw or multipart upload bodies |
| `MAX_AUDIO_DOWNLOAD_BYTES` | `52428800` (50 MiB) | Max size when downloading audio from a JSON `url` |
| `AUDIO_URL_FOLLOW_REDIRECTS` | `false` | If `true`, follow redirects when fetching `url` (the final host is re-validated to guard against SSRF) |
| `WHISPER_CLI_TIMEOUT_SEC` | `3600` | Max seconds for a single `whisper-cli` run |
| `FFMPEG_TIMEOUT_SEC` | `600` | Max seconds for ffmpeg conversion and probe operations |
| `ENABLE_TEST_TOKEN_ENDPOINT` | `false` | Enable `/v1/auth/test-token` in Swagger |
| `BACKEND_CORS_ORIGINS` | (see `config.py`) | Allowed CORS origins; override via env as needed for your deployment |
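
To get a feel for what `STREAM_CHUNK_DURATION_MS` means in bytes, the arithmetic can be worked through as below. This is a sketch that assumes 16-bit mono PCM. The reference only states that sizing depends on the client's `sample_rate`, so the sample width and channel count here are assumptions.

```python
def chunk_size_bytes(duration_ms, sample_rate, bytes_per_sample=2, channels=1):
    """Bytes of raw PCM needed to fill a buffer of the given duration."""
    frames = sample_rate * duration_ms // 1000
    return frames * bytes_per_sample * channels


MIB = 1024 * 1024

# Default 2000 ms buffer at two common client sample rates.
print(chunk_size_bytes(2000, 16000))  # 64000
print(chunk_size_bytes(2000, 48000))  # 192000

# The default upload and download limits expressed in MiB.
print(52428800 // MIB)  # 50
```

Under these assumptions, a client streaming at 48 kHz fills each chunk with three times as many bytes as one streaming at 16 kHz, while the wall-clock length of the chunk stays the same.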

Next Steps

Set up authentication — manage API keys with the CLI
Explore the API — full endpoint reference
Deploy with Docker — containerized production setup
