This guide walks you through setting up the Whisper API from scratch on your local machine.
Prerequisites
Before you begin, make sure you have the following installed:
| Requirement | Version | Purpose |
|---|---|---|
| Python | 3.10+ | Runtime for the FastAPI server |
| pip | Latest | Python package manager |
| FFmpeg | Any | Audio transcoding to 16kHz WAV |
| Git | Any | Clone the repository |
| CMake | 3.14+ | Build whisper.cpp from source |
| C++ Compiler | g++ or clang++ | Compile the whisper binary |
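You can check these prerequisites in one pass with the minimal Python sketch below. It is not part of the repository, and the helper names are hypothetical. It looks up each tool on `PATH` with `shutil.which`, accepts either `g++` or `clang++`, and parses the first line of `cmake --version`. It only confirms that the tools are present. It does not check that FFmpeg or the compiler actually work.

```python
import shutil
import subprocess
import sys

# Tools from the prerequisites table; any one C++ compiler is enough.
REQUIRED_TOOLS = ["pip", "ffmpeg", "git", "cmake"]
COMPILERS = ["g++", "clang++"]


def python_ok(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version is 3.10 or newer."""
    return tuple(version_info[:2]) >= (3, 10)


def missing_tools(which=shutil.which):
    """Return the names of required tools that are not on PATH."""
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if not any(which(cc) for cc in COMPILERS):
        missing.append("g++ or clang++")
    return missing


def parse_cmake_version(output):
    """Extract (major, minor) from `cmake --version` output such as 'cmake version 3.27.4'."""
    lines = output.splitlines()
    parts = lines[0].split() if lines else []
    if len(parts) < 3:
        return None
    numbers = parts[2].split(".")
    try:
        return int(numbers[0]), int(numbers[1])
    except (ValueError, IndexError):
        return None


def report():
    """Collect human-readable problems; an empty list means everything was found."""
    problems = []
    if not python_ok():
        problems.append("Python 3.10+ required, found " + sys.version.split()[0])
    problems += ["missing: " + name for name in missing_tools()]
    if shutil.which("cmake"):
        out = subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout
        version = parse_cmake_version(out)
        if version is None or version < (3, 14):
            problems.append("CMake 3.14+ required")
    return problems


if __name__ == "__main__":
    print("\n".join(report()) or "All prerequisites found.")
```

Run it with the same `python` you plan to use for the virtual environment, because the version check applies to the interpreter that runs the script.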
Installation
Clone the Repository
```bash
git clone https://github.com/innovatorved/whisper.api.git
cd whisper.api
```
Create a Virtual Environment (recommended)
Using venv:
```bash
python -m venv .venv
source .venv/bin/activate
```
Or using conda:
```bash
conda create -n whisper python=3.10
conda activate whisper
```
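Before you install dependencies, you can confirm that an environment is actually in use. The sketch below relies on two standard behaviors. A venv's interpreter reports a `sys.prefix` that differs from `sys.base_prefix`, and `conda activate` exports `CONDA_DEFAULT_ENV` with the environment's name. The function name is illustrative.

```python
import os
import sys


def active_environment(prefix=None, base_prefix=None, environ=None):
    """Describe the isolated environment this interpreter runs in, or return None.

    venv:  the venv's interpreter reports a sys.prefix different from sys.base_prefix.
    conda: `conda activate` exports CONDA_DEFAULT_ENV with the environment name.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ
    if prefix != base_prefix:
        return "venv at " + prefix
    if environ.get("CONDA_DEFAULT_ENV"):
        return "conda env '" + environ["CONDA_DEFAULT_ENV"] + "'"
    return None


if __name__ == "__main__":
    print(active_environment() or "No virtual environment detected; consider creating one first.")
```

If it reports `conda env 'base'`, run `conda activate whisper` before installing anything.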
Install Python Dependencies
```bash
pip install -r requirements.txt
```
Configure Environment Variables
Copy the example file and customize it:
```bash
cp .env.example .env
```
Key variables:
```bash
# Security: change this in production!
SECRET_KEY=your-secret-key-here

# Server
SERVER_HOST=http://localhost:7860

# Database
DATABASE_URL=sqlite:///./whisper.db

# Whisper binary & models
WHISPER_BINARY_PATH=./binary/whisper-cli
MODELS_DIR=./models

# Concurrency
MAX_CONCURRENT_TRANSCRIPTIONS=2

# Streaming chunk size (ms): wall-clock buffer before each WebSocket transcription pass
STREAM_CHUNK_DURATION_MS=2000

# Limits & safety (optional; the defaults are sensible for production)
# MAX_AUDIO_UPLOAD_BYTES=52428800
# MAX_AUDIO_DOWNLOAD_BYTES=52428800
# AUDIO_URL_FOLLOW_REDIRECTS=false
# WHISPER_CLI_TIMEOUT_SEC=3600
# FFMPEG_TIMEOUT_SEC=600
```
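`SECRET_KEY` is used for token hashing, so replace the placeholder with a long random value. The sketch below does this with only the standard library. `secrets.token_hex(32)` yields 32 random bytes encoded as 64 hex characters. The `set_env_value` helper is hypothetical and rewrites only an uncommented `KEY=value` line.

```python
import secrets


def generate_secret_key(num_bytes=32):
    """Return num_bytes of cryptographically secure randomness as a hex string."""
    return secrets.token_hex(num_bytes)


def set_env_value(env_text, key, value):
    """Return env_text with the `key=...` line replaced, or the pair appended if absent.

    Commented-out lines such as '# SECRET_KEY=...' are left untouched.
    """
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Paste the printed line into .env in place of the placeholder.
    print("SECRET_KEY=" + generate_secret_key())
```

For scripted setups, `Path(".env").write_text(set_env_value(Path(".env").read_text(), "SECRET_KEY", generate_secret_key()))` makes the same change in place. This requires `from pathlib import Path`.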
Build the whisper.cpp Binary
The setup script clones, builds, and installs the `whisper-cli` binary:
```bash
chmod +x setup_whisper.sh
./setup_whisper.sh
```
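After the script finishes, a quick sanity check can catch a failed build or an empty model directory before the first request. This sketch reads `WHISPER_BINARY_PATH` and `MODELS_DIR` from the process environment and falls back to the documented defaults. It does not parse `.env` itself. It treats any `*.bin` file as a model, which matches the description of `MODELS_DIR`. The function name is hypothetical.

```python
import os
from pathlib import Path


def check_whisper_install(binary_path=None, models_dir=None):
    """Return a list of problems with the whisper-cli binary and model directory (empty = OK).

    Paths default to WHISPER_BINARY_PATH / MODELS_DIR from the process environment,
    then to the documented defaults; this sketch does not read .env itself.
    """
    binary = Path(binary_path or os.environ.get("WHISPER_BINARY_PATH", "./binary/whisper-cli"))
    models = Path(models_dir or os.environ.get("MODELS_DIR", "./models"))
    problems = []
    if not binary.is_file():
        problems.append(f"whisper-cli not found at {binary}")
    elif not os.access(binary, os.X_OK):
        problems.append(f"{binary} is not executable")
    if not models.is_dir():
        problems.append(f"models directory {models} does not exist")
    elif not any(models.glob("*.bin")):
        problems.append(f"no .bin model files in {models}")
    return problems


if __name__ == "__main__":
    print("\n".join(check_whisper_install()) or "whisper-cli binary and models look OK.")
```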
Initialize the Database & Create an API Key
```bash
python -m app.cli init
python -m app.cli create --name "MyFirstKey"
```
Save the generated token. You'll need it for all API requests.
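Start the Server
```bash
uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload
```
The `--reload` flag restarts the server whenever source files change. It is intended for development, so omit it in production.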
Verify Your Installation
Once the server is running, verify it works:
Health Check
```bash
curl http://localhost:7860/ping
```
Expected response:
```json
{ "ping": "pong", "status": "healthy" }
```
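In scripts or CI, you often need to wait until the server answers before sending requests. The sketch below polls `GET /ping` using only the standard library. It counts the server as healthy only when the response matches the expected body shown above. `wait_for_server` and its parameters are illustrative and not part of the project.

```python
import http.client
import json
import time
import urllib.request


def is_healthy(payload):
    """True only for the documented body: {"ping": "pong", "status": "healthy"}."""
    return (
        isinstance(payload, dict)
        and payload.get("ping") == "pong"
        and payload.get("status") == "healthy"
    )


def wait_for_server(base_url="http://localhost:7860", timeout=30.0, interval=1.0):
    """Poll GET {base_url}/ping until it reports healthy; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(base_url + "/ping", timeout=5) as resp:
                if is_healthy(json.load(resp)):
                    return True
        except (OSError, ValueError, http.client.HTTPException):
            pass  # not listening yet, an HTTP error status, or a non-JSON body
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call `wait_for_server()` right after launching uvicorn in the background, and fail the job if it returns `False`.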
Swagger UI
Open http://localhost:7860/docs in your browser to access the interactive API documentation.
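The same information is available programmatically. By default, FastAPI serves the schema behind Swagger UI at `/openapi.json`. This sketch assumes the project keeps that default and lists every operation as `METHOD /path`. The helper names are hypothetical.

```python
import json
import urllib.request

# Keys of an OpenAPI path item that describe operations (others, like "parameters", do not).
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_endpoints(spec):
    """Flatten an OpenAPI document's `paths` into sorted 'METHOD /path' strings."""
    return sorted(
        method.upper() + " " + path
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method in HTTP_METHODS
    )


def fetch_openapi(base_url="http://localhost:7860"):
    """Download the schema, assuming FastAPI's default /openapi.json URL."""
    with urllib.request.urlopen(base_url + "/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

With the server running, `print("\n".join(list_endpoints(fetch_openapi())))` prints one line per endpoint.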
First Transcription
```bash
curl -X POST 'http://localhost:7860/v1/listen' \
  -H "Authorization: Token YOUR_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio/jfk.wav
```
Replace `YOUR_API_KEY` with the token you saved when you created the API key.
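The sketch below makes the equivalent request from Python using only `urllib`. It sends the file as the raw request body with the same two headers as the curl call. It rejects files larger than 52,428,800 bytes, the documented default for `MAX_AUDIO_UPLOAD_BYTES`; change the constant if your server uses a different limit. This page does not describe the response format, so `transcribe` simply returns the body as text. The function names are hypothetical.

```python
import os
import urllib.request

# Documented default for MAX_AUDIO_UPLOAD_BYTES (50 MiB); change it if your server differs.
MAX_UPLOAD_BYTES = 52_428_800


def build_listen_request(audio_path, api_key, base_url="http://localhost:7860"):
    """Build the same POST /v1/listen request as the curl example: raw WAV bytes as the body."""
    size = os.path.getsize(audio_path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{audio_path} is {size} bytes; the upload limit is {MAX_UPLOAD_BYTES}")
    with open(audio_path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        base_url + "/v1/listen",
        data=body,
        method="POST",
        headers={"Authorization": "Token " + api_key, "Content-Type": "audio/wav"},
    )


def transcribe(audio_path, api_key, base_url="http://localhost:7860"):
    """Send the request and return the response body as text (schema not covered here)."""
    request = build_listen_request(audio_path, api_key, base_url)
    # Generous read timeout: long recordings can take a while to transcribe.
    with urllib.request.urlopen(request, timeout=3600) as response:
        return response.read().decode("utf-8")
```

Usage: `print(transcribe("audio/jfk.wav", "YOUR_API_KEY"))`.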
Environment Variables Reference
| Variable | Default | Description |
|---|---|---|
| `SECRET_KEY` | (none) | **Required.** Secret used for token hashing |
| `SERVER_HOST` | `http://localhost:7860` | Public server URL |
| `DATABASE_URL` | `sqlite:///./whisper.db` | SQLAlchemy database connection string |
| `WHISPER_BINARY_PATH` | `./binary/whisper-cli` | Path to the compiled `whisper-cli` binary |
| `MODELS_DIR` | `./models` | Directory containing `.bin` model files |
| `MAX_CONCURRENT_TRANSCRIPTIONS` | `2` | Maximum number of parallel transcription processes |
| `STREAM_CHUNK_DURATION_MS` | `2000` | WebSocket only: target wall-clock buffer length per chunk, in ms; PCM sizing uses the client's `sample_rate` |
| `MAX_AUDIO_UPLOAD_BYTES` | `52428800` (50 MiB) | Maximum size of raw or multipart upload bodies |
| `MAX_AUDIO_DOWNLOAD_BYTES` | `52428800` (50 MiB) | Maximum size of audio downloaded from a `url` given in a JSON request |
| `AUDIO_URL_FOLLOW_REDIRECTS` | `false` | If `true`, follow redirects when fetching `url` (the final host is re-validated against SSRF) |
| `WHISPER_CLI_TIMEOUT_SEC` | `3600` | Maximum seconds for a single `whisper-cli` run |
| `FFMPEG_TIMEOUT_SEC` | `600` | Maximum seconds for FFmpeg conversion and probe operations |
| `ENABLE_TEST_TOKEN_ENDPOINT` | `false` | Enables `/v1/auth/test-token` in Swagger UI |
| `BACKEND_CORS_ORIGINS` | (see `config.py`) | Allowed CORS origins; override via the environment as your deployment requires |
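To show how these settings fit together, here is a hypothetical loader that mirrors the table's names and defaults. It is not the project's `config.py`, and it omits `BACKEND_CORS_ORIGINS`, whose defaults live there. It also works through the `STREAM_CHUNK_DURATION_MS` sizing. Assuming the client streams 16-bit mono PCM, a 2000 ms chunk at 16 kHz is 16,000 samples/s × 2 s × 2 bytes/sample = 64,000 bytes. The sample format and the boolean parsing rule are assumptions.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional

MIB = 1024 * 1024


@dataclass(frozen=True)
class Settings:
    """Hypothetical mirror of the environment variables reference (not the app's config.py)."""

    secret_key: str
    server_host: str = "http://localhost:7860"
    database_url: str = "sqlite:///./whisper.db"
    whisper_binary_path: str = "./binary/whisper-cli"
    models_dir: str = "./models"
    max_concurrent_transcriptions: int = 2
    stream_chunk_duration_ms: int = 2000
    max_audio_upload_bytes: int = 50 * MIB  # 52428800
    max_audio_download_bytes: int = 50 * MIB  # 52428800
    audio_url_follow_redirects: bool = False
    whisper_cli_timeout_sec: int = 3600
    ffmpeg_timeout_sec: int = 600
    enable_test_token_endpoint: bool = False

    @classmethod
    def from_env(cls, environ: Optional[Mapping[str, str]] = None) -> "Settings":
        """Build settings from UPPER_CASE variables, falling back to the table defaults."""
        env = os.environ if environ is None else environ
        if not env.get("SECRET_KEY"):
            raise ValueError("SECRET_KEY is required")
        values = {}
        for f in fields(cls):
            raw = env.get(f.name.upper())
            if raw is None:
                continue  # keep the default
            if f.type is bool:
                values[f.name] = raw.strip().lower() in ("1", "true", "yes")  # assumed rule
            elif f.type is int:
                values[f.name] = int(raw)
            else:
                values[f.name] = raw
        return cls(**values)

    def stream_chunk_bytes(self, sample_rate: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
        """PCM bytes in one streaming chunk; defaults assume 16-bit mono audio."""
        samples = sample_rate * self.stream_chunk_duration_ms // 1000
        return samples * bytes_per_sample * channels
```

With `SECRET_KEY` set and every other variable left at its default, `Settings.from_env().stream_chunk_bytes(16000)` returns 64000.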
Next Steps
- Set up authentication — manage API keys with the CLI
- Explore the API — full endpoint reference
- Deploy with Docker — containerized production setup