Installation Guide¶
Complete installation instructions for the LLM Autotuner.
System Requirements¶
OS: Linux (Ubuntu 20.04+ recommended)
Python: 3.8+
GPU: NVIDIA GPU with CUDA support (for inference)
Memory: 16GB+ RAM recommended
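The requirements above can be checked before installing. The following is a minimal preflight sketch, not part of the project: `version_ge` and `preflight` are hypothetical helper names, the script assumes GNU coreutils (`sort -V`) and a readable `/proc/meminfo`, and it treats the GPU and memory items as warnings because the list marks them as needed for inference or recommended.

```shell
# version_ge A B: succeeds when dotted version A >= B (uses GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# preflight: prints one line per requirement; WARN marks recommendations.
preflight() {
    local py_version mem_kb

    # OS: Linux
    if [ "$(uname -s)" = "Linux" ]; then
        echo "OK    OS: Linux"
    else
        echo "FAIL  OS: $(uname -s) (Linux required)"
    fi

    # Python: 3.8+
    if command -v python3 >/dev/null 2>&1; then
        py_version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
        if version_ge "$py_version" "3.8"; then
            echo "OK    Python: $py_version"
        else
            echo "FAIL  Python: $py_version (3.8+ required)"
        fi
    else
        echo "FAIL  Python: python3 not found"
    fi

    # GPU: `nvidia-smi -L` lists the GPUs the driver can see
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
        echo "OK    GPU: $(nvidia-smi -L | head -n 1)"
    else
        echo "WARN  GPU: nvidia-smi unavailable or no GPU detected"
    fi

    # Memory: MemTotal is slightly below the installed size, so a 16 GB
    # machine is accepted from 15 GiB upward.
    if [ -r /proc/meminfo ]; then
        mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
        if [ "${mem_kb:-0}" -ge $((15 * 1024 * 1024)) ]; then
            echo "OK    Memory: $((mem_kb / 1024 / 1024)) GiB"
        else
            echo "WARN  Memory: $((${mem_kb:-0} / 1024 / 1024)) GiB (16GB+ recommended)"
        fi
    fi
}

preflight
```

Save it as a script and run it with `bash`; any FAIL line should be resolved before continuing.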
Automated Install (Recommended)¶
Use the installation script for automated setup:
# Clone repository
git clone <repository-url>
cd autotuner
# Run installation script
./install.sh
Script options:
./install.sh --help # Show all options
./install.sh --skip-k8s # Skip Kubernetes setup (for Docker/Local mode)
./install.sh --skip-venv # Use system Python instead of venv
./install.sh --install-ome # Include OME operator installation
What install.sh does:
Creates a Python virtual environment (env/)
Installs Python dependencies from requirements.txt
Installs the genai-bench CLI
Creates data directories (~/.local/share/autotuner/)
Verifies the installation
Manual Install¶
If you prefer manual installation:
# Clone repository
git clone <repository-url>
cd autotuner
# Create virtual environment
python3 -m venv env
source env/bin/activate
# Install dependencies
pip install -r requirements.txt
pip install genai-bench
# Install frontend dependencies
cd frontend && npm install && cd ..
# Create data directory
mkdir -p ~/.local/share/autotuner
# Start Redis (for background jobs)
docker run -d -p 6379:6379 redis:alpine
Deployment Mode Setup¶
Docker Mode (Recommended for beginners)¶
Requirements:
Docker 20.10+ with NVIDIA Container Toolkit
Verify GPU access:
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
Download a model:
pip install huggingface_hub
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct \
  --local-dir /mnt/data/models/llama-3-2-1b-instruct
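An interrupted download can leave the model directory incomplete. A rough way to check it is sketched below. `model_dir_ready` is a hypothetical helper, and the assumption that a usable model directory holds a `config.json` plus at least one `*.safetensors` or `*.bin` weight file reflects the usual Hugging Face repository layout rather than anything the autotuner documents.

```shell
# model_dir_ready DIR: succeeds when DIR contains config.json and at least
# one weight file at its top level. Uses GNU find (-maxdepth, -quit).
model_dir_ready() {
    local dir="$1" weights

    [ -f "$dir/config.json" ] || return 1

    # Stop at the first match; we only need to know that one exists.
    weights="$(find "$dir" -maxdepth 1 \
        \( -name '*.safetensors' -o -name '*.bin' \) -print -quit 2>/dev/null)"
    [ -n "$weights" ]
}
```

For example, `model_dir_ready /mnt/data/models/llama-3-2-1b-instruct && echo "model ready"` prints the message only when both checks pass. If the check fails, rerunning the `huggingface-cli download` command should fetch whatever is missing.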
Local Mode (Direct GPU)¶
To run inference servers directly on a local GPU without Docker:
# Install SGLang (quotes keep shells such as zsh from expanding the brackets)
pip install "sglang[all]"
# Or install vLLM
pip install vllm
OME Mode (Kubernetes)¶
See the Kubernetes Guide for cluster setup.
Configuration¶
Environment Variables¶
Create a .env file in the project root:
# Server ports
SERVER_PORT=8000
FRONTEND_PORT=5173
# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
# Model path (Docker mode)
DOCKER_MODEL_PATH=/mnt/data/models
# Proxy (if needed)
HTTP_PROXY=http://proxy:port
HTTPS_PROXY=http://proxy:port
NO_PROXY=localhost,127.0.0.1
# HuggingFace token (for gated models)
HF_TOKEN=your_token_here
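To sanity-check the file from a terminal, it can be loaded into the current shell. This guide does not say how the autotuner itself reads .env, so the following is only a sketch for manual inspection. `load_env` and `valid_port` are hypothetical helpers, and sourcing works only for plain KEY=VALUE lines like those above (values with spaces would need quoting).

```shell
# load_env [FILE]: exports every KEY=VALUE assignment in FILE (default: .env).
load_env() {
    local file="${1:-.env}" status

    # `.` searches PATH for names without a slash, so anchor bare names.
    case "$file" in
        */*) ;;
        *) file="./$file" ;;
    esac
    [ -f "$file" ] || { echo "load_env: $file not found" >&2; return 1; }

    set -a          # export every variable assigned while sourcing
    . "$file"
    status=$?
    set +a
    return "$status"
}

# valid_port VALUE: succeeds when VALUE is an integer between 1 and 65535.
valid_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}
```

After `load_env`, a check such as `valid_port "$SERVER_PORT" || echo "SERVER_PORT is not a valid port"` catches typos and placeholder values left in the file.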
Database¶
The SQLite database is created automatically at:
~/.local/share/autotuner/autotuner.db
Starting Services¶
# Activate environment
source env/bin/activate
# Start backend + ARQ worker
./scripts/start_dev.sh
# Start frontend (separate terminal)
cd frontend && npm run dev
Default ports:
Frontend: http://localhost:5173
Backend API: http://localhost:8000
API Docs: http://localhost:8000/docs
Verification¶
# Check backend health
curl http://localhost:8000/api/system/health
# Expected: {"status":"healthy","database":"ok","redis":"ok"}
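The backend may need a few seconds after startup before it answers. The sketch below polls the health endpoint until the response matches the expected output above. `is_healthy` and `wait_for_backend` are hypothetical helpers, the 30-attempt default is an arbitrary choice, and only the top-level "status" field is inspected.

```shell
# is_healthy JSON: succeeds when the body is a JSON object whose "status"
# field equals "healthy". Uses python3 for parsing (already a requirement).
is_healthy() {
    printf '%s' "$1" | python3 -c 'import json, sys
try:
    body = json.load(sys.stdin)
except ValueError:
    sys.exit(1)
sys.exit(0 if isinstance(body, dict) and body.get("status") == "healthy" else 1)'
}

# wait_for_backend [URL] [ATTEMPTS]: polls roughly once per second and
# returns 0 as soon as the backend reports healthy, 1 after ATTEMPTS tries.
wait_for_backend() {
    local url="${1:-http://localhost:8000/api/system/health}"
    local attempts="${2:-30}"
    local i=0 body

    while [ "$i" -lt "$attempts" ]; do
        body="$(curl -fsS --max-time 2 "$url" 2>/dev/null)" \
            && is_healthy "$body" \
            && return 0
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

A typical use is `wait_for_backend && echo "backend ready"` right after running the start script. If it times out, run the `curl` command by hand to see whether the database or Redis field reports a problem.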
Troubleshooting¶
See Troubleshooting for details.
Quick fixes for common issues:
Redis not running → docker run -d -p 6379:6379 redis:alpine
GPU not accessible → Check NVIDIA drivers and Docker runtime
Port conflicts → Update ports in the .env file
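Both the Redis and the port-conflict cases come down to which ports are currently listening. One way to check is sketched below, assuming the `ss` tool from iproute2 is installed. `listening_on` and `check_ports` are hypothetical helpers, not project scripts.

```shell
# listening_on PORT: reads `ss -ltnH` output on stdin and succeeds when some
# socket listens on PORT. Column 4 is "Local Address:Port"; the port is the
# text after the last colon, which also handles IPv6 forms such as [::]:8000.
listening_on() {
    awk -v port="$1" '
        { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
        END { exit !found }
    '
}

# check_ports PORT...: prints whether each port is in use or free.
check_ports() {
    local snapshot port

    # -l listening sockets, -t TCP, -n numeric ports, -H no header line
    snapshot="$(ss -ltnH 2>/dev/null)"
    for port in "$@"; do
        if printf '%s\n' "$snapshot" | listening_on "$port"; then
            echo "port $port: in use"
        else
            echo "port $port: free"
        fi
    done
}
```

Running `check_ports 8000 5173 6379` before starting the services should report 8000 and 5173 as free and 6379 as in use once Redis is up. If 8000 or 5173 is already taken, change the corresponding value in .env.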