LLM AutotunerΒΆ
Automated parameter tuning for LLM inference engines (SGLang, vLLM).
Key Features:
- Multiple Deployment Modes: Docker, Local (direct GPU), Kubernetes/OME
- Optimization Strategies: Grid search, random search, and Bayesian optimization
- SLO-Aware Scoring: Exponential penalties for constraint violations (see the sketch after this list)
- Intelligent GPU Scheduling: Per-GPU efficiency metrics and resource pooling
- Web UI: React frontend with real-time monitoring
- Agent Assistant: LLM-powered assistant for task management
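The SLO-aware scoring idea can be illustrated with a minimal sketch. The metric fields, SLO threshold, and `penalty_scale` parameter below are illustrative assumptions, not the project's actual API; they only show how an exponential penalty down-weights configurations that violate a latency constraint.

```python
# Minimal sketch of SLO-aware scoring with exponential penalties.
# All names (metric fields, SLO threshold, penalty_scale) are illustrative
# assumptions; the autotuner's real scoring interface may differ.
from dataclasses import dataclass
import math


@dataclass
class Metrics:
    throughput_tokens_per_s: float  # higher is better
    p99_latency_ms: float           # lower is better


@dataclass
class SLO:
    max_p99_latency_ms: float


def score(metrics: Metrics, slo: SLO, penalty_scale: float = 5.0) -> float:
    """Reward throughput, but penalize SLO violations exponentially.

    A configuration that exceeds the latency SLO by a fraction x is scaled
    by exp(-penalty_scale * x), so small violations are tolerated while
    large ones are effectively disqualified.
    """
    violation = max(0.0, metrics.p99_latency_ms / slo.max_p99_latency_ms - 1.0)
    return metrics.throughput_tokens_per_s * math.exp(-penalty_scale * violation)


# Example: 10% over the latency SLO cuts the score by ~39% with penalty_scale=5.
print(score(Metrics(4200.0, 550.0), SLO(max_p99_latency_ms=500.0)))
```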
Getting Started
User Guide
Features
Architecture
- Autotuner: Comprehensive Deployment Architecture Analysis
  - Executive Summary
  - 1. Current Deployment Architecture
  - 2. Key Deployment Components
  - 3. Deployment Logic Implementation
  - 4. Configuration Files
  - 5. Deployment Entry Points
  - 6. Deployment Assumptions and Constraints
  - 7. Results Output and Storage
  - 8. Deployment Workflow Diagram
  - 9. Key Files and Their Roles
  - 10. Deployment Scenarios
  - 11. Summary of Deployment Mechanisms
  - 12. Hardcoded Configuration Values
- LLM Autotuner - Product Roadmap
  - Executive Summary
  - Milestone Overview
  - Milestone Timeline
  - Milestone 1: Core Autotuner Foundation
  - Milestone 2: Complete Web Interface & Parameter Preset System
  - Milestone 3: Runtime-Agnostic Configuration Architecture & GPU-Aware Optimization
  - Milestone 4: UI/UX Polish, SLO Filtering & Documentation
  - Milestone 5: Agent System & Local Deployment Mode
  - Current Status: Production-Ready v0.2.0
  - Future Roadmap
  - Maintenance & Technical Debt
  - Success Metrics
API Reference