LLM AutotunerΒΆ
Automated parameter tuning for LLM inference engines (SGLang, vLLM).
Key Features:
- Multiple Deployment Modes: Docker, Local (direct GPU), Kubernetes/OME
- Optimization Strategies: Grid search, random search, and Bayesian optimization
- SLO-Aware Scoring: Exponential penalties for constraint violations (see the sketch after this list)
- Intelligent GPU Scheduling: Per-GPU efficiency metrics and resource pooling
- Web UI: React frontend with real-time monitoring
- Agent Assistant: LLM-powered assistant for task management
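The SLO-aware scoring idea can be illustrated with a minimal sketch. The metric fields, SLO threshold, and `penalty_scale` parameter below are illustrative assumptions, not the project's actual API; they only show how an exponential penalty down-weights configurations that violate a latency constraint.

```python
# Minimal sketch of SLO-aware scoring with exponential penalties.
# All names (metric fields, SLO threshold, penalty_scale) are illustrative
# assumptions; the autotuner's real scoring interface may differ.
from dataclasses import dataclass
import math


@dataclass
class Metrics:
    throughput_tokens_per_s: float  # higher is better
    p99_latency_ms: float           # lower is better


@dataclass
class SLO:
    max_p99_latency_ms: float


def score(metrics: Metrics, slo: SLO, penalty_scale: float = 5.0) -> float:
    """Reward throughput, but penalize SLO violations exponentially.

    A configuration that exceeds the latency SLO by a fraction x is scaled
    by exp(-penalty_scale * x), so small violations are tolerated while
    large ones are effectively disqualified.
    """
    violation = max(0.0, metrics.p99_latency_ms / slo.max_p99_latency_ms - 1.0)
    return metrics.throughput_tokens_per_s * math.exp(-penalty_scale * violation)


# Example: 10% over the latency SLO cuts the score by ~39% with penalty_scale=5.
print(score(Metrics(4200.0, 550.0), SLO(max_p99_latency_ms=500.0)))
```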
Getting Started
User Guide
Features
Architecture
- Autotuner: Comprehensive Deployment Architecture Analysis
  - Executive Summary
  - 1. Current Deployment Architecture
  - 2. Key Deployment Components
  - 3. Deployment Logic Implementation
  - 4. Configuration Files
  - 5. Deployment Entry Points
  - 6. Deployment Assumptions and Constraints
  - 7. Results Output and Storage
  - 8. Deployment Workflow Diagram
  - 9. Key Files and Their Roles
  - 10. Deployment Scenarios
  - 11. Summary of Deployment Mechanisms
  - 12. Hardcoded Configuration Values
- LLM Autotuner - Product Roadmap
  - Executive Summary
  - Milestone Overview
  - Milestone Timeline
  - Milestone 1: Core Autotuner Foundation
  - Milestone 2: Complete Web Interface & Parameter Preset System
  - Milestone 3: Runtime-Agnostic Configuration Architecture & GPU-Aware Optimization
  - Milestone 4: UI/UX Polish, SLO Filtering & Documentation
  - Milestone 5: Agent System & Local Deployment Mode
  - Current Status: Production-Ready v0.2.0
  - Future Roadmap
  - Maintenance & Technical Debt
  - Success Metrics
API Reference