Bayesian Optimization¶
Intelligent parameter search using machine learning to efficiently find optimal configurations.
Overview¶
Bayesian Optimization is an intelligent search strategy that uses machine learning to explore the parameter space efficiently. Unlike grid search which exhaustively tests all combinations, Bayesian optimization builds a probabilistic model of the objective function and uses it to intelligently select which configurations to test next.
Key Benefits¶
80-87% fewer experiments: Typically finds optimal configurations in 20-30 experiments vs 100+ for grid search
Intelligent exploration: Balances exploring new regions vs exploiting promising areas
Continuous improvement: Each experiment makes the model smarter
Handles large spaces: Effective for parameter spaces where grid search is impractical
Implementation¶
The autotuner uses Optuna with the Tree-structured Parzen Estimator (TPE) sampler:
TPE models the objective function as two distributions: good and bad configurations
Uses Bayesian reasoning to suggest parameters likely to improve the objective
Supports mixed parameter types: categorical, continuous, integer, boolean
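A minimal sketch of how these mixed types map onto Optuna's suggest API (illustrative only; the parameter names, ranges, and the run_benchmark helper are examples, not the autotuner's internals):

# Sketch: mixed parameter types with Optuna's TPE sampler (illustrative only).
import optuna

def objective(trial: optuna.Trial) -> float:
    schedule_policy = trial.suggest_categorical("schedule-policy", ["lpm", "fcfs"])          # categorical
    mem_fraction = trial.suggest_float("mem-fraction-static", 0.7, 0.9)                      # continuous
    tp_size = trial.suggest_int("tp-size", 1, 4)                                             # integer
    enable_compile = trial.suggest_categorical("enable-torch-compile", [True, False])        # boolean
    # Hypothetical helper: launch the runtime with these flags, benchmark it, return the score.
    return run_benchmark(schedule_policy, mem_fraction, tp_size, enable_compile)

study = optuna.create_study(
    direction="minimize",
    sampler=optuna.samplers.TPESampler(n_startup_trials=5),
)
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)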
When to Use¶
Use Bayesian Optimization When:¶
Large parameter spaces: 50+ total combinations (e.g., 3 params with 5 values each = 125 combinations)
Expensive experiments: Each experiment takes >5 minutes
Budget constraints: Limited time or GPU resources
Complex interactions: Parameters have non-obvious relationships
Unknown optima: No prior knowledge of best configuration
Use Grid Search When:¶
Small spaces: <20 total combinations
Fast experiments: Each experiment takes <1 minute
Comprehensive coverage: Need to test ALL combinations
Known patterns: Parameter effects are well understood
Use Random Search When:¶
Quick exploration: Want fast insights without optimization
Baseline comparison: Need random sampling benchmark
How It Works¶
Phase 1: Initial Random Exploration (5 trials by default)¶
Experiments 1-5: Random sampling across the parameter space
Goal: Build initial model of objective function
Phase 2: Bayesian Optimization (remaining trials)¶
For each trial:
1. Model predicts probability that each configuration will improve objective
2. Acquisition function balances:
- Exploration: testing uncertain regions
- Exploitation: testing near known good configurations
3. Execute experiment with selected configuration
4. Update model with new result
5. Repeat until max_iterations reached or convergence
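The loop above can be pictured with Optuna's ask-and-tell interface; this is a sketch under the assumption that the strategy wraps something similar, and run_experiment is a hypothetical placeholder for launching and benchmarking one configuration:

# Sketch of the suggest -> run -> update loop (illustrative, not the autotuner's code).
import optuna

study = optuna.create_study(
    direction="minimize",
    sampler=optuna.samplers.TPESampler(n_startup_trials=5),   # Phase 1: 5 random trials
)

for _ in range(30):                                            # until max_iterations (or convergence)
    trial = study.ask()                                        # steps 1-2: model + acquisition pick a config
    params = {
        "tp-size": trial.suggest_categorical("tp-size", [1, 2, 4]),
        "schedule-policy": trial.suggest_categorical("schedule-policy", ["lpm", "fcfs"]),
    }
    score = run_experiment(params)                             # step 3: hypothetical launch + benchmark
    study.tell(trial, score)                                   # step 4: update the surrogate model

print(study.best_params, study.best_value)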
TPE (Tree-structured Parzen Estimator)¶
# TPE models objective as two distributions:
P(params | objective < threshold) # "good" configurations
P(params | objective >= threshold) # "bad" configurations
# Suggests params that maximize ratio:
P(params | good) / P(params | bad)
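A toy numeric illustration of that split (not Optuna's actual implementation): observed trials are divided at a score quantile, and a candidate is rated by the density ratio of its parameter value under the two groups.

# Toy illustration of the good/bad split: rate a candidate by l(x) / g(x).
import numpy as np
from scipy.stats import gaussian_kde

# Observed (mem-fraction-static, latency-score) pairs from earlier trials (made-up numbers).
params = np.array([0.70, 0.72, 0.74, 0.75, 0.78, 0.80, 0.82, 0.84, 0.85, 0.86, 0.88, 0.90])
scores = np.array([0.095, 0.093, 0.090, 0.088, 0.083, 0.082, 0.084, 0.085, 0.086, 0.089, 0.092, 0.101])

threshold = np.quantile(scores, 0.25)              # gamma = 25% lowest scores count as "good"
l = gaussian_kde(params[scores <= threshold])      # density of "good" configurations
g = gaussian_kde(params[scores > threshold])       # density of "bad" configurations

candidate = 0.80
print(l(candidate)[0] / g(candidate)[0])           # higher ratio => more promising candidate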
Configuration¶
Task JSON Format¶
{
"optimization": {
"strategy": "bayesian",
"objective": "minimize_latency",
"max_iterations": 50
},
"parameters": {
"tp-size": [1, 2, 4],
"mem-fraction-static": [0.7, 0.75, 0.8, 0.85, 0.9],
"schedule-policy": ["lpm", "fcfs"]
}
}
Key Configuration Parameters¶
| Parameter | Description | Default | Recommended |
|---|---|---|---|
| max_iterations | Total experiments to run | 100 | 30-50 for most tasks |
| n_initial_random | Random trials before Bayesian starts | 5 | 5-10 (10-20% of max_iterations) |
| objective | What to optimize | minimize_latency | Based on use case |
| timeout_per_iteration | Max time per experiment | 600s | 300-900s based on model size |
Example Task¶
Full Task Configuration¶
{
"task_name": "bayesian-llama3-tune",
"description": "Bayesian optimization for Llama 3.2-1B",
"model": {
"id_or_path": "llama-3-2-1b-instruct",
"namespace": "autotuner"
},
"base_runtime": "sglang",
"runtime_image_tag": "v0.5.2-cu126",
"parameters": {
"tp-size": [1, 2],
"mem-fraction-static": [0.7, 0.75, 0.8, 0.85, 0.9],
"schedule-policy": ["lpm", "fcfs"],
"chunked-prefill-size": [512, 1024, 2048, 4096]
},
"optimization": {
"strategy": "bayesian",
"objective": "minimize_latency",
"max_iterations": 30,
"timeout_per_iteration": 600
},
"benchmark": {
"task": "text-to-text",
"model_name": "Llama-3.2-1B-Instruct",
"model_tokenizer": "meta-llama/Llama-3.2-1B-Instruct",
"traffic_scenarios": ["D(100,100)"],
"num_concurrency": [4, 8],
"max_time_per_iteration": 30,
"max_requests_per_iteration": 100,
"additional_params": {
"temperature": 0.0
}
}
}
Parameter Space Size¶
Grid search would require: 2 × 5 × 2 × 4 × 2 = 160 experiments
Bayesian optimization: ~30 experiments (81% reduction)
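A quick back-of-the-envelope check of these numbers, assuming the trailing factor of 2 counts the two concurrency levels in the benchmark:

# Sanity check of the grid size above (assumes the trailing x2 is num_concurrency).
import math

value_counts = [2, 5, 2, 4, 2]         # tp-size, mem-fraction-static, schedule-policy,
                                       # chunked-prefill-size, num_concurrency
grid_size = math.prod(value_counts)
print(grid_size)                       # 160 grid-search experiments
print(round(1 - 30 / grid_size, 2))    # 0.81 -> ~81% fewer experiments with ~30 Bayesian trials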
Expected Results¶
Convergence: Best configuration typically found within 15-20 experiments
Remaining experiments: Fine-tuning and validation
Total time: 5-10 hours vs 26+ hours for grid search
Comparison with Grid Search¶
Example Scenario: Llama-3.2-1B Tuning¶
Parameter Space:
tp-size: [1, 2, 4] → 3 values
mem-fraction-static: [0.7, 0.75, 0.8, 0.85, 0.9] → 5 values
schedule-policy: ["lpm", "fcfs"] → 2 values
chunked-prefill-size: [512, 1024, 2048, 4096] → 4 values
Total combinations: 3 × 5 × 2 × 4 = 120
| Strategy | Experiments | Time (est.) | GPU-hours | Best Score Found |
|---|---|---|---|---|
| Grid Search | 120 | 20 hours | 20 | 0.0825 |
| Random Search | 50 | 8.3 hours | 8.3 | 0.0834 |
| Bayesian | 25 | 4.2 hours | 4.2 | 0.0823 |
Efficiency gain: 79% fewer experiments, 79% less time, same or better result
Parameter Tuning¶
max_iterations¶
Purpose: Total number of experiments to run
Guidance:
Small space (<50 combinations): 20-30 iterations
Medium space (50-200 combinations): 30-50 iterations
Large space (>200 combinations): 50-100 iterations
Rule of thumb: 20-30% of grid search space size
n_initial_random¶
Purpose: Number of random trials before Bayesian optimization starts
Guidance:
Default: 5 trials (10% when max_iterations is 50)
Small space: 5-10 trials
Large space: 10-20 trials
Rule of thumb: 10-20% of max_iterations
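As a hedged sketch, these two settings roughly correspond to the trial count and TPESampler's n_startup_trials in Optuna; the exact field-to-argument wiring inside the autotuner may differ:

# Sketch: mapping max_iterations / n_initial_random onto Optuna (mapping is an assumption).
import optuna

max_iterations = 50
n_initial_random = max(5, int(0.1 * max_iterations))   # rule of thumb: 10-20% of max_iterations

sampler = optuna.samplers.TPESampler(n_startup_trials=n_initial_random)
study = optuna.create_study(direction="minimize", sampler=sampler)
# study.optimize(objective, n_trials=max_iterations) would then run the remaining Bayesian trials.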
Best Practices¶
1. Start with Small max_iterations¶
{
"optimization": {
"strategy": "bayesian",
"max_iterations": 20 // Start small, increase if needed
}
}
Why: Test Bayesian setup without long wait. Increase if not converged.
2. Monitor Convergence¶
# Watch for "New best score" messages
tail -f ~/.local/share/autotuner/logs/task_<id>.log | grep "best score"
3. Use SLO Configuration¶
{
"slo": {
"latency": {
"p90": {
"threshold": 5.0,
"weight": 2.0,
"hard_fail": true,
"fail_ratio": 0.2
}
},
"steepness": 0.1
}
}
Why: Guides Bayesian optimization to respect performance constraints.
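For intuition only, one plausible way a threshold, weight, and steepness could shape the score is a soft sigmoid penalty added to the base objective; this is an assumption for illustration, not the autotuner's documented scoring:

# Illustrative soft SLO penalty (an assumption, not the autotuner's actual scoring).
import math

def slo_penalty(p90_latency: float, threshold: float = 5.0,
                weight: float = 2.0, steepness: float = 0.1) -> float:
    # Grows smoothly from ~0 (well under the SLO) toward `weight` (far over it).
    return weight / (1.0 + math.exp(-(p90_latency - threshold) / steepness))

score = 0.082 + slo_penalty(p90_latency=4.2)   # well under the 5 s threshold -> tiny penalty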
Troubleshooting¶
Problem: Bayesian not improving over random baseline¶
Symptoms:
First 5 experiments find good config
Remaining experiments don’t improve
Solutions:
Too few parameters → Use random search
Parameters don’t interact → Grid search may be better
Noisy objective → Increase benchmark duration
Problem: Convergence too slow¶
Symptoms:
Best score still improving after 40+ experiments
Solutions:
Reduce n_initial_random to 5-10
Increase max_iterations to 50-100
Consider hierarchical optimization
Further Reading¶
Handling Failed Experiments¶
Question: Can Infinite Scores Guide Bayesian Optimization?¶
Short Answer: No. Pure infinite scores provide only weak negative guidance (what to avoid) but no positive gradient (where to go). When all experiments fail, Bayesian optimization degrades to random search.
How Failed Experiments Are Reported¶
In src/web/workers/autotuner_worker.py, failed experiments receive worst-case scores:
# When experiment fails (timeout, crash, etc.)
objective_name = optimization_config.get("objective", "minimize_latency")
worst_score = float("inf") if "minimize" in objective_name else float("-inf")
strategy.tell_result(
parameters=params,
objective_score=worst_score,
metrics={}
)
TPE Sampler Behavior¶
Optuna’s TPE (Tree-structured Parzen Estimator) sampler:
Builds surrogate models for parameter distributions
Separates observations into “good” (top γ%) and “bad” (rest)
Models two distributions: l(x) for good trials, g(x) for bad trials
Samples from regions where l(x)/g(x) is high
Critical requirement: Needs varying scores to distinguish good vs bad regions.
Degradation When All Experiments Fail¶
When all trials return -inf or inf:
TPE cannot distinguish between parameter configurations
All parameters appear equally bad
Sampler reverts to quasi-random exploration
Result: Bayesian optimization degrades to random search
Recommendation¶
For robustness:
Use graded failure penalties (see GRADED_FAILURE_PENALTIES.md and the sketch below)
Implement partial success metrics even for failed experiments
Consider hybrid approaches that combine Bayesian and grid search
Set reasonable SLO thresholds to avoid all-failure scenarios
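As an illustrative sketch (an assumption, not the contents of GRADED_FAILURE_PENALTIES.md), a graded penalty replaces float("inf") with a finite score scaled by how far the trial got, so TPE still sees a gradient among failed configurations:

# Sketch of a graded failure penalty (illustrative; the real scheme may differ).
def failure_penalty(progress: float, base_penalty: float = 1000.0) -> float:
    """progress in [0, 1]: fraction of the benchmark completed before the failure."""
    return base_penalty * (1.0 - 0.5 * progress)   # earlier failures score worse

# Example: a trial that crashed after completing 40% of its requests
strategy.tell_result(
    parameters=params,
    objective_score=failure_penalty(progress=0.4),   # 800.0 instead of float("inf")
    metrics={},
)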