OME Installation Guide

This guide provides step-by-step instructions for installing OME (Open Model Engine), a prerequisite for the LLM Autotuner.

Table of Contents

  1. Why OME is Required

  2. Prerequisites

  3. Quick Installation

  4. Manual Installation

  5. Verification

  6. Post-Installation Setup

  7. Troubleshooting


Why OME is Required

The autotuner depends on OME for core functionality:

  • InferenceService Deployment: OME creates and manages SGLang inference servers with configurable parameters

  • Parameter Control: Allows programmatic control of runtime parameters (tp_size, mem_frac, batch_size, etc.)

  • BenchmarkJob Execution: Provides CRDs for running benchmarks against deployed services

  • Model Management: Handles model loading, caching, and lifecycle

  • Resource Orchestration: Manages GPU allocation and Kubernetes resources

Without OME, the autotuner cannot:

  • Automatically deploy InferenceServices with different configurations

  • Run parameter tuning experiments

  • Execute the core autotuning workflow
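
Once OME is installed (see the steps below), these are ordinary Kubernetes custom resources, so you can inspect what the autotuner creates directly with kubectl, for example:

# List the InferenceServices and BenchmarkJobs the autotuner has created
kubectl get inferenceservices --all-namespaces
kubectl get benchmarkjobs --all-namespaces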


Prerequisites

Before installing OME, ensure you have:

1. Kubernetes Cluster

  • Version: v1.28 or later

  • Platforms: Minikube, Kind, EKS, GKE, AKS, or bare-metal

  • Resources:

    • At least 4 CPU cores

    • 8GB+ RAM

    • GPU support (for inference workloads)
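
To check whether your nodes meet these minimums, inspect the allocatable resources (GPUs appear as nvidia.com/gpu once a device plugin is installed):

# Show allocatable CPU, memory, and GPUs per node
kubectl describe nodes | grep -A 8 "Allocatable:"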

2. kubectl

kubectl version --client
# Should show a version within one minor release of your v1.28+ cluster

3. Helm (v3+)

helm version
# Should show v3.x

If Helm is not installed:

# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

4. Git Submodules

The OME repository must be initialized as a submodule:

git submodule update --init --recursive

Quick Installation

Recommended Method: Use the autotuner installation script with the --install-ome flag:

cd /path/to/autotuner
./install.sh --install-ome

This automatically installs:

  1. cert-manager (required dependency)

  2. OME CRDs from OCI registry

  3. OME resources from local Helm charts

  4. All necessary configurations

Installation time: ~3-5 minutes
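
If you want to follow progress while the script runs, one option (assuming the watch utility is available) is to poll the relevant namespaces from a second terminal:

# Watch cert-manager and OME pods come up during installation
watch -n 5 "kubectl get pods -n cert-manager; kubectl get pods -n ome"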


Manual Installation

If you prefer manual installation or the automatic method fails:

Step 1: Install cert-manager (Required Dependency)

OME requires cert-manager for TLS certificate management:

# Add Helm repository
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install cert-manager
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true \
  --wait --timeout=5m
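
Before moving on, confirm cert-manager is healthy:

kubectl get pods -n cert-manager
# Expect cert-manager, cert-manager-cainjector, and cert-manager-webhook pods in Running state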

Note: If you encounter webhook validation issues:

# Delete webhook configurations
kubectl delete validatingwebhookconfiguration cert-manager-webhook
kubectl delete mutatingwebhookconfiguration cert-manager-webhook

Step 2: Install KEDA (Required Dependency)

OME requires KEDA (Kubernetes Event-Driven Autoscaling) for InferenceService autoscaling:

# Add Helm repository
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# Install KEDA
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --wait --timeout=5m

Verify KEDA:

kubectl get pods -n keda
# Should show the keda-operator and keda-operator-metrics-apiserver pods running

kubectl get crd | grep keda
# Should show scaledobjects.keda.sh and related CRDs

Step 3: Install OME CRDs

# Install from OCI registry (recommended)
helm upgrade --install ome-crd \
  oci://ghcr.io/moirai-internal/charts/ome-crd \
  --namespace ome \
  --create-namespace

Verify CRDs:

kubectl get crd | grep ome.io
# Should show: inferenceservices, benchmarkjobs, clusterbasemodels, clusterservingruntimes, etc.

Step 4: Install OME Resources

# Install from local charts
cd third_party/ome
helm upgrade --install ome charts/ome-resources \
  --namespace ome \
  --wait --timeout=7m

Verify installation:

kubectl get pods -n ome
# Should show ome-controller-manager pods running

Verification

After installation, verify OME is working correctly:

1. Check OME Namespace

kubectl get namespace ome
# Status should be: Active

2. Check OME Pods

kubectl get pods -n ome

# Expected pods (all Running):
# NAME                                       READY   STATUS
# ome-controller-manager-xxx                 2/2     Running
# ome-controller-manager-yyy                 2/2     Running
# ome-controller-manager-zzz                 2/2     Running
# ome-model-agent-xxx                        1/1     Running

3. Check CRDs

kubectl get crd | grep ome.io | wc -l
# Should show: 6 (or more)

4. Check OME Controller Logs

kubectl logs -n ome deployment/ome-controller-manager --tail=50

# Should show startup logs without errors

5. Verify API Resources

kubectl api-resources | grep ome.io

# Should list all OME resources:
# clusterbasemodels          ome.io/v1beta1          false   ClusterBaseModel
# clusterservingruntimes     ome.io/v1beta1          false   ClusterServingRuntime
# inferenceservices          ome.io/v1beta1          true    InferenceService
# benchmarkjobs              ome.io/v1beta1          true    BenchmarkJob

Post-Installation Setup

After OME is installed, you need to create model and runtime resources.

Using Pre-configured Examples

OME includes pre-configured model definitions in third_party/ome/config/models/:

# List available models
ls third_party/ome/config/models/meta/

# Apply Llama 3.2 1B model
kubectl apply -f third_party/ome/config/models/meta/Llama-3.2-1B-Instruct.yaml

# Verify model created
kubectl get clusterbasemodels

Note: The storage path in these YAMLs may need adjustment based on your setup.
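
For example, you can check which storage location a model YAML expects before applying it (assuming it uses the same storage fields shown in the custom example below):

# Inspect the storage settings in the pre-configured model definition
grep -n -E "storageUri|path" third_party/ome/config/models/meta/Llama-3.2-1B-Instruct.yaml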

Creating Custom ClusterBaseModel

Create a custom model configuration:

# my-model.yaml
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: my-model
spec:
  displayName: vendor.model-name
  vendor: vendor-name
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://vendor/model-name
    path: /path/to/models/my-model
    # Optional: HuggingFace token for gated models
    # key: "hf-token"

Apply:

kubectl apply -f my-model.yaml
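
Then confirm the resource was accepted:

kubectl get clusterbasemodels my-model
kubectl describe clusterbasemodel my-model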

Creating ClusterServingRuntime

Create a runtime configuration for SGLang:

# my-runtime.yaml
apiVersion: ome.io/v1beta1
kind: ClusterServingRuntime
metadata:
  name: my-runtime
spec:
  supportedModels:
    - my-model
  engine:
    type: sglang
    version: "v0.4.8.post1"
  image: "docker.io/lmsysorg/sglang:v0.4.8.post1-cu126"
  protocol: "openai"
  resources:
    limits:
      nvidia.com/gpu: 1
      memory: 16Gi
    requests:
      cpu: 2
      memory: 8Gi
  runner:
    command: ["python3", "-m", "sglang.launch_server"]
    args:
      - "--model-path"
      - "$(MODEL_PATH)"
      - "--host"
      - "0.0.0.0"
      - "--port"
      - "8000"

Apply:

kubectl apply -f my-runtime.yaml
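
Confirm the runtime was registered:

kubectl get clusterservingruntimes my-runtime
kubectl describe clusterservingruntime my-runtime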

Example Files

The autotuner includes example configurations:

  • config/examples/clusterbasemodel-llama-3.2-1b.yaml

  • config/examples/clusterservingruntime-sglang.yaml

Important: These are templates and require:

  • Valid storage paths

  • Model access permissions

  • Appropriate resource limits
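
Once you have filled these in, apply the templates from the autotuner repository root and verify the resources appear:

kubectl apply -f config/examples/clusterbasemodel-llama-3.2-1b.yaml
kubectl apply -f config/examples/clusterservingruntime-sglang.yaml
kubectl get clusterbasemodels,clusterservingruntimes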


Troubleshooting

Issue 1: cert-manager Webhook Timeout

Symptoms:

Error: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s": context deadline exceeded

Cause: The cert-manager webhook is not responding, often due to networking issues in the cluster.

Solution:

# Delete webhook configurations to bypass validation
kubectl delete validatingwebhookconfiguration cert-manager-webhook
kubectl delete mutatingwebhookconfiguration cert-manager-webhook

# Retry OME installation
helm upgrade --install ome third_party/ome/charts/ome-resources --namespace ome

Issue 2: OME Pods Not Starting

Symptoms:

kubectl get pods -n ome
# Pods in CrashLoopBackOff or Error state

Solutions:

  1. Check pod logs:

    kubectl logs -n ome <pod-name>
    
  2. Check events:

    kubectl describe pod -n ome <pod-name>
    
  3. Common issues:

    • Missing RBAC permissions → Reinstall with proper service account

    • Image pull errors → Check network/registry access

    • Resource constraints → Increase node resources
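
Recent namespace events often surface these problems directly:

# Show the most recent events (scheduling failures, image pull errors, RBAC denials)
kubectl get events -n ome --sort-by=.lastTimestamp | tail -n 20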

Issue 3: CRDs Not Installed

Symptoms:

kubectl get crd | grep ome.io
# No output

Solution:

# Manually install CRDs
kubectl apply -f third_party/ome/config/crd/

Issue 4: InferenceService Stuck in Not Ready State

Symptoms:

kubectl get inferenceservices
# Ready=False for extended period

Debugging:

# Check InferenceService status
kubectl describe inferenceservice <name>

# Check underlying pods
kubectl get pods -l serving.kserve.io/inferenceservice=<name>

# Check pod logs
kubectl logs <pod-name>

Common causes:

  • Model not found or not downloaded

  • Insufficient GPU resources

  • Image pull failures

  • Incorrect runtime configuration
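
To rule out a missing model or runtime definition, confirm the resources the InferenceService references actually exist:

kubectl get clusterbasemodels
kubectl get clusterservingruntimes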

Issue 5: GPU Not Allocated

Symptoms: InferenceService pods not getting GPU resources

Solution:

  1. Verify GPU operator is installed:

    kubectl get nodes -o json | jq '.items[].status.allocatable'
    # Should show nvidia.com/gpu
    
  2. Install NVIDIA device plugin if missing:

    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/main/nvidia-device-plugin.yml
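
After applying the manifest, confirm the plugin pods are running and that GPUs show up as allocatable (the upstream manifest typically deploys a DaemonSet into kube-system; adjust the namespace if yours differs):

kubectl get pods -n kube-system | grep nvidia-device-plugin
kubectl get nodes -o json | jq '.items[].status.allocatable' | grep nvidia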
    

Issue 6: Model Download Failures

Symptoms: Pods fail with “cannot download model” errors

Solutions:

  1. Check internet connectivity from pods

  2. Configure HuggingFace token if using gated models:

    # Add to ClusterBaseModel
    source:
      huggingface:
        token:
          secretKeyRef:
            name: hf-token
            key: token
    
  3. Use pre-downloaded models:

    # Mount existing model directory
    modelPath: "/path/to/local/models"
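
For step 1 above, a quick one-off connectivity check from inside the cluster (the image and target URL here are just one option):

kubectl run hf-net-check --rm -i --restart=Never \
  --image=curlimages/curl --command -- curl -sI https://huggingface.co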
    

Additional Resources

  • OME GitHub Repository: https://github.com/sgl-project/ome

  • OME Documentation: Check third_party/ome/docs/ for detailed docs

  • SGLang Documentation: https://github.com/sgl-project/sglang

  • Example Configurations: See third_party/ome/examples/


Quick Verification Script

After installation, run this script to verify everything is working:

#!/bin/bash
# verify-ome.sh

echo "Checking OME installation..."

# Check namespace
if kubectl get namespace ome &> /dev/null; then
    echo "✅ OME namespace exists"
else
    echo "❌ OME namespace not found"
    exit 1
fi

# Check pods
POD_COUNT=$(kubectl get pods -n ome --no-headers 2>/dev/null | wc -l)
RUNNING_COUNT=$(kubectl get pods -n ome --no-headers 2>/dev/null | grep -c Running)

if [ "$RUNNING_COUNT" -eq 0 ]; then
    echo "❌ No OME pods are running"
    exit 1
fi

echo "✅ OME pods: $RUNNING_COUNT/$POD_COUNT running"

# Check CRDs
CRD_COUNT=$(kubectl get crd 2>/dev/null | grep -c ome.io)

if [ "$CRD_COUNT" -lt 4 ]; then
    echo "❌ Missing OME CRDs (expected at least 4)"
    exit 1
fi

echo "✅ OME CRDs installed: $CRD_COUNT"

# Check models
MODEL_COUNT=$(kubectl get clusterbasemodels --no-headers 2>/dev/null | wc -l)
echo "✅ ClusterBaseModels: $MODEL_COUNT"

# Check runtimes
RUNTIME_COUNT=$(kubectl get clusterservingruntimes --no-headers 2>/dev/null | wc -l)
echo "✅ ClusterServingRuntimes: $RUNTIME_COUNT"

echo ""
echo "OME installation verified successfully!"
echo "You can now run the autotuner installation: ./install.sh"

Save this as verify-ome.sh, make it executable, and run:

chmod +x verify-ome.sh
./verify-ome.sh

Next Steps

Once OME is installed and verified:

  1. Run the autotuner installation:

    cd /path/to/autotuner
    ./install.sh
    
  2. Configure your first tuning task:

    vi examples/simple_task.json
    
  3. Start tuning:

    source env/bin/activate
    python src/run_autotuner.py examples/simple_task.json --direct
    

For more information, see the main README.md.