# OME Installation Guide

This guide provides step-by-step instructions for installing OME (Open Model Engine), a **required prerequisite** for the LLM Autotuner.

## Table of Contents

1. [Why OME is Required](#why-ome-is-required)
2. [Prerequisites](#prerequisites)
3. [Quick Installation](#quick-installation)
4. [Manual Installation](#manual-installation)
5. [Verification](#verification)
6. [Post-Installation Setup](#post-installation-setup)
7. [Troubleshooting](#troubleshooting)

---

## Why OME is Required

The autotuner depends on OME for core functionality:

- **InferenceService Deployment**: OME creates and manages SGLang inference servers with configurable parameters
- **Parameter Control**: Allows programmatic control of runtime parameters (tp_size, mem_frac, batch_size, etc.)
- **BenchmarkJob Execution**: Provides CRDs for running benchmarks against deployed services
- **Model Management**: Handles model loading, caching, and lifecycle
- **Resource Orchestration**: Manages GPU allocation and Kubernetes resources

**Without OME, the autotuner cannot:**

- Automatically deploy InferenceServices with different configurations
- Run parameter tuning experiments
- Execute the core autotuning workflow

---

## Prerequisites

Before installing OME, ensure you have:

### 1. Kubernetes Cluster

- **Version**: v1.28 or later
- **Platforms**: Minikube, Kind, EKS, GKE, AKS, or bare metal
- **Resources**:
  - At least 4 CPU cores
  - 8GB+ RAM
  - GPU support (for inference workloads)

### 2. kubectl

```bash
kubectl version --client
# Should show v1.28+ to match the cluster requirement above
# (keep kubectl within one minor version of your cluster)
```

### 3. Helm (v3+)

```bash
helm version
# Should show v3.x
```

If Helm is not installed:

```bash
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```

### 4. Git Submodules

The OME repository must be initialized as a submodule:

```bash
git submodule update --init --recursive
```

---

## Quick Installation

**Recommended Method:** Use the autotuner installation script with the `--install-ome` flag:

```bash
cd /path/to/autotuner
./install.sh --install-ome
```

This automatically installs:

1. cert-manager (required dependency)
2. OME CRDs from the OCI registry
3. OME resources from local Helm charts
4. All necessary configurations

**Installation time:** ~3-5 minutes
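Because the script installs OME resources from the Helm charts vendored in the submodule, it is worth confirming the submodule checkout actually succeeded before running it. A minimal check, assuming the OME submodule lives at `third_party/ome` as in the manual steps below:

```bash
# A leading "-" in the first column means the submodule is registered but not
# yet initialized; if so, run `git submodule update --init --recursive` first.
git submodule status third_party/ome
```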
---

## Manual Installation

If you prefer manual installation or the automatic method fails:

### Step 1: Install cert-manager (Required Dependency)

OME requires cert-manager for TLS certificate management:

```bash
# Add Helm repository
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install cert-manager
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true \
  --wait --timeout=5m
```

**Note:** If you encounter webhook validation issues:

```bash
# Delete webhook configurations
kubectl delete validatingwebhookconfiguration cert-manager-webhook
kubectl delete mutatingwebhookconfiguration cert-manager-webhook
```

### Step 2: Install KEDA (Required Dependency)

OME requires KEDA (Kubernetes Event-Driven Autoscaling) for InferenceService autoscaling:

```bash
# Add Helm repository
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# Install KEDA
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --wait --timeout=5m
```

**Verify KEDA:**

```bash
kubectl get pods -n keda
# Should show keda-operator and keda-metrics-apiserver pods running

kubectl get crd | grep keda
# Should show scaledobjects.keda.sh and related CRDs
```

### Step 3: Install OME CRDs

```bash
# Install from OCI registry (recommended)
helm upgrade --install ome-crd \
  oci://ghcr.io/moirai-internal/charts/ome-crd \
  --namespace ome \
  --create-namespace
```

**Verify CRDs:**

```bash
kubectl get crd | grep ome.io
# Should show: inferenceservices, benchmarkjobs, clusterbasemodels, clusterservingruntimes, etc.
```

### Step 4: Install OME Resources

```bash
# Install from local charts
cd third_party/ome
helm upgrade --install ome charts/ome-resources \
  --namespace ome \
  --wait --timeout=7m
```

**Verify installation:**

```bash
kubectl get pods -n ome
# Should show ome-controller-manager pods running
```

---

## Verification

After installation, verify OME is working correctly:

### 1. Check OME Namespace

```bash
kubectl get namespace ome
# Status should be: Active
```

### 2. Check OME Pods

```bash
kubectl get pods -n ome
# Expected pods (all Running):
# NAME                         READY   STATUS
# ome-controller-manager-xxx   2/2     Running
# ome-controller-manager-yyy   2/2     Running
# ome-controller-manager-zzz   2/2     Running
# ome-model-agent-xxx          1/1     Running
```

### 3. Check CRDs

```bash
kubectl get crd | grep ome.io | wc -l
# Should show: 6 (or more)
```

### 4. Check OME Controller Logs

```bash
kubectl logs -n ome deployment/ome-controller-manager --tail=50
# Should show startup logs without errors
```

### 5. Verify API Resources

```bash
kubectl api-resources | grep ome.io
# Should list all OME resources:
# clusterbasemodels        ome.io/v1beta1   false   ClusterBaseModel
# clusterservingruntimes   ome.io/v1beta1   false   ClusterServingRuntime
# inferenceservices        ome.io/v1beta1   true    InferenceService
# benchmarkjobs            ome.io/v1beta1   true    BenchmarkJob
```

---

## Post-Installation Setup

After OME is installed, you need to create model and runtime resources.

### Using Pre-configured Examples

OME includes pre-configured model definitions in `third_party/ome/config/models/`:

```bash
# List available models
ls third_party/ome/config/models/meta/

# Apply Llama 3.2 1B model
kubectl apply -f third_party/ome/config/models/meta/Llama-3.2-1B-Instruct.yaml

# Verify model created
kubectl get clusterbasemodels
```

**Note:** The storage path in these YAMLs may need adjustment based on your setup.
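Applying a ClusterBaseModel kicks off a model download, which can take several minutes for larger models. A quick way to watch progress — note that the exact status columns vary by OME version, and `ome-model-agent-xxx` below is a placeholder for the real pod name from `kubectl get pods -n ome`:

```bash
# Watch the model resource until it reports ready
kubectl get clusterbasemodels -w

# Follow the model agent's download logs (substitute the real pod name)
kubectl logs -n ome ome-model-agent-xxx --tail=50 -f
```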
### Creating Custom ClusterBaseModel

Create a custom model configuration:

```yaml
# my-model.yaml
apiVersion: ome.io/v1beta1
kind: ClusterBaseModel
metadata:
  name: my-model
spec:
  displayName: vendor.model-name
  vendor: vendor-name
  disabled: false
  version: "1.0.0"
  storage:
    storageUri: hf://vendor/model-name
    path: /path/to/models/my-model
    # Optional: HuggingFace token for gated models
    # key: "hf-token"
```

Apply:

```bash
kubectl apply -f my-model.yaml
```

### Creating ClusterServingRuntime

Create a runtime configuration for SGLang:

```yaml
# my-runtime.yaml
apiVersion: ome.io/v1beta1
kind: ClusterServingRuntime
metadata:
  name: my-runtime
spec:
  supportedModels:
    - my-model
  engine:
    type: sglang
    version: "v0.4.8.post1"
    image: "docker.io/lmsysorg/sglang:v0.4.8.post1-cu126"
    protocol: "openai"
    resources:
      limits:
        nvidia.com/gpu: 1
        memory: 16Gi
      requests:
        cpu: 2
        memory: 8Gi
    runner:
      command: ["python3", "-m", "sglang.launch_server"]
      args:
        - "--model-path"
        - "$(MODEL_PATH)"
        - "--host"
        - "0.0.0.0"
        - "--port"
        - "8000"
```

Apply:

```bash
kubectl apply -f my-runtime.yaml
```

### Example Files

The autotuner includes example configurations:

- `config/examples/clusterbasemodel-llama-3.2-1b.yaml`
- `config/examples/clusterservingruntime-sglang.yaml`

**Important:** These are templates and require:

- Valid storage paths
- Model access permissions
- Appropriate resource limits
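With a model and a runtime in place, OME ties them together through an InferenceService — the resource the autotuner creates and reconfigures automatically during tuning, so you normally won't write one by hand. Purely as an illustrative sketch (the exact spec fields vary by OME version; treat the field names below as assumptions and check `third_party/ome/examples/` or `kubectl explain inferenceservices` for the authoritative schema), a manual deployment might look like:

```yaml
# my-inferenceservice.yaml -- illustrative sketch only; verify the field
# names against your OME version before applying.
apiVersion: ome.io/v1beta1
kind: InferenceService
metadata:
  name: my-inferenceservice
  namespace: default
spec:
  model:
    name: my-model      # references the ClusterBaseModel above
  runtime:
    name: my-runtime    # references the ClusterServingRuntime above
```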
---

## Troubleshooting

### Issue 1: cert-manager Webhook Timeout

**Symptoms:**

```
Error: failed calling webhook "webhook.cert-manager.io": Post
"https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s":
context deadline exceeded
```

**Cause:** The cert-manager webhook is not responding, often due to networking issues in the cluster.

**Solution:**

```bash
# Delete webhook configurations to bypass validation
kubectl delete validatingwebhookconfiguration cert-manager-webhook
kubectl delete mutatingwebhookconfiguration cert-manager-webhook

# Retry OME installation
helm upgrade --install ome third_party/ome/charts/ome-resources --namespace ome
```

### Issue 2: OME Pods Not Starting

**Symptoms:**

```bash
kubectl get pods -n ome
# Pods in CrashLoopBackOff or Error state
```

**Solutions:**

1. Check pod logs:

   ```bash
   kubectl logs -n ome <pod-name>
   ```

2. Check events:

   ```bash
   kubectl describe pod -n ome <pod-name>
   ```

3. Common issues:
   - Missing RBAC permissions → Reinstall with proper service account
   - Image pull errors → Check network/registry access
   - Resource constraints → Increase node resources

### Issue 3: CRDs Not Installed

**Symptoms:**

```bash
kubectl get crd | grep ome.io
# No output
```

**Solution:**

```bash
# Manually install CRDs
kubectl apply -f third_party/ome/config/crd/
```

### Issue 4: InferenceService Stuck in Not Ready

**Symptoms:**

```bash
kubectl get inferenceservices
# Ready=False for extended period
```

**Debugging:**

```bash
# Check InferenceService status
kubectl describe inferenceservice <name>

# Check underlying pods
kubectl get pods -l serving.kserve.io/inferenceservice=<name>

# Check pod logs
kubectl logs <pod-name>
```

**Common causes:**

- Model not found or not downloaded
- Insufficient GPU resources
- Image pull failures
- Incorrect runtime configuration

### Issue 5: GPU Not Allocated

**Symptoms:** InferenceService pods not getting GPU resources

**Solution:**

1. Verify GPU operator is installed:

   ```bash
   kubectl get nodes -o json | jq '.items[].status.allocatable'
   # Should show nvidia.com/gpu
   ```

2. Install the NVIDIA device plugin if missing:

   ```bash
   kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/main/nvidia-device-plugin.yml
   ```

### Issue 6: Model Download Failures

**Symptoms:** Pods fail with "cannot download model" errors

**Solutions:**

1. Check internet connectivity from pods
2. Configure a HuggingFace token if using gated models:

   ```yaml
   # Add to ClusterBaseModel
   source:
     huggingface:
       token:
         secretKeyRef:
           name: hf-token
           key: token
   ```

3. Use pre-downloaded models:

   ```yaml
   # Mount existing model directory
   modelPath: "/path/to/local/models"
   ```

---

## Additional Resources

- **OME GitHub Repository**: https://github.com/sgl-project/ome
- **OME Documentation**: Check `third_party/ome/docs/` for detailed docs
- **SGLang Documentation**: https://github.com/sgl-project/sglang
- **Example Configurations**: See `third_party/ome/examples/`

---

## Quick Verification Script

After installation, run this script to verify everything is working:

```bash
#!/bin/bash
# verify-ome.sh

echo "Checking OME installation..."

# Check namespace
if kubectl get namespace ome &> /dev/null; then
    echo "✅ OME namespace exists"
else
    echo "❌ OME namespace not found"
    exit 1
fi

# Check pods
POD_COUNT=$(kubectl get pods -n ome --no-headers 2>/dev/null | wc -l)
RUNNING_COUNT=$(kubectl get pods -n ome --no-headers 2>/dev/null | grep Running | wc -l)
echo "✅ OME pods: $RUNNING_COUNT/$POD_COUNT running"

if [ "$RUNNING_COUNT" -eq 0 ]; then
    echo "❌ No OME pods are running"
    exit 1
fi

# Check CRDs
CRD_COUNT=$(kubectl get crd | grep ome.io | wc -l)
echo "✅ OME CRDs installed: $CRD_COUNT"

if [ "$CRD_COUNT" -lt 4 ]; then
    echo "❌ Missing OME CRDs (expected at least 4)"
    exit 1
fi

# Check models
MODEL_COUNT=$(kubectl get clusterbasemodels --no-headers 2>/dev/null | wc -l)
echo "✅ ClusterBaseModels: $MODEL_COUNT"

# Check runtimes
RUNTIME_COUNT=$(kubectl get clusterservingruntimes --no-headers 2>/dev/null | wc -l)
echo "✅ ClusterServingRuntimes: $RUNTIME_COUNT"

echo ""
echo "OME installation verified successfully!"
echo "You can now run the autotuner installation: ./install.sh"
```

Save this as `verify-ome.sh`, make it executable, and run:

```bash
chmod +x verify-ome.sh
./verify-ome.sh
```

---

## Next Steps

Once OME is installed and verified:

1. **Run the autotuner installation**:

   ```bash
   cd /path/to/autotuner
   ./install.sh
   ```

2. **Configure your first tuning task**:

   ```bash
   vi examples/simple_task.json
   ```

3. **Start tuning**:

   ```bash
   source env/bin/activate
   python src/run_autotuner.py examples/simple_task.json --direct
   ```

For more information, see the main [README.md](../../README.md).
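One last housekeeping note: if an installation goes wrong and you want to start over cleanly, the Helm releases created in this guide can be removed in reverse order. The release and namespace names below are exactly those used above; be aware that uninstalling the CRD chart also deletes any OME custom resources you have created.

```bash
# Remove OME resources, then the CRDs
# (deleting the CRDs removes all OME custom resources)
helm uninstall ome -n ome
helm uninstall ome-crd -n ome

# Remove the dependencies installed for OME
helm uninstall keda -n keda
helm uninstall cert-manager -n cert-manager
```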