Install & Configure Kubernetes Cluster
This guide walks you through creating and configuring a Kubernetes cluster using Rancher for deploying Lyra Platform.
Overview
After completing the prerequisites, you'll use Rancher to:
- Create a new Kubernetes cluster
- Configure cluster nodes (control plane and workers)
- Configure Rancher for Lyra deployments
- Verify cluster is ready for infrastructure deployment
Estimated Time: 30-60 minutes (including node provisioning)
Step 1: Access Rancher UI
- Open your web browser and navigate to your Rancher server URL (e.g., https://rancher-server-ip)
- Log in with the credentials you set during initial setup
- You should see the Rancher dashboard with cluster management options
Step 2: Create Kubernetes Cluster
This guide uses bare-metal or VM deployments with a custom cluster configuration.
- Click the "Create" button in the Rancher dashboard
- Select the "Custom" cluster type
- Configure Cluster Settings:
  - Cluster Name: lyra-production (or your preferred name)
  - Kubernetes Version: v1.33.5+rke2r1 (recommended)
  - Network Provider: Calico (provides both networking and network policy)
  - Cloud Provider: None (for bare-metal/VMs)
- Click "Next"
Step 3: Configure Node Roles
Before configuring nodes, it's important to understand the three roles in Kubernetes:
etcd Role
Purpose: Distributed key-value database that stores all cluster data and state.
Responsibilities:
- Stores cluster configuration and state
- Maintains consistency across the cluster
- Provides cluster-wide data persistence
Requirements:
- Must be an odd number of nodes (3, 5, or 7) for quorum and fault tolerance
- Low-latency storage (SSD recommended); a quick disk check is sketched below
- Reliable network connectivity between etcd nodes
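etcd is particularly sensitive to disk write (fdatasync) latency. If you want to sanity-check a candidate etcd disk before provisioning, one common approach is the fio fdatasync benchmark sketched below; the test directory, job name, and size are examples, and fio must be installed on the node.

```bash
# Create a scratch directory on the disk you intend to use for etcd data (path is an example)
sudo mkdir -p /var/lib/etcd-disk-check

# Benchmark small sequential writes with fdatasync, the pattern etcd uses for its write-ahead log
sudo fio --name=etcd-disk-check \
    --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-disk-check \
    --size=22m --bs=2300

# Inspect the fsync/fdatasync latency percentiles in the output;
# the commonly cited guideline is a 99th percentile below roughly 10 ms
```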
Control Plane Role
Purpose: Manages the Kubernetes cluster and makes decisions about workload scheduling.
Responsibilities:
- API server (kubectl commands go here)
- Scheduler (decides which node runs which pod)
- Controller manager (maintains desired cluster state)
- Cloud controller manager (cloud provider integration)
Requirements:
- Adequate CPU and memory for cluster management
- Should be highly available (3 nodes recommended for production)
Worker Role
Purpose: Runs application workloads and services.
Responsibilities:
- Runs containerized applications (pods)
- Provides compute resources for workloads
- Executes storage operations (when Ceph/Rook is deployed)
Requirements:
- Storage disks for Ceph/Rook (e.g., /dev/sdb, /dev/sdc); see the disk check below
- Adequate CPU and memory for application workloads
- Can be scaled horizontally (add more workers as needed)
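Before registering a worker node, it helps to confirm that its data disks are visible and unformatted. A minimal check with lsblk (device names such as /dev/sdb are examples; match them to your hardware) might look like this:

```bash
# List block devices with filesystem and mount information
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT

# A disk intended for Ceph/Rook should appear as TYPE=disk with empty FSTYPE and
# MOUNTPOINT (i.e., raw and unmounted), for example /dev/sdb and /dev/sdc
```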
Deployment Strategies
Rancher will display a registration command for adding nodes. Choose one of the deployment strategies below based on your infrastructure.
Deployment Strategy A: Dedicated Roles (Recommended for Production)
This approach separates control plane and worker responsibilities for better isolation and performance.
Node Configuration:
- Control Plane Nodes: etcd + Control Plane only (no Worker role)
- Worker Nodes: Worker role only
- Total Nodes: Minimum 4 nodes (1 control plane + 3 workers) or 8+ nodes for HA (3 control plane + 5 workers)
Configure Control Plane Nodes
- In Rancher UI, check the boxes:
  - ✅ etcd
  - ✅ Control Plane
  - ⬜ Worker (leave unchecked)
- Copy the registration command shown in the Rancher UI
- SSH to your control plane server(s) and run the command:

```bash
# Example command (yours will be different):
curl -fL https://rancher-server-ip/system-agent-install.sh | sudo sh -s - \
  --server https://rancher-server-ip \
  --label 'cattle.io/os=linux' \
  --token xxxxx \
  --ca-checksum xxxxx \
  --etcd --controlplane

# If using a self-signed certificate, add the --insecure flag:
curl --insecure -fL https://rancher-server-ip/system-agent-install.sh | sudo sh -s - \
  --server https://rancher-server-ip \
  --label 'cattle.io/os=linux' \
  --token xxxxx \
  --ca-checksum xxxxx \
  --etcd --controlplane
```

- Repeat for all control plane nodes (1 for dev, 3 for HA production)
- Wait for nodes to appear in Rancher UI with status "Active" (you can also check the services directly on the node, as shown after this list)
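If a node takes a while to appear, you can check directly on the machine that the registration agent and the RKE2 service came up. A minimal check, assuming the service names installed by the registration script on RKE2-based clusters, could be:

```bash
# Rancher system agent installed by the registration command
sudo systemctl status rancher-system-agent

# RKE2 service: rke2-server on control plane/etcd nodes, rke2-agent on worker-only nodes
sudo systemctl status rke2-server   # or: sudo systemctl status rke2-agent
```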
Configure Worker Nodes
- In Rancher UI, check the boxes:
  - ⬜ etcd (leave unchecked)
  - ⬜ Control Plane (leave unchecked)
  - ✅ Worker
- Copy the new registration command
- SSH to your worker server(s) and run the command:

```bash
# Example command (yours will be different):
curl -fL https://rancher-server-ip/system-agent-install.sh | sudo sh -s - \
  --server https://rancher-server-ip \
  --label 'cattle.io/os=linux' \
  --token xxxxx \
  --ca-checksum xxxxx \
  --worker

# If using a self-signed certificate, add the --insecure flag:
curl --insecure -fL https://rancher-server-ip/system-agent-install.sh | sudo sh -s - \
  --server https://rancher-server-ip \
  --label 'cattle.io/os=linux' \
  --token xxxxx \
  --ca-checksum xxxxx \
  --worker
```

- Repeat for all worker nodes (3+ nodes minimum)
- Wait for all nodes to become "Active"
Deployment Strategy B: Combined Roles (For Smaller Deployments)
This approach combines all roles on the same nodes to reduce server count.
Node Configuration:
- Combined Role Nodes: etcd + Control Plane + Worker (all three roles)
- etcd Nodes: Must be an odd number (3, 5, or 7) for quorum
- Total Nodes: Variable - you can add worker-only nodes alongside the combined role nodes
Note: Only the nodes with the etcd role need to total an odd number. You can mix:
- 3 nodes with etcd + control plane + worker (required minimum)
- Additional worker-only nodes as needed (any number)
Benefits:
- Fewer servers required (3 nodes instead of 4-8)
- Lower infrastructure costs
- Simpler for development/testing or small production deployments
Considerations:
- Control plane and application workloads share resources
- Less isolation than dedicated roles
- Still provides high availability with 3+ nodes
Configure Combined Role Nodes
- In Rancher UI, check ALL boxes:
  - ✅ etcd
  - ✅ Control Plane
  - ✅ Worker
- Copy the registration command
- SSH to each server and run the command:

```bash
# Example command (yours will be different):
curl -fL https://rancher-server-ip/system-agent-install.sh | sudo sh -s - \
  --server https://rancher-server-ip \
  --label 'cattle.io/os=linux' \
  --token xxxxx \
  --ca-checksum xxxxx \
  --etcd --controlplane --worker

# If using a self-signed certificate, add the --insecure flag:
curl --insecure -fL https://rancher-server-ip/system-agent-install.sh | sudo sh -s - \
  --server https://rancher-server-ip \
  --label 'cattle.io/os=linux' \
  --token xxxxx \
  --ca-checksum xxxxx \
  --etcd --controlplane --worker
```

- Repeat for all combined role nodes (must be an odd number of etcd nodes: 3, 5, or 7)
- (Optional) Add dedicated worker nodes: if you need more capacity, add worker-only nodes following the steps in Strategy A
- Wait for all nodes to become "Active"
Important: Each node with Worker role must have storage disks (e.g., /dev/sdb) for Ceph/Rook.
Example Hybrid Configuration:
- 3 nodes: etcd + control plane + worker (with storage disks)
- 2 nodes: worker only (with storage disks)
- Total: 5 nodes (3 etcd nodes for quorum + 2 additional workers for capacity)
Which Strategy to Choose?
| Scenario | Recommended Strategy | Node Count | Configuration |
|---|---|---|---|
| Development/Testing | Combined Roles | 3 nodes | 3 nodes (etcd + control + worker) |
| Small Production | Combined Roles | 3-5 nodes | 3 nodes (etcd + control + worker) + 0-2 workers |
| Medium Production | Hybrid or Dedicated | 5-8 nodes | 3 nodes (etcd + control + worker) + 2-5 workers OR 1-3 control + 4-5 workers |
| Large Production (HA) | Dedicated Roles | 8+ nodes | 3 control + 5+ workers |
| Enterprise/High-Traffic | Dedicated Roles | 10+ nodes | 3 control + 7+ workers |
Key Points:
- etcd nodes: Always an odd number (3, 5, or 7) for quorum
- Worker nodes: Can be any number
- Hybrid: Combine strategies - etcd+control+worker nodes PLUS dedicated workers
Step 4: Verify Cluster Status
Check Cluster in Rancher UI
- Navigate to Cluster Management → Clusters
- Your cluster should show:
  - State: Active
  - Provider: Custom
  - Nodes: All nodes showing as "Active"
- Click on your cluster name to view details
- Verify the Machines tab:
  - Navigate to Cluster Management → Clusters → Your Cluster Name
  - Click on the Machines tab
  - Important: When every node under the Machines tab shows status "Running", the initial cluster creation has completed successfully
  - All machines should display:
    - State: Running
    - Node: Node name (e.g., control-node-1, worker-node-1)
    - Roles: Assigned roles (etcd, controlplane, worker)
Verify with kubectl
- Download the kubeconfig from Rancher:
  - Click your cluster name
  - Click the "Download KubeConfig" button
  - Save the file (e.g., kubeconfig-lyra.yaml)
- Set the KUBECONFIG environment variable to point at the downloaded file
- Verify cluster connectivity with kubectl (example commands for these steps follow the expected output below)
- Check node status. Expected output:
```
NAME             STATUS   ROLES               AGE   VERSION
control-node-1   Ready    controlplane,etcd   5m    v1.33.5+rke2r1
control-node-2   Ready    controlplane,etcd   5m    v1.33.5+rke2r1
control-node-3   Ready    controlplane,etcd   5m    v1.33.5+rke2r1
worker-node-1    Ready    worker              5m    v1.33.5+rke2r1
worker-node-2    Ready    worker              5m    v1.33.5+rke2r1
worker-node-3    Ready    worker              5m    v1.33.5+rke2r1
```
All nodes should show Ready status.
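A typical command sequence for the verification steps above looks like this; the kubeconfig path is only an example and should point at wherever you saved the downloaded file:

```bash
# Point kubectl at the kubeconfig downloaded from Rancher (adjust the path as needed)
export KUBECONFIG=~/Downloads/kubeconfig-lyra.yaml

# Verify connectivity to the cluster API server
kubectl cluster-info

# List all nodes with their status, roles, and versions
kubectl get nodes
```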
Step 5: Configure Rancher for Deployments
After cluster creation, configure Rancher settings required for application deployments.
Create Lyra Project in Rancher
IMPORTANT: Rancher Projects provide organizational structure and resource isolation for related applications. Create a dedicated project for the Lyra Platform and its components.
- Navigate to Projects/Namespaces:
  - Click your cluster name in Rancher
  - Go to Projects/Namespaces in the left sidebar
- Create New Project:
  - Click the Create Project button
  - Project Name: Lyra Platform
  - Description: Lyra application and infrastructure components
  - Resource Quotas: (Optional) Set limits for the project
  - Container Default Resource Limit: (Optional) Set default limits
  - Click Create
What will be deployed in this project:
- Lyra Backend application
- Lyra Frontend application
- Lyra Scheduler service
- PostgreSQL database
- Redis cache
- Supporting infrastructure services
Benefits of using a dedicated project:
- Logical grouping of all Lyra-related deployments
- Resource quota management for the entire platform
- Simplified RBAC (Role-Based Access Control)
- Clear separation from other applications
- Easier monitoring and troubleshooting
Add Lyra Helm Chart Repository
REQUIRED: Add the Lyra OCI Helm chart repository to the cluster to enable deployment of Lyra applications.
- Go to the Lyra Platform cluster and navigate to Apps → Repositories
- Click Create and configure the repository:
  - Name: lyra-charts
  - Select OCI Repository
  - Index URL: oci://registry.lyra.ovh/lyra-charts
  - Authentication: Create an HTTP Basic Auth Secret
    - Username: Your Harbor username
    - Password: Your Harbor password/token
  - Click Create to save
Verification:
- The repository should appear in the list with status "Active"
- This enables deployment of Lyra Helm charts directly through the Rancher UI
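If you also have the Helm CLI (3.8 or later, which supports OCI registries) on a workstation, you can optionally confirm that the registry path and credentials work outside Rancher; the chart name and version below are examples taken from the charts listed later in this guide:

```bash
# Log in to the OCI registry with your Harbor credentials (prompts for the password/token)
helm registry login registry.lyra.ovh --username <harbor-username>

# Fetch chart metadata to confirm the registry and chart path are reachable
helm show chart oci://registry.lyra.ovh/lyra-charts/lyra-app --version 1.0.0
```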
Configure Container Registry Access
CRITICAL: Create a project-level registry secret to allow all deployments in the Lyra Platform project to pull images from Harbor.
- Navigate to your cluster in Rancher and go to Storage → Project Secrets
- Click Create and configure the registry secret:
  - Type: Select Registry
  - Project: Select Lyra Platform (the project you created earlier)
  - Name: harbor-registry-secret (must use this exact name)
  - Registry Domain Name: registry.lyra.ovh
  - Username: Your Harbor username (from Prerequisites)
  - Password: Your Harbor password/token
  - Click Create to save
Why project-level secret:
- Automatically available to all namespaces created within the Lyra Platform project
- Deployments will automatically create their namespaces and inherit this secret
- No need to manually create the secret in each namespace
- Simplifies Helm chart deployments
Important: The secret must be named exactly harbor-registry-secret as Lyra Helm charts reference this name.
Note: Namespaces within the Lyra Platform project will be created automatically by Helm chart deployments in the next installation steps.
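For reference, the Rancher form above creates a standard Kubernetes registry (docker-registry type) secret. Should you ever need to create or recreate it manually in a specific namespace, a kubectl equivalent (namespace and credentials are placeholders) looks like this:

```bash
# Create the image pull secret that Lyra Helm charts expect, in a given namespace
kubectl create secret docker-registry harbor-registry-secret \
  --docker-server=registry.lyra.ovh \
  --docker-username=<harbor-username> \
  --docker-password=<harbor-password-or-token> \
  --namespace=<target-namespace>

# Confirm the secret exists in that namespace
kubectl get secret harbor-registry-secret --namespace=<target-namespace>
```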
Step 6: Deploy Lyra Applications and Services
With the cluster configured and Rancher settings in place, you can now deploy Lyra Platform applications and services using Helm charts.
Deployment Overview
Lyra Platform consists of multiple components that need to be deployed in order:
- Infrastructure Services (PostgreSQL, Redis, Storage and more)
- Lyra Core Applications (Backend, Frontend, Scheduler)
All deployments are managed through Rancher's Apps & Marketplace interface using the Helm charts from your Harbor registry.
Deploy via Rancher UI
- Select the lyra-charts repository:
  - Go to your Lyra Platform cluster in Rancher and navigate to Apps → Charts
  - This gives you access to all Lyra infrastructure and application charts
- Install the desired chart:
  - Browse the available charts (lyra-app, postgresql, redis, etc.)
  - Click on the chart you want to install
  - Click Install
  - Chart Version: Select the desired version (e.g., 1.0.0)
- Configure Deployment:
  - Namespace: The Helm charts define the namespace automatically
  - Names: Names are also defined by the Helm chart
  - Project: Select Lyra Platform from the dropdown (the project you created earlier)
  - Helm Values: Each chart includes predefined values that can be customized through Rancher's configuration forms. All values are already configured to work out of the box, but you can adjust them to match your specific requirements.
- Click Install and wait for the deployment to complete (a Helm CLI equivalent is sketched below)
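For completeness, the same installation can also be driven from the Helm CLI against the OCI registry. This is only a sketch; the chart name, version, namespace, and values file below are examples, and the Rancher UI flow above remains the documented path:

```bash
# Authenticate against the Harbor OCI registry (prompts for the password/token)
helm registry login registry.lyra.ovh --username <harbor-username>

# Install a chart directly from the OCI repository (names and version are examples)
helm install lyra-app oci://registry.lyra.ovh/lyra-charts/lyra-app \
  --version 1.0.0 \
  --namespace lyra-app --create-namespace \
  --values custom-values.yaml   # optional overrides of the chart defaults
```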
Next Steps
✅ Kubernetes Cluster Configured!
Your Kubernetes cluster is now configured with:
- ✅ Rancher management integration
- ✅ Harbor registry integration
- ✅ Lyra Platform project created
- ✅ Ready for infrastructure deployment
Proceed to: Infrastructure Deployment
The Infrastructure Deployment guide covers:
- Step 1: Install Ceph/Rook Storage
- Step 2: Deploy PostgreSQL Database
- Step 3: Deploy Redis Cache
- Step 4: Deploy CSI Drivers for External Storage
- Step 5: Deploy MetalLB Load Balancer
Optional: Control Plane as Worker Configuration
If you chose the "Control Plane as Worker" deployment model (see Prerequisites), configure it now:
Remove Taints from Control Plane Nodes
```bash
# List control plane nodes (RKE2 labels control plane nodes with node-role.kubernetes.io/control-plane)
kubectl get nodes -l node-role.kubernetes.io/control-plane=true

# Remove the scheduling taints from each control plane node
# (if a taint is not present, kubectl reports "not found", which is harmless)
kubectl taint nodes <control-plane-node-name> node-role.kubernetes.io/control-plane:NoSchedule-
kubectl taint nodes <control-plane-node-name> node-role.kubernetes.io/master:NoSchedule-

# Repeat for all control plane nodes
```
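To confirm the change took effect, you can check a node's taints afterwards (the node name is a placeholder):

```bash
# The Taints field should no longer list a NoSchedule entry for the control plane roles
kubectl describe node <control-plane-node-name> | grep -A 3 -i taints
```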
Add Storage Disks to Control Plane Nodes
If using control plane as workers, ensure each control plane node also has storage disks:
- SSH to each control plane node
- Verify storage disks with lsblk
- Ensure /dev/sdb (and optionally more) are available and unformatted
Ceph will automatically detect and use these disks since we configured useAllNodes: true.
Troubleshooting
Nodes Not Appearing in Rancher
Problem: Node doesn't show up after running registration command
Solutions:
```bash
# Check the Rancher system agent installed by the registration command
sudo systemctl status rancher-system-agent
sudo journalctl -u rancher-system-agent --no-pager | tail -n 50

# Check the RKE2 service logs on the node
sudo journalctl -u rke2-server --no-pager | tail -n 50   # control plane / etcd nodes
sudo journalctl -u rke2-agent --no-pager | tail -n 50    # worker-only nodes

# Common issues:
# - Firewall blocking connection to the Rancher server
# - Incorrect Rancher server URL
# - Network connectivity issues
```
Ceph OSD Pods Not Starting
Problem: rook-ceph-osd-* pods stuck in pending or error state
Solutions:
```bash
# Check Rook operator logs
kubectl logs -n rook-ceph deployment/rook-ceph-operator

# Check if disks are being detected
kubectl get pods -n rook-ceph -l app=rook-ceph-osd-prepare

# Common issues:
# - Disks are already formatted (must be raw)
# - deviceFilter doesn't match your disk names
# - Not enough disks available
```
Storage Class Not Working
Problem: PVC stuck in "Pending" status
Solutions:
```bash
# Describe the PVC to see error
kubectl describe pvc <pvc-name>

# Check Ceph cluster status
kubectl get cephcluster -n rook-ceph

# Check Ceph health
kubectl exec -n rook-ceph deployment/rook-ceph-tools -- ceph status

# Common issues:
# - Ceph cluster not healthy
# - Insufficient OSDs
# - Storage class misconfiguration
```
Need assistance? Contact Lyra support or open an issue