Challenges running OpenSearch on Kubernetes
There are various challenges when running OpenSearch on Kubernetes.
Data disk challenges
Kubernetes was originally designed to run stateless applications, so running apps that need persistent storage requires additional orchestration. OpenSearch data nodes run as pods with disks attached, and they continuously write data to those disks. If a data node disconnects, restarts or is replaced, its pod is terminated and a new one is created automatically. The disk that was attached to the old pod then needs to be detached and reattached to the new pod, which in many cases is scheduled on a different Kubernetes node.
The fact that you can declare different disk types in Kubernetes adds to the complexity of this procedure. Disks can be ephemeral (attached to the node) or network-based, and each type requires different treatment. The open-source Operator handles attaching and detaching these disks in a fully automatic process.
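For context, here is a minimal sketch of the two disk types in plain Kubernetes terms; the pod, claim name and storage class below are illustrative, not part of the Operator:

# Ephemeral storage: lives and dies with the pod
apiVersion: v1
kind: Pod
metadata:
  name: demo-ephemeral
spec:
  containers:
    - name: opensearch
      image: opensearchproject/opensearch:1.3.1
      volumeMounts:
        - name: data
          mountPath: /usr/share/opensearch/data
  volumes:
    - name: data
      emptyDir: {}            # deleted together with the pod
---
# Network-based storage: survives pod rescheduling
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-network-disk
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp2       # a cloud-provider network volume, e.g. EBS
  resources:
    requests:
      storage: 5Gi

An ephemeral volume disappears with its pod, while a network-based claim can be re-bound to a replacement pod on another node; handling both cases correctly is the work the Operator takes on.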
Challenges operating different node groups
The official Helm chart deploys a single group of nodes that all share the same set of roles. If you’d like to control the master nodes, data nodes and other node types separately, you would need to manage multiple Helm releases and operate them separately. The Operator solves this by taking care of all node groups, and you can easily customize the roles of each group (see below for a more in-depth explanation).
Scaling challenges
When scaling down data nodes, it’s recommended to plan ahead and strategize the process in order to avoid temporary unavailability of the data and service. In particular, data nodes should be drained before being terminated. With Helm charts, you would have to reduce the number of replicas in the deployment, which terminates the relevant pods without ensuring the data is copied to other nodes first. The Operator drains nodes automatically before scaling down.
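For comparison, this is roughly what a manual drain looks like without the Operator: you exclude the node from shard allocation via the standard cluster settings API and wait for its shards to relocate before terminating it (the node name here is illustrative):

curl -k -u admin:admin -X PUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.exclude._name": "my-first-cluster-masters-2"}}'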
Security challenges
OpenSearch allows encryption at multiple layers of the stack: client communication, node-to-node transport, and the OpenSearch Dashboards connection. Each of these layers requires creating certificates (for example with cert-manager), storing them in Kubernetes and wiring them to the relevant OpenSearch components. This is time-consuming and requires both attention and expertise. The open-source Operator generates all the certificates and configures and installs them automatically, out of the box.
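As an illustration of the manual effort involved, issuing just one of these certificates with cert-manager looks roughly like the following sketch; the Issuer and all names are assumptions for the example:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: opensearch-transport-cert
  namespace: default
spec:
  secretName: opensearch-transport-cert   # Secret the certificate is written to
  duration: 8760h                          # one year
  dnsNames:
    - my-first-cluster
    - my-first-cluster.default.svc
  issuerRef:
    name: selfsigned-issuer                # assumes an existing Issuer
    kind: Issuer

This has to be repeated and wired into the OpenSearch configuration for every encrypted layer, which is exactly the work the Operator automates.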
Advantages and features of the OSS Operator for OpenSearch on K8s
Automatic disk management
The first advantage is that disks are managed automatically. The Operator supports both network-based and attached disks and can operate both in cases when a node is restarted, removed, or a new one is added. When scaling down a node, the Operator will drain its disk before terminating it, ensuring the data is reallocated to other nodes first.
Configuration of all node groups and roles
The open-source Operator takes care of all node groups and you can customize the roles of each group easily. This can range from a single group of nodes having all roles, to a dedicated group of nodes for each role, like dedicated master nodes, dedicated coordinators, machine learning and of course, dedicated data nodes. The Operator allows scaling of each of these groups independently and configuration of hardware requirements for each node in the group.
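For example, here is a hedged sketch of dedicated master and data node pools, following the nodePools format shown in the quickstart below (replica counts and disk sizes are illustrative):

nodePools:
  - component: masters
    replicas: 3
    diskSize: "5Gi"
    roles:
      - "master"
  - component: data
    replicas: 4
    diskSize: "30Gi"
    roles:
      - "data"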
Advanced security options
The Operator allows different levels of security depending on the configuration. It can be as simple as user-password authentication, or as complex as encrypted client communication with certificates that are either auto-generated or created from a certificate authority you provide. The same goes for node-to-node transport and OpenSearch Dashboards communication, which can be configured with self-signed or user-provided certificates.
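As a rough sketch based on the Operator’s user guide (treat the exact field names as version-dependent), enabling auto-generated certificates for both layers looks like this:

spec:
  security:
    tls:
      transport:
        generate: true   # Operator generates node-to-node certificates
      http:
        generate: true   # Operator generates certificates for the REST/client layer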
Easy installation of OpenSearch Dashboards
The Operator allows you to install OpenSearch Dashboards using simple configuration. You can specify how many replicas of the Dashboards you’d like to have, and their security settings.
Automatic version upgrades
If you’d like to upgrade OpenSearch versions, the Operator can automate the process by performing rolling upgrades, node by node. Note that before upgrading versions with the Operator, you still need to take care of any breaking changes in the APIs you use and in the data format you store in OpenSearch.
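In practice, the upgrade is triggered by bumping the version field in the cluster spec and re-applying it; the target version below is just an example:

spec:
  general:
    serviceName: my-first-cluster
    version: 1.3.7   # bumped from 1.3.1; the Operator rolls the nodes one at a time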
Easy adjustment of nodes’ memory allocation and limits
Changing the nodes’ hardware in terms of allocated memory and disk size is very simple: all you need to do is change the configuration and apply it. The Operator picks up the config change and automates the entire process with a blue-green deployment, replacing all nodes with new nodes that have the requested hardware.
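For example, raising the memory of the node pool from the quickstart below only requires editing the resources block and re-applying the file (values illustrative):

nodePools:
  - component: masters
    replicas: 3
    diskSize: "5Gi"
    resources:
      requests:
        memory: "4Gi"   # raised from 2Gi; triggers a blue-green node replacement
        cpu: "500m"
      limits:
        memory: "4Gi"
        cpu: "500m"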
Autoscaling
One of the most requested capabilities is the ability to autoscale the cluster based on the workload running in the cluster. The OSS Operator will allow you to scale up the cluster when the nodes are too loaded, often during peak-times, and scale back down when the peak has passed. This way you can ensure users have the best performance, but also that costs remain minimal, avoiding unnecessary hardware.
Installation instructions
The open-source Operator can be easily installed using Helm.
1. Add the Helm repo:
helm repo add opensearch-operator https://opster.github.io/opensearch-k8s-operator/
2. Install the Operator:
helm install opensearch-operator opensearch-operator/opensearch-operator
Quickstart
After you have successfully installed the Operator, you can deploy your first OpenSearch cluster. This is done by creating an OpenSearchCluster custom object in Kubernetes.
Create a file cluster.yaml with the following content:
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: default
spec:
  general:
    serviceName: my-first-cluster
    version: 1.3.1
  dashboards:
    enable: true
    version: 1.3.1
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "5Gi"
      nodeSelector:
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "data"
        - "master"
Then run:
kubectl apply -f cluster.yaml
If you watch the cluster, you will see that after a few seconds the Operator will create several pods. For example:
watch -n 2 kubectl get pods
First, a bootstrap pod will be created that helps with initial master discovery.
my-first-cluster-bootstrap-0
Then three pods for the OpenSearch cluster will be created, and one pod for the dashboards instance.
my-first-cluster-masters-0/1/2
Once the pods appear as ready, which normally takes about 1-2 minutes, you can connect to your cluster using port-forwarding.
Run:
kubectl port-forward svc/my-first-cluster-dashboards 5601
Then open http://localhost:5601 in your browser and log in with the default demo credentials admin / admin.
Alternatively, if you want to access the OpenSearch REST API, run:
kubectl port-forward svc/my-first-cluster 9200
Then open a second terminal and run:
curl -k -u admin:admin https://localhost:9200/_cat/nodes?v
You should see the three deployed OpenSearch nodes listed.
If you’d like to delete your cluster, run:
kubectl delete -f cluster.yaml
The Operator will then clean up and delete any Kubernetes resources created for the cluster. Note that in most cases this will not delete the cluster’s persistent volumes. For a complete cleanup, including deleting the PVCs, run:
kubectl delete pvc -l opster.io/opensearch-cluster=my-first-cluster
The minimal cluster you deployed in this section is only intended for demo purposes. Please see the next sections on how to configure the different aspects of your cluster.
Configuration
You can configure data persistence, OpenSearch Dashboards, TLS, security, node groups and more. Follow the instructions here: https://github.com/Opster/opensearch-k8s-operator/blob/main/docs/userguide/main.md#data-persistence
One of the main sections you can configure is the nodepools. Each nodepool represents a group of nodes with the same roles. Configuring multiple nodepools will allow you to define dedicated masters, coordinators only, dedicated ML, and any other combination you’d like. Each nodepool will allow you to define the resources needed for that group of nodes.
Using the Operator to operate OpenSearch
The Operator enables you to operate OpenSearch clusters easily with high-level APIs. The APIs allow operations such as:
- Scaling node groups (up and down), including draining data nodes prior to scaling down
- Upgrading OpenSearch versions in a single API call
- Changing node configurations and performing Rolling Restarts
- Changing nodes’ hardware – memory, disk size, etc. – using blue-green deployments
- And more
Most of these operations are done by changing the configuration in the CRDs of the clusters. Once the CRDs have changed, the Operator will pick up the updates and will implement and reconcile the changes automatically.
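For example, scaling a node group up can be done by editing the cluster resource directly (assuming the resource name registered by the Operator’s CRD):

kubectl edit opensearchcluster my-first-cluster

Then change replicas: 3 to replicas: 5 under the relevant nodePool and save; the Operator reconciles the difference, and when scaling down it drains the affected nodes first.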
Notes regarding compatibility & roadmap
The Operator is compatible with OpenSearch versions 1.x and 2.x. It is also compatible with the main Kubernetes distributions. Support for OpenShift is coming soon.
In terms of the roadmap, we have exciting features coming. First is the Autoscaler, which scales based on usage, load and resources. We are also working on controlling shard balancing and allocation, including AZ/rack awareness and hot/warm architectures, with many more features to come.