Overview

Setting up an Elasticsearch cluster involves several critical steps and configurations that need to be meticulously followed to ensure optimal performance and reliability. This article will delve into the advanced aspects of setting up an Elasticsearch cluster. If you want to learn the basics about Elasticsearch settings, check out this guide.

How to set up an Elasticsearch cluster

Steps to set up an Elasticsearch cluster (details below):

Planning the Cluster
Configuring the Nodes
Setting Up Discovery
Configuring Shard Allocation
Starting the Nodes
Verifying the Cluster Setup
Tuning the Cluster

Step 1: Planning the Cluster

Before setting up the cluster, it’s crucial to plan the cluster’s size and structure based on the data volume, query load, and redundancy requirements. The cluster should have at least one master-eligible node (three are recommended for production clusters), and multiple data nodes for storing data. The number of data nodes can be scaled up based on the data volume and query load.
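The recommendation of three master-eligible nodes follows from majority voting: a master can only be elected while a majority of the master-eligible nodes is reachable. The following is a minimal sketch of that arithmetic. The function names are illustrative, and the calculation is a simplification of how Elasticsearch actually manages its voting configuration.

```python
def master_quorum(master_eligible: int) -> int:
    """Smallest majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can fail while a majority remains."""
    return master_eligible - master_quorum(master_eligible)


# Compare a few cluster layouts: node count, quorum, tolerated failures.
for count in (1, 2, 3, 5):
    print(count, master_quorum(count), tolerated_master_failures(count))
```

Under this simplified model, one or two master-eligible nodes tolerate no failure at all, while three tolerate the loss of one node.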

Step 2: Configuring the Nodes

Each node in the cluster needs to be configured by setting various parameters in the elasticsearch.yml configuration file. The ‘node.name’ parameter should be set to a unique name for each node. The ‘cluster.name’ parameter should be set to the same value for all nodes in the cluster. The ‘network.host’ parameter should be set to the IP address or hostname of the node.

The ‘node.roles’ parameter should be set based on the role of the node. For a master-eligible node, ‘master’ should be added to the roles array. For data nodes, there are several roles, depending on which data tier the node belongs to: `data_hot`, `data_warm`, `data_cold` and `data_frozen`. If you are not using data tiers, or if your data is not time-series based, you can use the `data` or `data_content` roles.

Here is a sample configuration file summarizing all the above points:

# cluster name
cluster.name: my_cluster_name

# node name and roles
node.name: my_node_1
node.roles: [master, data]

# IP or host name of the node
network.host: 192.168.1.10
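When a cluster has more than a handful of nodes, it can help to generate these files rather than edit them by hand. The sketch below renders the same three groups of settings for each node and rejects duplicate node names. The helper names and the second node are hypothetical, and the script only builds text; it does not write files or talk to Elasticsearch.

```python
def render_node_config(cluster_name, node_name, roles, host):
    """Return the elasticsearch.yml text for one node."""
    lines = [
        "# cluster name",
        f"cluster.name: {cluster_name}",
        "",
        "# node name and roles",
        f"node.name: {node_name}",
        f"node.roles: [{', '.join(roles)}]",
        "",
        "# IP or host name of the node",
        f"network.host: {host}",
    ]
    return "\n".join(lines) + "\n"


def render_cluster(cluster_name, nodes):
    """Render one config per node; nodes is a list of (name, roles, host)."""
    names = [name for name, _, _ in nodes]
    if len(set(names)) != len(names):
        raise ValueError("node.name must be unique within the cluster")
    return {
        name: render_node_config(cluster_name, name, roles, host)
        for name, roles, host in nodes
    }


# Example nodes; the second one is made up for illustration.
configs = render_cluster(
    "my_cluster_name",
    [
        ("my_node_1", ["master", "data"], "192.168.1.10"),
        ("my_node_2", ["data"], "192.168.1.11"),
    ],
)
print(configs["my_node_1"])
```

Because every file is rendered from the same cluster name, the requirement that ‘cluster.name’ be identical on all nodes is met by construction.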

Step 3: Setting Up Discovery

If your nodes are installed on multiple hosts with different hostnames and IPs, you also need to configure network discovery settings so that your nodes can connect to each other. Otherwise, they will only try to connect to other nodes installed on the same host. This is done with the `discovery.seed_hosts` parameter, which must contain the addresses of all master-eligible nodes in your cluster, so that each node knows which other nodes it can contact.

The first time you bootstrap your cluster, another step is crucial: defining the list of all master-eligible nodes of your cluster. This needs to be specified ONLY ONCE, when you start your production cluster for the first time, using the `cluster.initial_master_nodes` parameter. It must contain the node names of all of your master-eligible nodes (i.e., the exact same name as configured in their `node.name` configuration parameter). After the cluster has been successfully bootstrapped, this setting MUST be removed from the configuration file.

The following configuration parameters should be added to the elasticsearch.yml configuration file in order to configure network discovery settings:

# list of all master-eligible nodes for node discovery
discovery.seed_hosts:
   - 192.168.1.2
   - 192.168.1.3
   - 192.168.1.4

cluster.initial_master_nodes:
   - master_node_1
   - master_node_2
   - master_node_3
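Because `cluster.initial_master_nodes` must match the `node.name` values exactly, a small typo is enough to keep the cluster from forming. As a sketch, a check along these lines could be run over your planned node list before the first start. The function name and the data layout (a mapping of node name to roles) are assumptions made for this example.

```python
def check_bootstrap(nodes, initial_master_nodes):
    """Return a list of problems; an empty list means the names line up.

    nodes maps each node.name to its list of roles.
    """
    master_eligible = {name for name, roles in nodes.items() if "master" in roles}
    listed = set(initial_master_nodes)
    problems = []
    for name in sorted(listed - master_eligible):
        problems.append(f"{name} is listed but is not a master-eligible node.name")
    for name in sorted(master_eligible - listed):
        problems.append(f"{name} is master-eligible but missing from the list")
    return problems


# Planned nodes; the data node name is made up for illustration.
planned = {
    "master_node_1": ["master"],
    "master_node_2": ["master"],
    "master_node_3": ["master"],
    "data_node_1": ["data_hot"],
}
print(check_bootstrap(planned, ["master_node_1", "master_node_2", "master_node_3"]))
print(check_bootstrap(planned, ["master_node_1", "master_node_2", "master-node_3"]))
```

The second call reports two problems: the misspelled name is not a known master-eligible node, and `master_node_3` is missing from the list.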

Step 4: Configuring Shard Allocation

Shard allocation is a critical aspect of Elasticsearch cluster setup. The ‘index.number_of_shards’ parameter should be set based on the data volume and query load. The ‘index.number_of_replicas’ parameter should be set based on the redundancy requirements.

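Both parameters are index-level settings. Since Elasticsearch 5.0, they can no longer be set in the elasticsearch.yml configuration file, and a node that finds them there will refuse to start. To provide default values for the indexes to be created, define them in an index template instead. They can also be overridden at index creation time by specifying them in the index settings.

For example, the following composable index template (available since version 7.8; the template name and index pattern are placeholders) makes sure that all matching indexes will be created with 2 primary shards and 1 replica for each primary shard:

PUT _index_template/my-default-shards
{
  "index_patterns": ["my-index-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 2,
      "index.number_of_replicas": 1
    }
  }
}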

If, for some reason, you know beforehand that one of your indexes will grow bigger and needs to have more than 2 primary shards, you can configure that at index-creation time, like this:

PUT my-big-index
{
  "settings": {
    "index.number_of_shards": 5,
    "index.number_of_replicas": 2
  },
  "mappings": {
  …
  }
}
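To arrive at a shard count, a common rule of thumb is to divide the expected index size by a target shard size. The sketch below does exactly that and also counts the total shard copies the cluster has to hold once replicas are included. The 30 GB target is an assumption for illustration, not a figure from this article; adjust it to your own data and hardware.

```python
import math


def primary_shards_for(expected_gb, target_shard_gb=30.0):
    """Estimate primary shards from expected index size (rule of thumb)."""
    if expected_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(expected_gb / target_shard_gb))


def total_shard_copies(primaries, replicas):
    """Each primary shard is stored once plus once per replica."""
    return primaries * (1 + replicas)


primaries = primary_shards_for(150)  # hypothetical 150 GB index
print(primaries, total_shard_copies(primaries, replicas=2))
```

With these assumed numbers, a 150 GB index gets 5 primary shards, and 2 replicas bring the total to 15 shard copies that must fit across your data nodes.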

Step 5: Starting the Nodes

Once the nodes are configured, they can be started by running the ‘elasticsearch’ command in the bin directory of the Elasticsearch installation. The nodes will discover each other and form a cluster.

You should first start all master-eligible nodes, so that the cluster can be formed, and then you can start all the remaining data nodes.

Step 6: Verifying the Cluster Setup

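The cluster setup can be verified by sending a GET request to the ‘_cluster/health’ endpoint. The response should show a ‘status’ of ‘green’, along with the total number of nodes (‘number_of_nodes’) and the number of data nodes (‘number_of_data_nodes’). The health response does not count master-eligible nodes separately; to see the roles of each node, query the ‘_cat/nodes’ endpoint.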
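This check is easy to automate. The sketch below compares a parsed ‘_cluster/health’ response with the node counts you expect. The `fetch_health` helper assumes an unsecured cluster reachable over plain HTTP at the URL you pass in, which will not be the case if security is enabled, so treat it as a starting point only. The sample response is hand-written and trimmed to the fields used here.

```python
import json
import urllib.request


def fetch_health(base_url):
    """Fetch _cluster/health as a dict (assumes no TLS or authentication)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as response:
        return json.load(response)


def check_health(health, expected_nodes, expected_data_nodes):
    """Return a list of problems found in a _cluster/health response."""
    problems = []
    if health.get("status") != "green":
        problems.append(f"status is {health.get('status')}, expected green")
    if health.get("number_of_nodes") != expected_nodes:
        problems.append(
            f"{health.get('number_of_nodes')} nodes joined, expected {expected_nodes}"
        )
    if health.get("number_of_data_nodes") != expected_data_nodes:
        problems.append(
            f"{health.get('number_of_data_nodes')} data nodes joined, "
            f"expected {expected_data_nodes}"
        )
    return problems


# Hand-written sample response for illustration.
sample = {
    "cluster_name": "my_cluster_name",
    "status": "yellow",
    "number_of_nodes": 4,
    "number_of_data_nodes": 3,
}
print(check_health(sample, expected_nodes=4, expected_data_nodes=3))
```

Against a live cluster, you would pass `fetch_health("http://localhost:9200")` instead of the sample, with the host and port adapted to your setup.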

Step 7: Tuning the Cluster

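After the cluster is set up, it should be tuned for optimal performance. The JVM heap size should be set to no more than 50% of the available RAM, and it should stay below the JVM’s compressed object pointers (compressed oops) threshold. The exact threshold varies by system: Elastic’s documentation describes 26GB as safe on most systems, with some systems allowing up to about 30GB. The ‘indices.fielddata.cache.size’ parameter should be set to limit the amount of memory used for the field data cache. By default, this setting is unbounded, but it can be limited using either an absolute size (10GB) or a percentage of the available heap (10%).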
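The heap rule can be written down as a small calculation. In this sketch, the cap is a parameter: the default of 26 GB is a conservative assumption for the compressed oops threshold, and you can pass a different value if you have verified a higher threshold on your JVM. The function names are illustrative. The output uses the standard `-Xms` and `-Xmx` JVM flags, set to the same value.

```python
def heap_size_gb(ram_gb, cap_gb=26):
    """Half of the RAM in whole GB, but never above the cap."""
    if ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM for this rule of thumb")
    return min(ram_gb // 2, cap_gb)


def jvm_options(heap_gb):
    """JVM flags that set the minimum and maximum heap to the same size."""
    return f"-Xms{heap_gb}g\n-Xmx{heap_gb}g\n"


for ram in (16, 64, 128):
    print(ram, heap_size_gb(ram))

print(jvm_options(heap_size_gb(16)))
```

A 16 GB machine ends up with an 8 GB heap, while 64 GB and 128 GB machines are both held at the cap, leaving the rest of the memory to the operating system’s filesystem cache.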

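The ‘indices.breaker.total.limit’ parameter sets the limit of the parent circuit breaker, which caps the total memory used across all circuit breakers. It defaults to 95% of the JVM heap when real memory tracking is enabled, and to 70% otherwise, so it only needs changing if that default does not suit your workload. The ‘thread_pool.search.size’ and ‘thread_pool.search.queue_size’ parameters should be set based on the query load.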

Conclusion

Setting up an Elasticsearch cluster involves careful planning, precise configuration, and continuous tuning. Following the steps above gives you a solid basis for a robust, high-performing, and reliable Elasticsearch cluster.