Briefly, this error occurs when Elasticsearch hits an unexpected exception while marking a replica shard copy as failed, after a replicated operation on that replica has already failed. Possible underlying causes include network issues, disk space problems, or a bug in Elasticsearch. To resolve it, you can try the following:
1) Check the health of your network and make sure all nodes are reachable.
2) Verify that there is sufficient free disk space on all nodes.
3) Upgrade Elasticsearch to a newer version if the error is caused by a known bug that a later release fixes.
4) Reallocate the replica to another node using the cluster reroute API.
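For step 4, the cluster reroute API (POST /_cluster/reroute) accepts an allocate_replica command with index, shard and node fields. The Java sketch below only builds that request body as a string. It assumes the replica is currently unassigned (allocate_replica still goes through the normal allocation rules); a started copy would instead be relocated with the move command and its from_node/to_node fields. The shard number and node name are hypothetical placeholders.

```java
final class RerouteBody {

    // Minimal JSON string escaping (backslash and double quote only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Body for POST /_cluster/reroute that asks Elasticsearch to assign an
    // unassigned replica of the given shard to the named node.
    static String allocateReplica(String index, int shard, String node) {
        return "{\"commands\":[{\"allocate_replica\":{"
            + "\"index\":" + quote(index) + ","
            + "\"shard\":" + shard + ","
            + "\"node\":" + quote(node)
            + "}}]}";
    }

    public static void main(String[] args) {
        // "api-logs" comes from this guide; shard 0 and "node-2" are hypothetical.
        System.out.println(allocateReplica("api-logs", 0, "node-2"));
    }
}
```

Building the body by hand keeps the sketch dependency-free; in a real client, a JSON library or the official Elasticsearch client would normally produce it.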
This guide will help you check for common problems that cause the log “{} unexpected error while failing replica” to appear. To understand the issues related to this log, read the explanation below about the following Elasticsearch concept: replication.
Overview
Replication refers to storing redundant copies of data. Starting with version 7.0, a new index by default gets one primary shard and one replica (number_of_replicas: 1). A replica is never assigned to the same node as its primary, so the cluster needs at least two nodes before replicas can be assigned. If the node holding a primary shard fails, a replica is automatically promoted to primary.
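The same-node rule caps how many replicas can be placed: with N data nodes, at most N - 1 replicas of a shard can be assigned, and the rest stay unassigned. The Java sketch below models only that rule and ignores every other allocation constraint (disk watermarks, awareness attributes, allocation filters), so treat it as an approximation rather than how Elasticsearch's allocator actually works.

```java
final class ReplicaCapacity {

    // Replica copies per shard that can be assigned when every copy
    // (primary and replicas) must sit on a different node.
    static int assignableReplicas(int dataNodes, int numberOfReplicas) {
        if (dataNodes < 1) {
            return 0; // no node available to hold even the primary
        }
        return Math.min(numberOfReplicas, dataNodes - 1);
    }

    // Replica copies per shard left unassigned; while the primary itself is
    // assigned, these unassigned replicas are what turn cluster health yellow.
    static int unassignedReplicas(int dataNodes, int numberOfReplicas) {
        return numberOfReplicas - assignableReplicas(dataNodes, numberOfReplicas);
    }

    public static void main(String[] args) {
        // The 7.x default of one replica on a single node leaves it unassigned.
        System.out.println("1 node,  1 replica  -> unassigned: " + unassignedReplicas(1, 1));
        System.out.println("2 nodes, 1 replica  -> unassigned: " + unassignedReplicas(2, 1));
        System.out.println("2 nodes, 2 replicas -> unassigned: " + unassignedReplicas(2, 2));
    }
}
```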
What it is used for
Replicas provide high availability and failover. Adding replicas can also increase search throughput, because a search request can be served by any copy of a shard.
Examples
Update replica count
PUT /api-logs/_settings?pretty
{
  "index" : {
    "number_of_replicas" : 2
  }
}
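Outside the Kibana console, the same request is a plain HTTP PUT with a JSON body. The Java sketch below builds it with the JDK's java.net.http client (Java 11+). It assumes an unsecured cluster reachable at http://localhost:9200, which is a hypothetical address; a secured cluster would also need TLS settings and credentials. The main method only prints the request line, so it runs without a cluster.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class UpdateReplicas {

    // Equivalent of: PUT /<index>/_settings?pretty { "index": { "number_of_replicas": N } }
    static HttpRequest buildRequest(String baseUrl, String index, int replicas) {
        String body = "{\"index\":{\"number_of_replicas\":" + replicas + "}}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_settings?pretty"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    // Sends the request and returns "<status> <body>" (requires a reachable cluster).
    static String send(HttpRequest request) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) throws Exception {
        // http://localhost:9200 is an assumed, unsecured development cluster.
        HttpRequest request = buildRequest("http://localhost:9200", "api-logs", 2);
        System.out.println(request.method() + " " + request.uri());
        // Uncomment to actually send it:
        // System.out.println(send(request));
    }
}
```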
Common problems
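- By default, Elasticsearch does not allocate replicas to nodes whose disk usage is above 85% (the low disk watermark, cluster.routing.allocation.disk.watermark.low). Instead, it logs that the watermark has been exceeded.
- Setting number_of_replicas too high can leave replicas unassigned (and cluster health yellow) when there are not enough nodes to hold every copy; each extra replica also consumes disk space, memory and indexing capacity.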
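The disk check in the first item can be approximated as a filter over nodes: any node whose used-disk percentage is above the low watermark stops receiving replicas. The Java sketch below applies only a percentage threshold, with 85% as the default; the real setting also accepts absolute byte values, and Elasticsearch combines this check with its other allocation rules. Node names and usage figures are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DiskWatermark {

    // Default value of cluster.routing.allocation.disk.watermark.low, as a percentage.
    static final double DEFAULT_LOW_WATERMARK_PERCENT = 85.0;

    static double usedPercent(long totalBytes, long usedBytes) {
        return 100.0 * usedBytes / totalBytes;
    }

    // Nodes still eligible to receive replicas: usage at or below the watermark.
    static List<String> eligibleNodes(Map<String, Double> usedPercentByNode, double lowWatermarkPercent) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Double> e : usedPercentByNode.entrySet()) {
            if (e.getValue() <= lowWatermarkPercent) {
                eligible.add(e.getKey());
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // Hypothetical nodes; LinkedHashMap keeps insertion order for readable output.
        Map<String, Double> usage = new LinkedHashMap<>();
        usage.put("node-1", 62.0);
        usage.put("node-2", 91.5);
        usage.put("node-3", usedPercent(1_000L, 850L)); // exactly 85%
        System.out.println(eligibleNodes(usage, DEFAULT_LOW_WATERMARK_PERCENT));
    }
}
```

Here node-2 is filtered out, so with only two eligible nodes at most one replica per shard could be placed on them.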
Log Context
The log “{} unexpected error while failing replica” is emitted from the class TransportReplicationAction.java.
We extracted the following from the Elasticsearch source code for those seeking in-depth context:
});
} else {
    try {
        failReplicaIfNeeded(t);
    } catch (Throwable unexpected) {
        logger.error("{} unexpected error while failing replica", unexpected, request.shardId().id());
    } finally {
        responseWithFailure(t);
    }
}
}
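The excerpt shows why this log is secondary: the original failure t is always reported through the finally block, and the error is logged only when the attempt to fail the replica itself throws. The standalone Java sketch below re-creates that try/catch/finally pattern so it can be run in isolation. All class, method and field names here are hypothetical stand-ins, not Elasticsearch classes, and the thrown exception is simulated.

```java
import java.util.ArrayList;
import java.util.List;

final class FailReplicaPattern {

    final List<String> log = new ArrayList<>();
    final List<String> responses = new ArrayList<>();

    // Stand-in for failReplicaIfNeeded(t); optionally throws to simulate the unexpected error.
    void failReplicaIfNeeded(Throwable cause, boolean simulateUnexpected) {
        if (simulateUnexpected) {
            throw new IllegalStateException("could not fail shard");
        }
        log.add("replica marked as failed: " + cause.getMessage());
    }

    void onReplicaFailure(int shardId, Throwable t, boolean simulateUnexpected) {
        try {
            failReplicaIfNeeded(t, simulateUnexpected);
        } catch (Throwable unexpected) {
            // Mirrors logger.error("{} unexpected error while failing replica", unexpected, shardId)
            log.add("[" + shardId + "] unexpected error while failing replica: " + unexpected.getMessage());
        } finally {
            // The original failure is reported back whether or not the step above threw.
            responses.add("failure: " + t.getMessage());
        }
    }

    public static void main(String[] args) {
        FailReplicaPattern p = new FailReplicaPattern();
        p.onReplicaFailure(0, new RuntimeException("write failed on replica"), true);
        System.out.println(p.log);
        System.out.println(p.responses);
    }
}
```

When this log appears, the earlier failure on the replica (t) is usually the more useful thing to investigate, since it is what triggered the attempt to fail the shard.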