Briefly, this error occurs when Elasticsearch receives a response for a request that has already timed out. This can be caused by network latency, an overloaded server, or slow processing of the request. To resolve the issue, you can increase the timeout limit in the Elasticsearch settings, optimize your queries to reduce processing time, or upgrade your server so it can handle more requests. Also check your network for issues that might be delaying communication between nodes.
Overview
Elasticsearch search and index requests run within a timeout if one is specified. If a request does not complete within that threshold, Elasticsearch returns the results gathered up to that point and includes `timed_out: true` in the JSON response.
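As a minimal sketch of the behavior described above, the snippet below builds a search body with a `timeout` and checks the `timed_out` flag in the response. The helper names (`build_search_body`, `is_partial`, `search`), the base URL, and the index name are illustrative assumptions, not part of any Elasticsearch client.

```python
import json
import urllib.request


def build_search_body(query, timeout="2s"):
    """Return a search body that asks Elasticsearch to stop collecting after `timeout`."""
    return {"timeout": timeout, "query": query}


def is_partial(response):
    """True when the search response reports that the timeout was hit."""
    return bool(response.get("timed_out", False))


def search(base_url, index, body):
    """POST the body to <base_url>/<index>/_search and return the decoded JSON.

    base_url and index are placeholders (for example "http://localhost:9200"
    and "my-index"); adjust them for your own cluster.
    """
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

A caller could log or count every response for which `is_partial(...)` returns true, which gives a rough picture of how often queries hit their timeout.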
However, the `timed_out` field in the response only indicates that the request did not finish within the specified timeout and that the results are partial; the request keeps running in the background. When it finally completes, Elasticsearch logs a *warning* in the format below.
Received response for a request that has timed out, sent [{}ms] ago, timed out [{}ms] ago
If many such warnings appear at regular intervals, the underlying cause should be fixed to prevent outages and cluster resource saturation.
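To see how often the warning occurs and which actions are affected, the log lines can be parsed and counted. The following is a rough sketch that assumes the message format shown in this article; the function names and the sample log line used to try it out are made up for illustration.

```python
import re
from collections import Counter

# Matches the warning text; the action part is optional because some
# versions of the message do not include it.
LOG_PATTERN = re.compile(
    r"Received response for a request that has timed out, "
    r"sent \[(?P<sent_ms>\d+)ms\] ago, "
    r"timed out \[(?P<timed_out_ms>\d+)ms\] ago"
    r"(?:, action \[(?P<action>.*?)\], node \[)?"
)


def parse_timeout_warning(line):
    """Return the timings (and action, if present) from one log line, or None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "sent_ms": int(match["sent_ms"]),
        "timed_out_ms": int(match["timed_out_ms"]),
        "action": match["action"],
    }


def count_by_action(lines):
    """Count timeout warnings per transport action across many log lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_timeout_warning(line)
        if parsed is not None:
            counts[parsed["action"] or "unknown"] += 1
    return counts
```

Feeding a day's worth of log lines into `count_by_action` shows whether the timeouts cluster around one action (for example a stats or search action) or are spread evenly.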
Tips on identifying the cause of such timeouts can be found in How to use Slow Logs the Complete Guide. That guide focuses on search slow logs, but the same approach can be used to filter indexing slow logs quickly as well. Most of the time, the cause of these timeouts is slow, expensive search queries.
Once the cause of the timeout is identified, the fixes below can be applied to prevent timeouts and keep the cluster performing at an optimal level.
- Fix expensive search queries by implementing them differently. Autocomplete queries are one example: they are expensive, and the available implementations differ in their performance impact, as described in the Auto-complete guide.
- If a very large number of documents sits in a single shard or in only a few shards, split them into additional shards on different physical servers and compare the performance of the two layouts. More info can be found here: Shards and Replicas getting started guide.
- This log message is present in both Elasticsearch version 2.3 and version 6.8. If you are using a more recent Elasticsearch version (7.X), use the task management API (https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html) to identify long-running tasks in Elasticsearch and cancel them.
- Try including the timeout in more queries (with different values) and track how many times they time out to get an idea of how they perform during peak times.
- For older versions of Elasticsearch (1.X, 2.X), where the task management API is not available, use the `terminate_after` parameter in the search query to stop long-running queries early. It caps the number of documents collected per shard.
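Two of the fixes above can be sketched in code. The first helper filters a task management API response (as returned by `GET _tasks?detailed`) for tasks that have been running longer than a threshold; the second builds a search body that combines `timeout` with `terminate_after`. The function names, thresholds, and default values are assumptions chosen for illustration.

```python
def long_running_tasks(tasks_response, min_seconds):
    """Return (task_id, action, seconds) for tasks running at least min_seconds, slowest first."""
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            seconds = task.get("running_time_in_nanos", 0) / 1e9
            if seconds >= min_seconds:
                found.append((task_id, task.get("action", ""), seconds))
    return sorted(found, key=lambda item: item[2], reverse=True)


def cancel_path(task_id):
    """Path to POST to in order to cancel a cancellable task."""
    return f"/_tasks/{task_id}/_cancel"


def bounded_search_body(query, timeout="2s", terminate_after=10000):
    """Search body bounded both by time and by documents collected per shard.

    The default values here are arbitrary placeholders, not recommendations.
    """
    return {"timeout": timeout, "terminate_after": terminate_after, "query": query}
```

Only tasks that report `cancellable: true` can be cancelled, so it is worth checking that field before posting to the path returned by `cancel_path`.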
Overview
To put it simply, a node is a single server that is part of a cluster. Each node is assigned one or more roles, which describe the node’s responsibility and operations. Data nodes store the data, and participate in the cluster’s indexing and search capabilities, while master nodes are responsible for managing the cluster’s activities and storing the cluster state, including the metadata.
While it is possible to run several node instances of Elasticsearch on the same hardware, it’s considered a best practice to limit a server to a single running instance of Elasticsearch.
Nodes connect to each other and form a cluster by using a discovery method.
Roles
Master node
Master nodes are in charge of cluster-wide settings and changes: deleting or creating indices and fields, adding or removing nodes, and allocating shards to nodes. Each cluster has a single master node, which is elected from the master-eligible nodes using a distributed consensus algorithm; a new master is elected if the current master node fails.
Coordinating (client) node
There is some confusion in the use of coordinating node terminology. Client nodes were removed from Elasticsearch after version 2.4 and became coordinating nodes.
Coordinating nodes are nodes that do not hold any configured role. They don’t hold data, are not part of the master-eligible group, and do not execute ingest pipelines. Coordinating nodes serve incoming search requests and act as the query coordinator, running the query and fetch phases and sending requests to every node that holds a shard being queried. The coordinating node also distributes bulk indexing operations and routes queries to shards based on the node’s responsiveness.
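To check which nodes in a cluster play which role, the `roles` list that the nodes info API (`GET _nodes`) reports for each node can be inspected. The sketch below assumes that response shape and treats a node with an empty `roles` list as coordinating-only; the helper names and the sample node names are hypothetical.

```python
def coordinating_only_nodes(nodes_response):
    """Names of nodes that report an empty roles list (no configured role)."""
    names = []
    for node in nodes_response.get("nodes", {}).values():
        if node.get("roles") == []:
            names.append(node.get("name", ""))
    return sorted(names)


def nodes_with_role(nodes_response, role):
    """Names of nodes whose roles list contains the given role, e.g. "master" or "data"."""
    names = []
    for node in nodes_response.get("nodes", {}).values():
        if role in node.get("roles", []):
            names.append(node.get("name", ""))
    return sorted(names)
```

Note that `nodes_with_role(response, "master")` lists master-eligible nodes, not the currently elected master.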
Log Context
The log “Received response for a request that has timed out, sent [{}ms] ago, timed out [{}ms] ago, action [{}], node [{}], id [{}]” is emitted by the class TransportService.java.
We extracted the following from the Elasticsearch source code for those seeking in-depth context:
    final String action;
    assert clientHandlers.get(requestId) == null;
    TimeoutInfoHolder timeoutInfoHolder = timeoutInfoHandlers.remove(requestId);
    if (timeoutInfoHolder != null) {
        long time = System.currentTimeMillis();
        logger.warn("Received response for a request that has timed out, sent [{}ms] ago, timed out [{}ms] ago, action [{}], node [{}], id [{}]",
            time - timeoutInfoHolder.sentTime(),
            time - timeoutInfoHolder.timeoutTime(),
            timeoutInfoHolder.action(),
            timeoutInfoHolder.node(),
            requestId);
        action = timeoutInfoHolder.action();
        sourceNode = timeoutInfoHolder.node();
    } else {
        logger.warn("Transport response handler not found of id [{}]", requestId);
        action = null;