Briefly, this warning is logged when the Elasticsearch timer thread sleeps for longer than the configured warning threshold. This could be due to high system load, insufficient resources, or garbage collection pauses. To resolve this, you can increase the system resources (CPU, memory), optimize your queries and indices to reduce load, or adjust the JVM settings to minimize garbage collection pauses. Additionally, ensure your Elasticsearch version is up to date, as some versions have known issues with timer threads.
This guide will help you check for common problems that cause the log “timer thread slept for [{}/{}ms] on absolute clock which is above the warn threshold of [{}ms]” to appear. To understand the issues related to this log, read the explanation below about the following Elasticsearch concepts: threshold, threadpool, thread.
Overview
Elasticsearch uses several parameters to enable it to manage hard disk storage across the cluster.
What it’s used for
- Elasticsearch will actively try to relocate shards away from nodes which exceed the disk watermark high threshold.
- Elasticsearch will NOT allocate new shards or relocate shards onto nodes which exceed the disk watermark low threshold.
- Elasticsearch will prevent all writes to an index which has any shard on a node that exceeds the disk.watermark.flood_stage threshold.
- The info update interval (cluster.info.update.interval) is how often Elasticsearch re-checks the disk usage on each node.
Examples
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.info.update.interval": "1m"
  }
}
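The way the three thresholds stack on top of each other can be sketched in a few lines of Python. This is only an illustration of the rules listed above, not how Elasticsearch implements them: the function names parse_percent and disk_state are hypothetical, and the default values simply mirror the example request.

```python
def parse_percent(value: str) -> float:
    """Convert a percentage string such as '85%' into a float (85.0)."""
    return float(value.rstrip("%"))


def disk_state(used_percent: float,
               low: str = "85%",
               high: str = "90%",
               flood_stage: str = "95%") -> str:
    """Return the most severe watermark that a node's disk usage exceeds.

    Hypothetical helper: it only mirrors the rules described in the text.
    """
    if used_percent > parse_percent(flood_stage):
        # Writes are blocked on indices with a shard on this node.
        return "flood_stage"
    if used_percent > parse_percent(high):
        # Shards are relocated away from this node.
        return "high"
    if used_percent > parse_percent(low):
        # No new shards are allocated to this node.
        return "low"
    return "ok"


if __name__ == "__main__":
    for usage in (80.0, 87.0, 92.0, 97.0):
        print(usage, disk_state(usage))
```

Checking the most severe threshold first keeps the function to a single return value per node, which is usually what you want when deciding which problem to address first.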
Notes and good things to know
- You can use absolute values (100gb) or percentages (90%), but you cannot mix the two on the same cluster.
- In general, it is recommended to use percentages, since this will work in case the disks are resized.
- You can put the cluster settings on the elasticsearch.yml of each node, but it is recommended to use the PUT _cluster/settings API because it is easier to manage, and ensures that the settings are coherent across the cluster.
- Elasticsearch comes with sensible defaults for these settings, so think twice before modifying them. If you find you are spending a lot of time fine-tuning these settings, then it is probably time to invest in new disk space.
- In the event of the flood_stage threshold being exceeded, once you delete data, Elasticsearch should detect automatically that the block can be released (bearing in mind the update interval which could be, for instance, a minute). However if you want to accelerate this process, you can unblock an index manually, with the following call:
PUT /my_index/_settings
{
  "index.blocks.read_only_allow_delete": null
}
Common problems
Inappropriate cluster settings (for example, a disk watermark.low that is set too low) can make it impossible for Elasticsearch to allocate shards on the cluster. In particular, bear in mind that these parameters work in combination with other cluster settings (for example shard allocation awareness), which place further constraints on how Elasticsearch can allocate shards.
Overview
Elasticsearch uses threadpools to manage how requests are processed and to optimize the use of resources on each node in the cluster.
What it’s used for
The main threadpools are for search, get and write, but there are a number of others which you can see by running:
GET /_cat/thread_pool/?v&h=id,name,active,rejected,completed,size,type&pretty
The output of this command shows the thread pools on each node, the size and type of each pool, and which nodes have rejected operations. Elasticsearch automatically configures the threadpool management parameters based on the number of processors detected in each node.
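If you save the output of this command, a short script can pick out the pools that have rejected operations. The sketch below assumes the whitespace-separated table that the cat API prints when the v parameter is set, with no empty cells. The SAMPLE text is invented for illustration, and rejected_pools is a hypothetical helper name.

```python
from typing import List, Tuple

# Invented sample in the shape of the cat thread_pool output requested above.
SAMPLE = """\
id     name   active rejected completed size type
node-a search 3      0        1520      13   fixed
node-a write  8      42       98012     8    fixed
node-b search 1      0        1433      13   fixed
"""


def rejected_pools(cat_output: str) -> List[Tuple[str, str, int]]:
    """Return (node id, pool name, rejected count) for pools with rejections.

    Assumes every row has a value in every column, so a plain split() works.
    """
    lines = [line for line in cat_output.splitlines() if line.strip()]
    header = lines[0].split()
    rows = [dict(zip(header, line.split())) for line in lines[1:]]
    return [
        (row["id"], row["name"], int(row["rejected"]))
        for row in rows
        if int(row["rejected"]) > 0
    ]


if __name__ == "__main__":
    for node_id, pool, count in rejected_pools(SAMPLE):
        print(f"{node_id}: pool '{pool}' rejected {count} operations")
```

If some columns can be empty in your output, requesting the same data in JSON form is likely to be more robust than splitting on whitespace.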
Threadpool types
Fixed - a fixed number of threads, with a fixed queue size.
thread_pool:
    write:
        size: 30
        queue_size: 1000
Scaling - a variable number of threads that Elasticsearch scales automatically according to workload.
thread_pool:
    warmer:
        core: 1
        max: 8
fixed_auto_queue_size - a fixed number of threads with a variable queue size which changes dynamically in order to maintain a target response time. This type was deprecated in Elasticsearch 7.7 and removed in 8.0, so it only applies to older clusters.
thread_pool:
    search:
        size: 30
        queue_size: 500
        min_queue_size: 10
        max_queue_size: 1000
        auto_queue_frame_size: 2000
        target_response_time: 1s
Examples
To see which threads are using the highest CPU or taking the longest time you can use the following query. This may help find operations that are causing your cluster to underperform.
GET /_nodes/hot_threads
Notes and good things to know
In general it is not recommended to tweak threadpool settings. However, it is worth noting that the threadpools are set based upon the number of processors that Elasticsearch has detected on the underlying hardware. If that detection fails, then you should explicitly set the number of processors available in your hardware in elasticsearch.yml like this:
processors: 4
Most threadpools also have queues associated with them to enable Elasticsearch to store requests in memory while waiting for resources to become available to process the request. However the queues are usually of a finite size, and if that size becomes exceeded, then Elasticsearch will reject the request.
Sometimes you may be tempted to increase the queue size to prevent requests being rejected, but this will only treat the symptom and not the underlying cause of the problem. Indeed, it may even be counterproductive: with a larger queue, the node needs more memory to store the queue and has less memory left to actually process requests. Furthermore, increasing the queue size also increases the length of time that operations are kept in the queue, which can cause client applications to time out.
Usually, the only case where it can be justified to increase the queue size is where requests are received in uneven surges and you are unable to manage this process client-side.
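The trade-off described above can be seen in a toy simulation. This is a deliberately simplified model, not Elasticsearch's scheduler: time advances in ticks, every worker finishes exactly one request per tick, and the function name simulate and all numbers are assumptions chosen for illustration.

```python
from collections import deque
from typing import Tuple


def simulate(arrivals_per_tick: int, ticks: int,
             workers: int, queue_size: int) -> Tuple[int, int]:
    """Run a bounded-queue model and return (rejected, max_wait_ticks)."""
    queue = deque()  # holds the arrival tick of each waiting request
    rejected = 0
    max_wait = 0
    for tick in range(ticks):
        # New requests arrive; any that do not fit in the queue are rejected.
        for _ in range(arrivals_per_tick):
            if len(queue) < queue_size:
                queue.append(tick)
            else:
                rejected += 1
        # Each worker takes one request from the head of the queue.
        for _ in range(min(workers, len(queue))):
            arrived = queue.popleft()
            max_wait = max(max_wait, tick - arrived)
    return rejected, max_wait


if __name__ == "__main__":
    # Sustained overload: 3 requests arrive per tick, only 2 can be served.
    for size in (10, 1000):
        rejected, max_wait = simulate(3, 100, workers=2, queue_size=size)
        print(f"queue_size={size}: rejected={rejected}, max_wait={max_wait} ticks")
```

Under sustained overload the larger queue removes the rejections for a while, but the requests it accepts wait much longer before being served, which is the behavior that shows up as client-side timeouts.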
You can monitor thread pools to better understand the performance of your Elasticsearch cluster. The Elasticsearch monitoring panel in Kibana shows graphs of the search, get, and write thread queues and any queue rejections. Growing queues indicate that Elasticsearch is having difficulty keeping up with requests, and rejections indicate that queues have grown to the point that Elasticsearch rejects calls to the server.
Check the underlying causes of increases in queues. Try to balance activity across the nodes in the cluster and try to balance the demands on the cluster thread pool by taking actions on the client-side.
Log Context
The log “timer thread slept for [{}/{}ms] on absolute clock which is above the warn threshold of [{}ms]” is emitted from the class ThreadPool.java.
We extracted the following from the Elasticsearch source code for those seeking in-depth context:
try {
    final long deltaMillis = newAbsoluteMillis - absoluteMillis;
    if (deltaMillis > thresholdMillis) {
        final TimeValue delta = TimeValue.timeValueMillis(deltaMillis);
        logger.warn(
            "timer thread slept for [{}/{}ms] on absolute clock which is above the warn threshold of [{}ms]",
            delta,
            deltaMillis,
            thresholdMillis
        );
    } else if (deltaMillis
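The visible part of the excerpt compares two consecutive readings of the absolute clock and warns when the gap exceeds the threshold. The Python sketch below imitates only that branch, under stated assumptions: find_late_wakeups is a hypothetical name, the clock readings are supplied as a list instead of being taken by a real timer thread, and the 200 ms interval and 5000 ms threshold in the example are illustrative values.

```python
from typing import List


def find_late_wakeups(readings_ms: List[int], warn_threshold_ms: int) -> List[str]:
    """Return one warning per gap between readings that exceeds the threshold.

    readings_ms stands in for the absolute clock value the timer thread
    would record each time it wakes up.
    """
    warnings = []
    for previous, current in zip(readings_ms, readings_ms[1:]):
        delta_ms = current - previous
        if delta_ms > warn_threshold_ms:
            warnings.append(
                f"timer thread slept for [{delta_ms / 1000:g}s/{delta_ms}ms] "
                f"on absolute clock which is above the warn threshold "
                f"of [{warn_threshold_ms}ms]"
            )
    return warnings


if __name__ == "__main__":
    # The thread normally wakes every 200 ms, but one wake-up is 6 s late,
    # for example because of a long garbage collection pause.
    readings = [0, 200, 400, 6400, 6600]
    for message in find_late_wakeups(readings, warn_threshold_ms=5000):
        print(message)
```

Because the check only looks at how long the thread was actually away, anything that stalls the whole process (a long GC pause, a starved CPU, a suspended virtual machine) produces the same warning.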