Briefly, this log message appears when Elasticsearch attempts to trigger a collection by the Garbage-First garbage collector (G1GC) because heap usage is high. The JVM heap is almost full, which can lead to performance issues or even crashes. To resolve this, increase the heap size if your server has enough memory. Alternatively, optimize your queries or indices to reduce memory usage, and consider deleting unnecessary data or indices. Lastly, make sure that your Elasticsearch nodes are properly distributed and balanced so that no single node is overloaded.
This guide will help you check for common problems that cause the log "attempting to trigger G1GC due to high heap usage [{}]" to appear. To understand the issues related to this log, read the explanation below about the following Elasticsearch concepts: indices, breaker.
Overview
Elasticsearch uses circuit breakers to prevent the OutOfMemory errors that cause nodes to crash. When a request reaches an Elasticsearch node, the circuit breakers first estimate the amount of memory needed to load the required data. They then compare the estimate with the configured limit, which is expressed as a share of the JVM heap. If the estimate exceeds the limit, the request is terminated and an exception is thrown, so that the node never tries to load more data than the heap can hold.
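The estimate-then-compare step described above can be sketched in a few lines. This is a minimal illustration, not the Elasticsearch implementation: the class `SimpleBreaker` and its methods `reserve`, `release` and `used` are hypothetical names chosen for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified breaker; not the Elasticsearch implementation. */
final class SimpleBreaker {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    SimpleBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves the estimated bytes, or throws if that would exceed the limit. */
    void reserve(long estimateBytes, String label) {
        long newUsed = usedBytes.addAndGet(estimateBytes);
        if (newUsed > limitBytes) {
            usedBytes.addAndGet(-estimateBytes); // roll back the failed reservation
            throw new IllegalStateException("[" + label + "] data too large: would be ["
                + newUsed + "] bytes, limit is [" + limitBytes + "]");
        }
    }

    /** Gives memory back once the operation has finished. */
    void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    long used() {
        return usedBytes.get();
    }

    public static void main(String[] args) {
        SimpleBreaker breaker = new SimpleBreaker(100);
        breaker.reserve(60, "first");
        try {
            breaker.reserve(50, "second"); // 60 + 50 exceeds the limit of 100
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("used after rejection: " + breaker.used());
    }
}
```

The point of the design is that the check happens before any data is loaded: a rejected request costs an exception, while an unchecked one could cost the whole node.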
What they are used for
Elasticsearch has several circuit breakers, such as fielddata, request, in-flight requests (network), accounting and script compilation. Each breaker limits the memory a particular kind of operation can use. In addition, Elasticsearch has a parent circuit breaker, which limits the combined memory used by all the other circuit breakers.
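To illustrate how a parent breaker relates to its children, the sketch below sums the usage reported by each child and compares the total with a single parent limit. The class `ParentBreakerSketch` and its methods are hypothetical. In recent Elasticsearch versions the parent breaker can also be configured to consider real heap usage (the `indices.breaker.total.use_real_memory` setting), which this simplified model does not attempt to reproduce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of a parent breaker that sums its children's usage. */
final class ParentBreakerSketch {
    private final long parentLimitBytes;
    private final Map<String, Long> childUsage = new LinkedHashMap<>();

    ParentBreakerSketch(long parentLimitBytes) {
        this.parentLimitBytes = parentLimitBytes;
    }

    /** Records the bytes currently tracked by one child breaker. */
    void setChildUsage(String name, long bytes) {
        childUsage.put(name, bytes);
    }

    long totalUsed() {
        long total = 0;
        for (long bytes : childUsage.values()) {
            total += bytes;
        }
        return total;
    }

    /** True if adding the given bytes would push the combined usage over the parent limit. */
    boolean wouldTrip(long additionalBytes) {
        return totalUsed() + additionalBytes > parentLimitBytes;
    }

    public static void main(String[] args) {
        ParentBreakerSketch parent = new ParentBreakerSketch(1000);
        parent.setChildUsage("fielddata", 400);
        parent.setChildUsage("request", 500);
        System.out.println("total used: " + parent.totalUsed());
        System.out.println("would 101 more bytes trip? " + parent.wouldTrip(101));
    }
}
```

Note that neither child is over a limit of its own here; it is only the combined total that approaches the parent limit.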
Examples
Increasing circuit breaker size for fielddata limit – The default limit for the fielddata breaker is 40% of the JVM heap. The following command can be used to increase it to 60%:
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.fielddata.limit": "60%"
  }
}
Notes
- Each breaker ships with a default limit, which can be modified. However, this is an expert-level setting and you should understand the pitfalls before changing it; otherwise the node may start throwing OOM exceptions.
- Sometimes it is better to fail a query than to hit an OOM exception, because once an OOM occurs the JVM becomes unresponsive.
- It is important to keep indices.breaker.request.limit lower than indices.breaker.total.limit so that request circuit breakers trip before the total circuit breaker.
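The ordering rule in the last note can be checked mechanically before a settings change is applied. The helper below is a sketch under assumptions: the class `BreakerLimitCheck` and its methods are hypothetical, and it only handles limits written as percentages such as "60%", although Elasticsearch also accepts absolute byte values.

```java
/** Hypothetical pre-flight check for breaker limits written as percentages. */
final class BreakerLimitCheck {

    /** Parses a value such as "60%" into 60.0; rejects anything that is not a percentage. */
    static double parsePercent(String value) {
        String trimmed = value.trim();
        if (!trimmed.endsWith("%")) {
            throw new IllegalArgumentException("expected a percentage such as \"60%\", got: " + value);
        }
        return Double.parseDouble(trimmed.substring(0, trimmed.length() - 1));
    }

    /** True if the request limit is strictly lower than the total (parent) limit. */
    static boolean requestBelowTotal(String requestLimit, String totalLimit) {
        return parsePercent(requestLimit) < parsePercent(totalLimit);
    }

    public static void main(String[] args) {
        // Candidate values for indices.breaker.request.limit and indices.breaker.total.limit.
        System.out.println(requestBelowTotal("60%", "95%"));
        System.out.println(requestBelowTotal("95%", "70%"));
    }
}
```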
Common problems
- The most common error resulting from circuit breakers is “data too large” with a 429 status code. The application should be ready to handle such exceptions.
- If the application starts throwing exceptions because of circuit breaker limits, review the queries and their memory requirements. In most cases, the cluster needs to be scaled by adding more resources.
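One way for an application to be ready for 429 responses is to retry with an increasing delay. The following sketch assumes the pressure is transient. The class `RetryOn429` and the `sendRequest` callback, which stands in for whatever client call returns the HTTP status code, are hypothetical. If the same request keeps tripping the breaker, retrying will not help and the query or cluster sizing needs attention instead.

```java
import java.util.function.IntSupplier;

/** Hypothetical retry helper: backs off exponentially while the status is 429. */
final class RetryOn429 {

    /** Returns the last status seen, after at most maxAttempts calls to sendRequest. */
    static int callWithBackoff(IntSupplier sendRequest, int maxAttempts, long baseDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        int status = sendRequest.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && status == 429; attempt++) {
            try {
                Thread.sleep(baseDelayMillis << (attempt - 1)); // 1x, 2x, 4x, ... the base delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                return status;
            }
            status = sendRequest.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake request: rejected twice with 429, then accepted.
        int status = callWithBackoff(() -> ++calls[0] < 3 ? 429 : 200, 5, 10);
        System.out.println("final status " + status + " after " + calls[0] + " calls");
    }
}
```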
Log Context
The log “attempting to trigger G1GC due to high heap usage [{}]” is emitted from the class HierarchyCircuitBreakerService.java.
We extracted the following from the Elasticsearch source code for those seeking in-depth context:
long begin = timeSupplier.getAsLong();
leader = begin >= lastCheckTime + minimumInterval;
overLimitTriggered(leader);
if (leader) {
    long initialCollectionCount = gcCountSupplier.getAsLong();
    logger.info("attempting to trigger G1GC due to high heap usage [{}]", memoryUsed.baseUsage);
    long localBlackHole = 0;
    // number of allocations, corresponding to (approximately) number of free regions + 1
    int allocationCount = Math.toIntExact((maxHeap - memoryUsed.baseUsage) / g1RegionSize + 1);
    // allocations of half-region size becomes single humongous alloc, thus taking up a full region.
    int allocationSize = (int) (g1RegionSize >> 1);
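The two computations at the end of the excerpt are easier to follow with concrete numbers. The sketch below repeats the same arithmetic in a standalone class. The class name `G1TriggerArithmetic` and the input values (a 1024 MiB heap, 960 MiB in use, 4 MiB G1 regions) are assumptions chosen for illustration, not values taken from a real node.

```java
/** Standalone rerun of the excerpt's arithmetic with hypothetical input values. */
final class G1TriggerArithmetic {

    /** Approximate number of free regions, plus one, as in the excerpt. */
    static int allocationCount(long maxHeap, long baseUsage, long g1RegionSize) {
        return Math.toIntExact((maxHeap - baseUsage) / g1RegionSize + 1);
    }

    /** Half a region: per the excerpt's comment, such an allocation is humongous and occupies a full region. */
    static int allocationSize(long g1RegionSize) {
        return (int) (g1RegionSize >> 1);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long maxHeap = 1024 * mib;   // assumed heap size
        long baseUsage = 960 * mib;  // assumed current usage
        long regionSize = 4 * mib;   // assumed G1 region size

        // 64 MiB free / 4 MiB per region = 16 regions, plus 1
        System.out.println(allocationCount(maxHeap, baseUsage, regionSize)); // prints 17
        // 4 MiB >> 1 = 2 MiB
        System.out.println(allocationSize(regionSize)); // prints 2097152
    }
}
```

With these inputs, 17 allocations of 2 MiB each would be enough to claim every free region once, which is consistent with the excerpt's comments about free regions and humongous allocations.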