Briefly, this error occurs when Elasticsearch encounters issues during the reindexing process. This could be due to insufficient memory, incorrect mappings, or document conflicts. To resolve this, you can increase the heap size to provide more memory, ensure that the mappings are correct before reindexing, or handle document conflicts by using the correct versioning strategy. Additionally, check for any network connectivity issues or disk space constraints that might be causing the bulk failures.
This guide will help you check for common problems that cause the log “Encountered bulk failures during reindex process” to appear. To understand the issues related to this log, read the explanation below about the following Elasticsearch concepts: plugin, bulk, reindex.
Overview
In Elasticsearch, the Bulk API performs many write operations in a single API call. This increases indexing speed and is more efficient than sending each operation as a separate request. The Bulk API supports the following four actions:
- Index
- Update
- Create
- Delete
Examples
The bulk request below will index a document, delete another document, and update an existing document.
POST _bulk
{ "index" : { "_index" : "myindex", "_id" : "1" } }
{ "field1" : "value" }
{ "delete" : { "_index" : "myindex", "_id" : "2" } }
{ "update" : { "_id" : "1", "_index" : "myindex" } }
{ "doc" : { "field2" : "value5" } }
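The request body above is newline-delimited JSON: one action line, optionally followed by one payload line. As a minimal sketch of how such a body could be assembled programmatically, the Python snippet below serializes the same three operations. The helper name `build_bulk_body` is hypothetical and not part of any Elasticsearch client.

```python
import json

def build_bulk_body(actions):
    """Serialize (action, metadata, payload) tuples into an NDJSON bulk body."""
    lines = []
    for action, meta, payload in actions:
        lines.append(json.dumps({action: meta}))
        if payload is not None:  # delete actions carry no payload line
            lines.append(json.dumps(payload))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

actions = [
    ("index", {"_index": "myindex", "_id": "1"}, {"field1": "value"}),
    ("delete", {"_index": "myindex", "_id": "2"}, None),
    ("update", {"_index": "myindex", "_id": "1"}, {"doc": {"field2": "value5"}}),
]
body = build_bulk_body(actions)
print(body)
```

The resulting string can be sent as the body of a `POST _bulk` request with any HTTP client.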
Notes
- The Bulk API is useful when you need to index data streams that can be queued up and indexed in batches of hundreds or thousands of documents, such as logs.
- There is no single correct number of actions for a bulk call. You need to find the optimum by experimentation, taking into account the cluster size, number of nodes, hardware specs, and so on.
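The notes above can be sketched in Python: one helper splits a queue of documents into fixed-size batches, and another pulls the failed items out of a bulk response. Both helper names and the sample response are hypothetical. The sketch assumes the usual bulk response shape, with a top-level `errors` flag and an `items` array whose entries carry an `error` object on failure. The batch size of 1000 is only a starting point for experimentation.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def failed_items(bulk_response):
    """Return (action, id, error type, reason) for each failed bulk item."""
    failures = []
    if not bulk_response.get("errors"):
        return failures  # fast path: nothing failed in this batch
    for item in bulk_response.get("items", []):
        for action, result in item.items():
            error = result.get("error")
            if error:
                failures.append(
                    (action, result.get("_id"), error.get("type"), error.get("reason"))
                )
    return failures

# Hypothetical response, shaped like a bulk response with one failed item.
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"create": {"_id": "2", "status": 409,
                    "error": {"type": "version_conflict_engine_exception",
                              "reason": "document already exists"}}},
    ],
}

batches = list(chunked(list(range(2500)), 1000))
print([len(batch) for batch in batches])
print(failed_items(sample_response))
```

Checking the `errors` flag first avoids scanning every item in the common case where the whole batch succeeded.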
Overview
Reindexing copies existing data from a source index to a destination index, which can be in the same cluster or in a different one. Elasticsearch provides a dedicated _reindex endpoint for this purpose. Reindexing is most often needed when updating mappings or settings.
Examples
Reindex data from a source index to destination index in the same cluster:
POST /_reindex?pretty
{
  "source": { "index": "news" },
  "dest": { "index": "news_v2" }
}
Notes
- The Reindex API does not copy settings and mappings from the source index to the destination index. You need to create the destination index with the desired settings and mappings before you begin the reindexing process.
- The API exposes an extensive list of configuration options for fetching data from the source index, such as query-based reindexing and selecting multiple indices as the source.
- The Reindex API is not a good fit when reindexing requires complex data processing or data modification based on application logic. In that case, you can write a custom script that uses the Elasticsearch scroll API to fetch the data from the source index and the Bulk API to index it into the destination index.
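To illustrate the configuration options mentioned in the notes above, here is a minimal Python sketch that builds a `_reindex` request body with multiple source indices, an optional query filter, and the `conflicts: proceed` setting, which lets the reindex continue past version conflicts instead of aborting. The helper name, the index names `news_2022` and `news_2023`, and the `category` field are assumptions made for illustration only.

```python
import json

def build_reindex_body(source_indices, dest_index, query=None, proceed_on_conflict=False):
    """Build the JSON body for a POST /_reindex request."""
    source = {"index": source_indices}
    if query is not None:
        source["query"] = query  # copy only the documents matching this query
    body = {"source": source, "dest": {"index": dest_index}}
    if proceed_on_conflict:
        # Count version conflicts instead of aborting the whole reindex.
        body["conflicts"] = "proceed"
    return body

reindex_body = build_reindex_body(
    ["news_2022", "news_2023"],                  # hypothetical source indices
    "news_v2",
    query={"term": {"category": "sports"}},      # hypothetical field and value
    proceed_on_conflict=True,
)
print(json.dumps(reindex_body, indent=2))
```

The destination index still has to exist with the desired mappings before this body is submitted.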
Log Context
The log “Encountered bulk failures during reindex process” originates in the class EnrichPolicyRunner.java. The following excerpt from the Elasticsearch source code provides more in-depth context:
            );
            failure.getCause()
        );
    }
  }
  delegate.onFailure(new ElasticsearchException("Encountered bulk failures during reindex process"));
} else if (bulkByScrollResponse.getSearchFailures().size() > 0) {
  logger.warn(
    "Policy [{}]: encountered [{}] search failures. Turn on DEBUG logging for details.",
    policyName,
    bulkByScrollResponse.getSearchFailures().size()
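When a reindex is run manually through the REST API, a similar check can be made on the client side. The Python sketch below groups the entries of a response's `failures` array by cause type, which helps separate version conflicts from mapping errors. It assumes each failure entry carries a `cause` object with a `type` field. The helper name and the sample response are hypothetical and are not taken from the Elasticsearch source.

```python
from collections import Counter

def summarize_failures(response):
    """Count reindex failures by cause type."""
    causes = Counter()
    for failure in response.get("failures", []):
        cause = failure.get("cause") or {}
        causes[cause.get("type", "unknown")] += 1
    return dict(causes)

# Hypothetical reindex response containing three bulk failures.
sample = {
    "failures": [
        {"index": "news_v2", "id": "1", "status": 409,
         "cause": {"type": "version_conflict_engine_exception",
                   "reason": "version conflict"}},
        {"index": "news_v2", "id": "2", "status": 409,
         "cause": {"type": "version_conflict_engine_exception",
                   "reason": "version conflict"}},
        {"index": "news_v2", "id": "3", "status": 400,
         "cause": {"type": "mapper_parsing_exception",
                   "reason": "failed to parse field"}},
    ]
}
print(summarize_failures(sample))
```

A summary dominated by one cause type usually points to a single root problem, such as an incorrect mapping on the destination index.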