Briefly, this error occurs when the reindex step that Elasticsearch runs while executing an enrich policy reports search failures, that is, failures while reading documents from the source indices so they can be copied into the enrich index. Possible causes include insufficient memory, incorrect mappings, or network connectivity issues. To resolve it, you can increase the heap size to provide more memory, make sure the mappings are correct before reindexing, or check your network connections. You can also check the Elasticsearch logs for more specific error messages that help identify the root cause of the problem.
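One way to act on the mappings advice is to compare the field types of the source and destination indices before reindexing. The sketch below uses hypothetical helpers, not an Elasticsearch API. It assumes you have already fetched each index's mappings.properties object (for example from GET <index>/_mapping) and parsed it into a Python dict. It compares only fields present in both mappings and ignores multi-fields, so treat its output as a list of differences to review rather than a verdict.

```python
from typing import Any, Dict, Tuple


def flatten_field_types(properties: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Map dotted field paths (e.g. "author.name") to their mapped type."""
    result: Dict[str, str] = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "type" in spec:
            result[path] = spec["type"]
        if "properties" in spec:
            # Object fields: recurse into their sub-fields.
            result.update(flatten_field_types(spec["properties"], prefix=path + "."))
    return result


def mapping_conflicts(source_props: Dict[str, Any],
                      dest_props: Dict[str, Any]) -> Dict[str, Tuple[str, str]]:
    """Return {field: (source_type, dest_type)} for fields mapped differently in both."""
    src = flatten_field_types(source_props)
    dst = flatten_field_types(dest_props)
    return {f: (src[f], dst[f]) for f in sorted(src.keys() & dst.keys()) if src[f] != dst[f]}


# Hypothetical "mappings.properties" objects for a source and a destination index.
source = {"title": {"type": "text"}, "published": {"type": "date"},
          "author": {"properties": {"name": {"type": "keyword"}}}}
dest = {"title": {"type": "text"}, "published": {"type": "long"},
        "author": {"properties": {"name": {"type": "text"}}}}

print(mapping_conflicts(source, dest))
# {'author.name': ('keyword', 'text'), 'published': ('date', 'long')}
```

Whether a difference such as date versus long actually causes failures depends on the values stored, which is why the helper only reports differences.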
This guide will help you check for common problems that cause the log “Encountered search failures during reindex process” to appear. To understand the issues related to this log, read the explanation below about the following Elasticsearch concepts: plugin, search, reindex.
Overview
Search refers to searching for documents in one or more indices. A simple search is just a GET request to the _search endpoint. The search query can be provided either in a query string or in the request body.
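To make the two forms concrete, here is a minimal Python sketch that only builds the requests with the standard library's urllib.request; it does not send them. It assumes an unsecured cluster at http://localhost:9200, and the title field is hypothetical. The q parameter uses Lucene query string syntax, while the body form uses the Query DSL. Elasticsearch accepts both GET and POST on _search; POST is used for the body form because not every HTTP client or proxy forwards a GET request body.

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured cluster


def search_with_query_string(index: str, q: str, size: int = 10) -> urllib.request.Request:
    """Build GET /<index>/_search?q=... using Lucene query string syntax."""
    params = urllib.parse.urlencode({"q": q, "size": size})
    return urllib.request.Request(f"{ES_URL}/{index}/_search?{params}", method="GET")


def search_with_body(index: str, body: dict) -> urllib.request.Request:
    """Build a search that sends the Query DSL as a JSON request body."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req1 = search_with_query_string("my_documents", "title:elasticsearch")
req2 = search_with_body("my_documents", {"query": {"match": {"title": "elasticsearch"}}})
print(req1.get_method(), req1.full_url)
# GET http://localhost:9200/my_documents/_search?q=title%3Aelasticsearch&size=10
print(req2.get_method(), req2.full_url)
# POST http://localhost:9200/my_documents/_search
```

Calling urllib.request.urlopen() on either object would send it and return the JSON response.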
Examples
When you search the my_documents index without any search parameters, every document is a hit and, by default, 10 hits are returned.
GET my_documents/_search
A JSON object is returned in response to a search query. A 200 response code means the request was completed successfully.
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ ... ]
  }
}
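A 200 status does not by itself guarantee that every shard answered: when some shards fail, Elasticsearch can still return partial results with _shards.failed above zero, which is closely related to the search failures this log reports. The sketch below summarizes a parsed response like the one above, with the elided hits array replaced by an empty list so that it parses. It also accepts hits.total as an object, the shape used since Elasticsearch 7.0. The summarize function is a hypothetical helper for illustration.

```python
import json

# The example response above, with the elided hits array replaced by [].
RAW_RESPONSE = """
{ "took": 1, "timed_out": false,
  "_shards": { "total": 2, "successful": 2, "failed": 0 },
  "hits": { "total": 2, "max_score": 1.0, "hits": [] } }
"""


def summarize(response: dict) -> str:
    """Report whether a search response is complete or only partial."""
    shards = response["_shards"]
    total = response["hits"]["total"]
    if isinstance(total, dict):  # Elasticsearch 7.0+: {"value": N, "relation": "eq"}
        total = total["value"]
    if response.get("timed_out"):
        return f"partial: timed out after {response['took']} ms"
    if shards.get("failed", 0) > 0:
        return f"partial: {shards['failed']} of {shards['total']} shards failed"
    return f"ok: {total} hits from {shards['successful']}/{shards['total']} shards in {response['took']} ms"


print(summarize(json.loads(RAW_RESPONSE)))
# ok: 2 hits from 2/2 shards in 1 ms
```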
Notes and good things to know
- Search is distributed: every shard of the index must be searched for hits, and those hits are then combined into a single sorted list as the final result.
- There are two phases of search: the query phase and the fetch phase.
- In the query phase, the query is executed on each shard locally and top hits are returned to the coordinating node. The coordinating node merges the results and creates a global sorted list.
- In the fetch phase, the coordinating node fetches the actual documents for those hit IDs from the shards that hold them and returns them to the requesting client.
- A coordinating node needs enough memory and CPU in order to handle the fetch phase.
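The query-then-fetch flow described above can be modelled in a few lines of Python. This is a simplified, in-memory illustration, not how Elasticsearch is implemented: each hypothetical shard is a dict mapping document IDs to (score, source) pairs, the query phase returns only IDs and scores, the coordinating node merges them into one global top-k list, and the fetch phase looks up sources only for the winning documents.

```python
import heapq
from typing import Dict, List, Tuple

# A hypothetical shard: doc id -> (score, source document).
Shard = Dict[str, Tuple[float, dict]]


def query_phase(shard: Shard, k: int) -> List[Tuple[float, str]]:
    """Runs locally on one shard: return only (score, doc_id) for its top k."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, (score, _) in shard.items()))


def coordinate(shards: List[Shard], k: int = 10) -> List[dict]:
    # Query phase: every shard returns its local top k, tagged with its shard index.
    per_shard = [[(score, doc_id, i) for score, doc_id in query_phase(s, k)]
                 for i, s in enumerate(shards)]
    # The coordinating node merges them into one global sorted list, keeping k.
    global_top = heapq.nlargest(k, (hit for hits in per_shard for hit in hits))
    # Fetch phase: ask only the owning shard for each winning document's source.
    return [{"_id": doc_id, "_score": score, "_source": shards[i][doc_id][1]}
            for score, doc_id, i in global_top]


shards = [
    {"a": (1.2, {"title": "alpha"}), "b": (0.4, {"title": "beta"})},
    {"c": (2.5, {"title": "gamma"}), "d": (0.9, {"title": "delta"})},
]
print([hit["_id"] for hit in coordinate(shards, k=3)])
# ['c', 'a', 'd']
```

Because only IDs and scores travel in the query phase, the coordinating node never loads document sources for hits that do not make the final list.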
Overview
Reindexing copies existing data from a source index to a destination index, which can be in the same cluster or in a different one. Elasticsearch provides a dedicated _reindex endpoint for this purpose. Reindexing is mostly needed when you change an index's mappings or settings.
Examples
Reindex data from a source index to destination index in the same cluster:
POST /_reindex?pretty
{
  "source": {
    "index": "news"
  },
  "dest": {
    "index": "news_v2"
  }
}
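The same reindex can be scripted. The Python sketch below builds, but does not send, two requests with urllib.request: one that creates the destination index with an explicit mapping (reindex does not copy mappings or settings) and one that runs the reindex, optionally restricted by a query or reading from several source indices. The local URL, the published field, and the helper names are assumptions for illustration.

```python
import json
import urllib.request
from typing import List, Optional, Union

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured cluster


def json_request(method: str, path: str, body: dict) -> urllib.request.Request:
    return urllib.request.Request(
        f"{ES_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def create_index(name: str, mappings: dict, settings: Optional[dict] = None) -> urllib.request.Request:
    """PUT /<name> with the mappings (and optional settings) the new index should have."""
    body = {"mappings": mappings}
    if settings:
        body["settings"] = settings
    return json_request("PUT", f"/{name}", body)


def reindex(sources: Union[str, List[str]], dest: str,
            query: Optional[dict] = None) -> urllib.request.Request:
    """POST /_reindex from one or more source indices, optionally filtered by a query."""
    source: dict = {"index": sources}
    if query is not None:
        source["query"] = query  # copy only the matching documents
    return json_request("POST", "/_reindex", {"source": source, "dest": {"index": dest}})


steps = [
    create_index("news_v2", {"properties": {"published": {"type": "date"}}}),
    reindex("news", "news_v2"),
]
for req in steps:
    print(req.get_method(), req.full_url)
# PUT http://localhost:9200/news_v2
# POST http://localhost:9200/_reindex
```

Sending each request in order with urllib.request.urlopen() would create the index first and only then copy the data into it.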
Notes
- The reindex API does not copy settings or mappings from the source index to the destination index. Create the destination index with the desired settings and mappings before you start reindexing.
- The API offers an extensive set of options for selecting the data to copy from the source, such as reindexing only the documents that match a query or using multiple indices as the source.
- The reindex API is not a good fit when reindexing requires complex data processing or modification based on application logic. In that case, you can write a custom script that uses the Elasticsearch scroll API to fetch data from the source index and the bulk API to index it into the destination index.
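A custom scroll-and-bulk script typically pages through the source with the scroll API, applies application logic to each document, and writes the results with the bulk API. The sketch below keeps the network out of the picture so it runs anywhere: fetch is a hypothetical callable assumed to perform the initial POST /<index>/_search?scroll=1m request when given None and POST /_search/scroll with the scroll ID otherwise, and an in-memory dict stands in for the cluster. The transform function, which drops drafts and converts a string field to an integer, is purely illustrative.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Hit = Dict[str, object]  # {"_id": ..., "_source": {...}} as returned by search/scroll


def scroll_pages(fetch: Callable[[Optional[str]], dict]) -> Iterator[List[Hit]]:
    """Yield pages of hits until a page comes back empty."""
    response = fetch(None)  # assumed: initial search request that opens the scroll
    while response["hits"]["hits"]:
        yield response["hits"]["hits"]
        response = fetch(response["_scroll_id"])  # assumed: next scroll request


def bulk_body(hits: Iterable[Hit], dest: str,
              transform: Callable[[dict], Optional[dict]]) -> str:
    """Build an NDJSON _bulk body; transform returns None to skip a document."""
    lines = []
    for hit in hits:
        doc = transform(hit["_source"])
        if doc is None:
            continue
        lines.append(json.dumps({"index": {"_index": dest, "_id": hit["_id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n" if lines else ""  # _bulk needs a trailing newline


def transform(src: dict) -> Optional[dict]:
    # Hypothetical application logic: skip drafts, convert "views" to an integer.
    if src.get("title") == "Draft":
        return None
    return {**src, "views": int(src["views"])}


# In-memory stand-in for the cluster: scroll ID -> response.
PAGES = {
    None: {"_scroll_id": "s1", "hits": {"hits": [{"_id": "1", "_source": {"title": "Hello", "views": "12"}}]}},
    "s1": {"_scroll_id": "s2", "hits": {"hits": [{"_id": "2", "_source": {"title": "Draft", "views": "0"}}]}},
    "s2": {"_scroll_id": "s2", "hits": {"hits": []}},
}

for page in scroll_pages(PAGES.__getitem__):
    body = bulk_body(page, "news_v2", transform)
    if body:
        print(body, end="")
# {"index": {"_index": "news_v2", "_id": "1"}}
# {"title": "Hello", "views": 12}
```

In a real script each body would be sent as POST /_bulk with Content-Type application/x-ndjson, the errors flag of each bulk response checked, and the scroll context released with DELETE /_search/scroll when done.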
Log Context
The log “Encountered search failures during reindex process” is emitted by the class EnrichPolicyRunner.java. We extracted the following from the Elasticsearch source code for those seeking in-depth context:
...
        ),
        failure.getReason()
    );
  }
}
delegate.onFailure(new ElasticsearchException("Encountered search failures during reindex process"));
} else {
    logger.info(
        "Policy [{}]: Transferred [{}] documents to enrich index [{}]",
        policyName,
        bulkByScrollResponse.getCreated(),
...
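Read as a whole, the excerpt shows a simple decision: if the reindex reported search failures, each failure's reason is logged and the policy run fails with this message; otherwise the number of created documents is logged. The Python sketch below paraphrases that control flow for illustration only. The class and function names are hypothetical stand-ins (ReindexOutcome loosely plays the role of the bulkByScrollResponse object in the excerpt), and the real runner may handle other outcomes that the excerpt does not show.

```python
import logging
from dataclasses import dataclass, field
from typing import List

logger = logging.getLogger("enrich_policy_runner_sketch")


@dataclass
class SearchFailure:  # hypothetical stand-in for one search failure
    shard: int
    reason: str


@dataclass
class ReindexOutcome:  # hypothetical stand-in for the reindex response object
    created: int
    search_failures: List[SearchFailure] = field(default_factory=list)


class EnrichPolicyError(Exception):
    pass


def complete_transfer(policy_name: str, outcome: ReindexOutcome, enrich_index: str) -> int:
    """Fail the run if any search failed; otherwise log and return the created count."""
    if outcome.search_failures:
        for failure in outcome.search_failures:
            logger.debug("Policy [%s]: search failure on shard [%s]: %s",
                         policy_name, failure.shard, failure.reason)
        raise EnrichPolicyError("Encountered search failures during reindex process")
    logger.info("Policy [%s]: Transferred [%s] documents to enrich index [%s]",
                policy_name, outcome.created, enrich_index)
    return outcome.created


print(complete_transfer("my-policy", ReindexOutcome(created=42), "my-enrich-index"))
# 42
```

Configure logging (for example with logging.basicConfig(level=logging.DEBUG)) to see the debug and info messages.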