Briefly, this error occurs when Elasticsearch reads the sort field of a document and finds more than one value where it needs exactly one. The message is raised by the geo_line aggregation (see the log context at the end of this guide), which requires a single sort value per document. To resolve this, you can either change your data model to ensure the field you’re sorting on only has one value per document, or use a script to combine multiple field values into one for sorting purposes, for example by taking the max or min of the values.
This guide will help you check for common problems that cause the log “Encountered more than one sort value for a” to appear. To understand the issues related to this log, read the explanation below about the following Elasticsearch concepts: plugin, sort, aggregations, search.
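The idea of reducing several values to one can be sketched in plain Python. This is only an in-memory illustration, not Elasticsearch code: the `Doc` type, the `timestamp` field and both function names are hypothetical, and the check merely mimics the condition that produces the message.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-in for an indexed document: field name -> stored values.
Doc = Dict[str, List[float]]


def single_sort_value(doc: Doc, field: str) -> Optional[float]:
    """Mimic the check behind the error: reject more than one sort value."""
    values = doc.get(field, [])
    if len(values) > 1:
        raise ValueError("Encountered more than one sort value for a single document.")
    # No value at all is not an error here; it is simply reported as None.
    return values[0] if values else None


def collapse_sort_values(
    doc: Doc, field: str, combine: Callable[[Sequence[float]], float] = min
) -> Doc:
    """Return a copy of the document with the sort field reduced to one value."""
    fixed = dict(doc)
    if doc.get(field):
        fixed[field] = [combine(doc[field])]
    return fixed


doc = {"timestamp": [1700000300.0, 1700000100.0]}   # two sort values: rejected
fixed = collapse_sort_values(doc, "timestamp", combine=min)
print(single_sort_value(fixed, "timestamp"))
```

Passing `max` instead of `min` keeps the largest value. Which one is appropriate depends on what the sort field means in your data.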
Introduction
Sorting is an essential aspect of Elasticsearch when it comes to presenting search results in a specific order. By default, Elasticsearch sorts the results based on the relevance score, which is calculated using the Lucene scoring formula. However, there are cases where you might want to sort the results based on other criteria, such as a specific field value or a custom sorting logic. In this article, we will explore advanced techniques and best practices for sorting in Elasticsearch.
Advanced techniques and best practices for sorting in Elasticsearch
1. Sorting by Field Values
To sort the search results based on a specific field value, you can use the “sort” parameter in your search query. For example, if you want to sort the results based on the “price” field in ascending order, you can use the following query:
GET /products/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "price": { "order": "asc" } }
  ]
}
2. Sorting by Multiple Fields
You can also sort the search results based on multiple fields by specifying an array of sort objects. For example, if you want to sort the results first by “category” in ascending order and then by “price” in descending order, you can use the following query:
GET /products/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "category": { "order": "asc" } },
    { "price": { "order": "desc" } }
  ]
}
3. Sorting with Missing Values
In some cases, the documents in your index might not have a value for the field you want to sort by. By default, Elasticsearch places these documents last in the results, which corresponds to a “missing” value of “_last”. You can control this behavior with the “missing” parameter, which accepts “_last”, “_first”, or a custom value to use in place of the missing one. For example, to state explicitly that documents with no “price” value should appear after all priced documents, you can use the following query:
GET /products/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "price": { "order": "asc", "missing": "_last" } }
  ]
}
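To make the effect of “_last” and “_first” concrete, here is a small in-memory sketch. It is an illustration under assumptions rather than Elasticsearch code: the `sort_with_missing` helper and the sample products are hypothetical, and a document simply lacks the `price` key when the value is missing.

```python
from typing import Any, Dict, List


def sort_with_missing(
    docs: List[Dict[str, Any]], field: str, order: str = "asc", missing: str = "_last"
) -> List[Dict[str, Any]]:
    """Sort docs by field, placing docs without a value first or last."""
    present = [d for d in docs if d.get(field) is not None]
    absent = [d for d in docs if d.get(field) is None]
    present.sort(key=lambda d: d[field], reverse=(order == "desc"))
    # "_first" / "_last" describe the position in the results, whatever the order.
    return absent + present if missing == "_first" else present + absent


products = [
    {"name": "lamp", "price": 30},
    {"name": "sample"},               # no price field
    {"name": "desk", "price": 120},
]

print([p["name"] for p in sort_with_missing(products, "price")])
print([p["name"] for p in sort_with_missing(products, "price", missing="_first")])
```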
4. Sorting with Nested Fields
If you have nested fields in your documents, you can sort the search results based on the values of these fields using the “nested” parameter. For example, if you have a “reviews” nested field with a “rating” property, you can sort the products based on the average rating as follows:
GET /products/_search
{
  "query": { "match_all": {} },
  "sort": [
    {
      "reviews.rating": {
        "order": "desc",
        "nested": { "path": "reviews" },
        "mode": "avg"
      }
    }
  ]
}
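The “mode” setting is what turns the several ratings of one product into the single value used for ordering. A rough in-memory sketch of that reduction follows; the sample products and the `sort_value` helper are hypothetical, and only the mode names (min, max, sum, avg) mirror the sort option.

```python
from statistics import mean
from typing import Any, Callable, Dict, List

# Each mode reduces the list of nested values to one sort value.
MODES: Dict[str, Callable[[List[float]], float]] = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": mean,
}

products = [
    {"name": "kettle", "reviews": [{"rating": 5}, {"rating": 4}]},
    {"name": "toaster", "reviews": [{"rating": 3}]},
    {"name": "blender", "reviews": [{"rating": 5}, {"rating": 5}, {"rating": 2}]},
]


def sort_value(product: Dict[str, Any], mode: str = "avg") -> float:
    """Collapse all review ratings of one product into a single sort value."""
    ratings = [review["rating"] for review in product["reviews"]]
    return MODES[mode](ratings)


ranked = sorted(products, key=sort_value, reverse=True)   # "order": "desc"
print([(p["name"], sort_value(p)) for p in ranked])
```

With a different mode the order can change: sorting by "min" would rank the blender last, because its lowest rating is 2.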
5. Custom Sorting with Script-Based Sorting
In some cases, you might want to apply custom sorting logic that cannot be achieved using the built-in sorting options. In such cases, you can use script-based sorting to define your custom sorting logic using Painless, Elasticsearch’s scripting language. For example, if you want to sort the products based on the difference between their regular price and discounted price, you can use the following query:
GET /products/_search
{
  "query": { "match_all": {} },
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "source": "doc['regular_price'].value - doc['discounted_price'].value"
        },
        "order": "desc"
      }
    }
  ]
}
Best Practices for Sorting in Elasticsearch
- Use Doc Values: When sorting by field values, make sure to use doc values, which are the on-disk data structure that Elasticsearch uses for sorting and aggregations. Doc values are enabled by default for most field types, but if not, you can explicitly enable them by setting the “doc_values” parameter to “true” in your field mapping.
- Avoid Sorting by Text Fields: Text fields do not support doc values, so sorting on them requires enabling fielddata, which loads the field’s terms into memory and can be slow and memory-intensive. Instead, use keyword fields or other field types that support doc values for sorting.
- Use Index Sorting: If you have a fixed sorting order that you use frequently, you can improve the sorting performance by using index sorting. Index sorting sorts the documents during indexing, which can speed up the sorting process during search. However, keep in mind that index sorting can increase the indexing time and memory usage.
- Optimize Pagination: When using sorting with pagination, avoid using deep pagination, as it can be slow and memory-intensive. Instead, use the “search_after” parameter to paginate through the search results more efficiently.
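The pagination advice can be illustrated with a small in-memory simulation of how “search_after” walks through sorted results: each page starts strictly after the sort values of the last hit on the previous page. The documents, the `search_page` helper and the use of `id` as a tiebreaker are assumptions made for this sketch; a tiebreaker is included because two products can share the same price.

```python
from typing import Any, Dict, List, Optional, Tuple

prices = [30, 10, 20, 10, 50, 20, 40]
docs = [{"id": f"p{i}", "price": price} for i, price in enumerate(prices)]


def sort_key(doc: Dict[str, Any]) -> Tuple[int, str]:
    """Sort values of a hit: price first, then id as a tiebreaker."""
    return (doc["price"], doc["id"])


def search_page(
    docs: List[Dict[str, Any]], size: int, search_after: Optional[Tuple[int, str]] = None
) -> List[Dict[str, Any]]:
    """Return the next `size` docs that sort strictly after `search_after`."""
    ordered = sorted(docs, key=sort_key)
    if search_after is not None:
        ordered = [d for d in ordered if sort_key(d) > search_after]
    return ordered[:size]


pages: List[List[str]] = []
cursor: Optional[Tuple[int, str]] = None
while True:
    page = search_page(docs, size=3, search_after=cursor)
    if not page:
        break
    pages.append([d["id"] for d in page])
    cursor = sort_key(page[-1])   # sort values of the last hit feed the next request

print(pages)
```

Unlike a growing “from” offset, each request only needs the cursor from the previous page, so the cost does not increase with page depth.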
Conclusion
By following these advanced techniques and best practices, you can optimize the sorting process in Elasticsearch and ensure that your search results are presented in the desired order.
Aggregations in Elasticsearch
Definition
The aggregations framework is a powerful tool built into every Elasticsearch deployment. In Elasticsearch, an aggregation gathers related documents together and summarizes them. The framework collects data from the documents that match a search request, which helps in building summaries of that data. With aggregations you can not only search your data, but also extract analytical information from it.
Aggregations are used all over the place in Kibana: dashboards, APM app, Machine Learning app and so on. Aggregations are also heavily used in common search use cases, such as an e-Commerce website. In those use cases search results usually come with a set of filters that take into account only the scope of the result set of your search. The user is then given the option to filter even further by, for example, product category, color, price range and so on. Those filter options usually come with a metric indication to give the user an idea of, for example, how many items per category their search results contain.
This kind of feature is only possible by using the aggregations framework.
Other examples of uses of the aggregations framework include the following:
- Average load time of a website
- Most valuable customers based on transaction volume
- Histogram showing some metric (quantity, average, sum, …) for events that occurred in dynamically generated time periods
- Quantity of products in each product category
Below are the different types of aggregations:
Types of aggregations
- Bucket aggregations: Aggregations that group documents into buckets, also called bins, based on field values, ranges, or other criteria in the document. When the aggregation is performed, the documents are placed in the respective bucket(s). This way you can divide a set of invoices into several buckets, one for each customer; system logs can be divided into “error”, “warning” and “info”; or CPU performance data can be divided into hourly buckets. The output consists of a list of buckets, each with a key and a count of documents. Here are some examples of bucket aggregations: Histogram Aggregation, Range Aggregation, Terms Aggregation, Filter(s) Aggregations, Geo Distance Aggregation and IP Range Aggregation.
- Metric aggregations: Aggregations that calculate metrics, such as a sum or average, from field values. These are mathematical calculations performed across a set of documents, usually on a numeric field, such as count, sum, min, max and average. Metrics can be calculated at the top level, but are often more useful as a sub-aggregation that calculates a value for each bucket of a bucket aggregation.
- Pipeline aggregations: Aggregations that take input from other aggregations instead of documents or fields. These aggregations allow you to aggregate based on the result of another aggregation rather than from document sets. Typically this aggregation is used to find the average number of documents in a bucket, or to sort buckets based upon a metric produced by a metric aggregation.
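The three types build on one another, which a few lines of plain Python can show. This is a conceptual sketch over hypothetical invoice data, not the aggregations API: grouping stands in for a terms bucket aggregation, the per-group sum for a metric aggregation, and the average over those sums for a pipeline aggregation.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

invoices = [
    {"customer": "acme", "amount": 100.0},
    {"customer": "acme", "amount": 50.0},
    {"customer": "globex", "amount": 70.0},
    {"customer": "initech", "amount": 20.0},
    {"customer": "globex", "amount": 30.0},
]

# Bucket step: one bucket per customer, each holding its documents.
buckets: Dict[str, List[dict]] = defaultdict(list)
for invoice in invoices:
    buckets[invoice["customer"]].append(invoice)
doc_counts = {key: len(docs) for key, docs in buckets.items()}

# Metric step: a sum computed inside each bucket.
totals = {key: sum(d["amount"] for d in docs) for key, docs in buckets.items()}

# Pipeline step: works on the per-bucket metric, not on the documents.
avg_total_per_customer = mean(list(totals.values()))

print(doc_counts)
print(totals)
print(avg_total_per_customer)
```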
Aggregation syntax
You request the cluster to run aggregations by adding an aggregations (or aggs for short) parameter in your search request. You can ask for more than one aggregation per request. You can even ask for sub-aggregations of a bucket aggregation. The following example shows a request that asks for the sum of the quantities of products, grouped by country.
In the example below, let’s say the use case is an e-Commerce website that acts as a marketplace, meaning it allows third-party vendors to advertise products on its website. We want to know how many units are in stock in each country, which we calculate by summing the stock held by each third-party vendor. This gives us a global stock figure per country.
POST products/_search
{
  "size": 0,
  "aggs": {
    "by-country": {
      "terms": { "field": "country" },
      "aggs": {
        "stock": { "sum": { "field": "qty" } }
      }
    }
  }
}
Some things to notice in the example above:
- You can use aggregations and aggs interchangeably. Every aggregation (or sub aggregation) has a name (by-country and stock, in this case).
- We have set the size of the results to 0, which means we’re not getting any hits in the response. This is common, and recommended when you only need the aggregation results, since Elasticsearch can then skip fetching the hits.
- In the example we only used the terms (bucket aggregation) and sum (metric aggregation) aggregation types, but the aggregations framework offers many more.
- We made use of a sub-aggregation. Notice the by-country aggregation actually creates buckets (groups) of results and then the stock aggregation gives a metric for each bucket. You can nest as many bucket aggregations as you want before finally (and optionally) running a metric aggregation on the innermost buckets.
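The response to such a request nests the results by aggregation name, so reading it back is a matter of walking the buckets. The sketch below uses a hand-written sample response: the country keys, counts and stock values are made up for illustration, and `stock_by_country` is a hypothetical helper, while the names `by-country` and `stock` come from the request above.

```python
from typing import Any, Dict

# Hand-written sample in the shape of a terms + sum response (values are invented).
sample_response: Dict[str, Any] = {
    "hits": {"total": {"value": 6, "relation": "eq"}, "hits": []},   # size: 0 -> no hits
    "aggregations": {
        "by-country": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "US", "doc_count": 3, "stock": {"value": 120.0}},
                {"key": "DE", "doc_count": 2, "stock": {"value": 45.0}},
                {"key": "BR", "doc_count": 1, "stock": {"value": 10.0}},
            ],
        }
    },
}


def stock_by_country(response: Dict[str, Any]) -> Dict[str, float]:
    """Map each country bucket to the value of its `stock` sub-aggregation."""
    buckets = response["aggregations"]["by-country"]["buckets"]
    return {bucket["key"]: bucket["stock"]["value"] for bucket in buckets}


print(stock_by_country(sample_response))
```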
Nesting aggregations
It is possible to nest aggregations inside one another (this has nothing to do with nested fields), so as to divide the buckets into sub-buckets, or to calculate metrics from the sub-buckets. The aggregation below separates all exam results by the gender of the pupil and then calculates the average result for each gender. The important thing to understand is that the second aggregation is calculated on the documents of each individual bucket rather than on the document set as a whole.
POST exam_results*/_search
{
  "size": 0,
  "aggs": {
    "genders": {
      "terms": { "field": "gender" },
      "aggs": {
        "avg_grade": { "avg": { "field": "grades" } }
      }
    }
  }
}
Aggregation performance
Aggregations are typically carried out in memory and read field values from a different structure (doc values or fielddata) than search queries, which use the inverted index, so it is important to consider the performance implications when constructing your aggregations. The most important considerations are:
Number of buckets
This is controlled by settings such as the “size” parameter in a terms aggregation or the “calendar_interval” in a date histogram. Bear in mind that when bucket aggregations are nested at more than one level, the number of buckets multiplies at each level.
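The multiplication is easy to underestimate, so a quick calculation helps. The sketch below assumes a hypothetical request with a terms aggregation of size 10, a date histogram producing 24 hourly buckets inside it, and a further terms aggregation of size 5; the figures are upper bounds, since real data may fill fewer buckets.

```python
from itertools import accumulate
from operator import mul
from typing import List


def buckets_per_level(sizes: List[int]) -> List[int]:
    """Upper bound on the number of buckets at each nesting level."""
    return list(accumulate(sizes, mul))


# Hypothetical nesting: terms (10) > date_histogram (24) > terms (5)
levels = buckets_per_level([10, 24, 5])
print(levels)        # buckets at level 1, 2 and 3
print(sum(levels))   # buckets across all levels
```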
Number of documents
When running an aggregation, it is preferable (if possible) to adjust the query so that the aggregation is only performed on the restricted set of documents that you are interested in, instead of using a match_all query. This will reduce the memory required to run the aggregation.
Fielddata
Aggregations as a rule should be run on keyword type fields, not analyzed text. It is possible to run them on analyzed text by using the mapping setting “fielddata”: “true”, but this is highly memory-intensive and should be avoided if possible.
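One common way to follow this advice is to index the same field both as text and as a keyword sub-field, and to aggregate on the sub-field. The sketch below only builds the two request bodies as Python dictionaries and prints them as JSON; the `country` field is borrowed from the earlier example, and the sub-field name `keyword` is a naming convention assumed here, not a requirement.

```python
import json

# Index mapping body: `country` is searchable as text, `country.keyword` is aggregatable.
mapping_body = {
    "mappings": {
        "properties": {
            "country": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            }
        }
    }
}

# Search body: the terms aggregation targets the keyword sub-field, not the text field.
search_body = {
    "size": 0,
    "aggs": {"by-country": {"terms": {"field": "country.keyword"}}},
}

print(json.dumps(mapping_body, indent=2))
print(json.dumps(search_body, indent=2))
```

With this mapping there is no need to enable fielddata on the text field, because the keyword sub-field is backed by doc values.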
Log Context
The log “Encountered more than one sort value for a” is raised in the class GeoLineBucketedSort.java. We extracted the following from the Elasticsearch source code for those seeking in-depth context:
@Override
protected boolean advanceExact(int doc) throws IOException {
    if (docSortValues.advanceExact(doc)) {
        if (docSortValues.docValueCount() > 1) {
            throw new AggregationExecutionException("Encountered more than one sort value for a "
                + "single document. Use a script to combine multiple sort-values-per-doc into a single value.");
        }
        // There should always be one weight if advanceExact lands us here; either
        // a real weight or a `missing` weight