Implementing Typeahead Functionality in Elasticsearch

By Opster Team

Updated: Nov 5, 2023

Overview

Typeahead, also known as autocomplete, is a feature that provides suggestions to users as they type into a search box. This functionality enhances the user experience by making search more interactive and less error-prone. Elasticsearch, with its powerful full-text search capabilities and flexible data model, is an excellent choice for implementing typeahead functionality.

There are three main approaches to implementing typeahead in Elasticsearch: the edge n-gram approach, the `search_as_you_type` field type and the completion suggester approach. All have their strengths and weaknesses, and the choice between them depends on the specific requirements of your application.

Approaches to Implement Typeahead in Elasticsearch

1. Edge N-Gram Approach

The edge n-gram approach involves creating a custom analyzer that generates edge n-grams for the input text. An edge n-gram is a sequence of characters anchored at the beginning of a token. With the tokenizer configured below, each word is a separate token. For example, the edge n-grams of the word “search” are “s”, “se”, “sea”, “sear”, “searc”, and “search”.

Here are the steps to implement typeahead using the edge n-gram approach:

1. Define a custom analyzer that uses the `edge_ngram` tokenizer. This can be done when creating the index or by updating the index settings. Note that new analyzers can only be added to an existing index while it is closed.

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  }
}
```
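The following small Python approximation shows what the `autocomplete` analyzer defined above emits. It is a sketch, not Elasticsearch's implementation. The helper name `autocomplete_analyzer` is hypothetical, and the regular expression only approximates the `letter` class of `token_chars`.

```python
import re


def autocomplete_analyzer(text, min_gram=1, max_gram=10):
    """Approximate the `autocomplete` analyzer from the settings above (hypothetical helper).

    Tokenizer (edge_ngram, token_chars=["letter"]): split the text on anything
    that is not a letter, then emit prefixes of each word that are between
    min_gram and max_gram characters long.
    Filter (lowercase): lowercase every emitted gram.
    """
    grams = []
    # [^\W\d_] = "word characters minus digits and underscore", i.e. roughly letters.
    for word in re.findall(r"[^\W\d_]+", text):
        for length in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:length].lower())
    return grams


print(autocomplete_analyzer("Search 2023"))
# ['s', 'se', 'sea', 'sear', 'searc', 'search']  ("2023" is dropped: no letters)
print(autocomplete_analyzer("Elasticsearch")[-1])
# elasticsea  (grams stop at max_gram = 10 characters)
```

Because `max_gram` is 10, prefixes longer than ten characters are never indexed. In this configuration, a user who has typed `elasticsear` would therefore not match that token.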

2. Use the custom analyzer in the mapping of the fields for which you want to enable typeahead.

```
PUT /my_index/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "autocomplete",
      "search_analyzer": "standard"
    }
  }
}
```

3. Index documents as usual. The custom analyzer will generate edge n-grams for the specified fields.

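4. When performing a search, use a `match` query. It analyzes the query text with the field's `search_analyzer` (here, `standard`). As a result, the input is looked up as whole words against the indexed edge n-grams instead of being split into n-grams itself.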

```
GET /my_index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "sea"
      }
    }
  }
}
```

The above query returns all documents in which some word of the `name` field starts with `sea`, because every word was expanded into its own edge n-grams at index time.
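The interplay between the index-time `autocomplete` analyzer and the search-time `standard` analyzer can be simulated with a toy inverted index. This is a hedged Python sketch, not how Lucene stores terms. All function names are hypothetical, and both analyzers are rough regular-expression approximations. The sketch also shows what would go wrong if the query itself were split into edge n-grams.

```python
import re
from collections import defaultdict


def autocomplete_analyzer(text, max_gram=10):
    """Index-time analysis (approximation): letter-only words, lowercased,
    expanded into edge n-grams of 1..max_gram characters."""
    grams = set()
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        for length in range(1, min(max_gram, len(word)) + 1):
            grams.add(word[:length])
    return grams


def standard_like_analyzer(text):
    """Search-time analysis (rough stand-in for `standard`): whole lowercased words."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Inverted index: term -> ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, name in docs.items():
        for gram in autocomplete_analyzer(name):
            index[gram].add(doc_id)
    return index


def match(index, query, analyzer=standard_like_analyzer):
    """`match`-style lookup: query terms are OR-ed, so any matching term is a hit."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits


docs = {1: "Seaside Hotel", 2: "Deep sea diving", 3: "Research lab", 4: "Sunset Motel"}
index = build_index(docs)

print(sorted(match(index, "sea")))  # [1, 2]: "Research" has no word starting with "sea"
print(sorted(match(index, "sea", analyzer=autocomplete_analyzer)))  # [1, 2, 4]
```

With the n-gram-analyzed query, the one-letter gram `s` alone is enough to match `Sunset Motel`. This is why the mapping sets a separate `search_analyzer`.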

2. Search-as-you-type Approach

This approach relies on the dedicated `search_as_you_type` field type, which is optimized for search-as-you-type completion. The field values are analyzed as usual, with the `standard` analyzer unless configured otherwise. In addition, additional sub-fields are created to index bigrams and trigrams, i.e. shingles of two and three tokens, respectively. Moreover, the trigrams are further processed with an edge n-gram token filter, similar to the first approach.

Here are the steps to implement typeahead using the search-as-you-type approach:

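1. Define a field of type `search_as_you_type` in the mapping. If you are reusing `my_index` from the previous approach, note that the type of the existing `name` field cannot be changed, so use a new index instead.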

```
PUT /my_index/_mapping
{
  "properties": {
    "name": {
      "type": "search_as_you_type"
    }
  }
}
```

As a result of this mapping, the name field will contain additional sub-fields as described above, such as:

  • `name._2gram` which will contain bigrams 
  • `name._3gram` which will contain trigrams 
  • `name._index_prefix` which will contain the trigram prefixes

2. Index documents with the search-as-you-type field.

```
PUT /my_index/_doc/1
{
  "name": "quick brown fox"
}
```

As a result of the mapping created in step 1, the `name` field will contain the following tokens: `quick`, `brown` and `fox`. The `name._2gram` field will contain `quick brown` and `brown fox` and the `name._3gram` field will contain `quick brown fox`. In addition, the `name._index_prefix` field will contain all the prefixes of the latter field, namely: `q`, `qu`, `qui`, `quic`, `quick`, `quick `, `quick b`, `quick br`, etc.
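The token layout described above can be reproduced with a simplified Python model. It is only an illustration of the description, not the real analysis chain. Elasticsearch builds these sub-fields with Lucene shingle and edge n-gram filters that have their own limits and may emit additional tokens. The function names are hypothetical, and `lower().split()` is a rough stand-in for the `standard` analyzer.

```python
def shingles(tokens, size):
    """Word n-grams ("shingles") made of exactly `size` consecutive tokens."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def search_as_you_type_fields(text):
    """Simplified model of the sub-fields described above (not the real analysis chain)."""
    tokens = text.lower().split()  # rough stand-in for the standard analyzer
    trigrams = shingles(tokens, 3)
    return {
        "name": tokens,
        "name._2gram": shingles(tokens, 2),
        "name._3gram": trigrams,
        # Edge n-grams (character prefixes) of each trigram, spaces included.
        "name._index_prefix": [t[:n] for t in trigrams for n in range(1, len(t) + 1)],
    }


for field, terms in search_as_you_type_fields("quick brown fox").items():
    print(field, terms[:8])
# name ['quick', 'brown', 'fox']
# name._2gram ['quick brown', 'brown fox']
# name._3gram ['quick brown fox']
# name._index_prefix ['q', 'qu', 'qui', 'quic', 'quick', 'quick ', 'quick b', 'quick br']
```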

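3. When performing a search, use a `multi_match` query of type `bool_prefix` that targets the root field and both the bigram and trigram sub-fields. This query type matches every term of the input exactly, except the last one, which is matched as a prefix.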

```
GET /my_index/_search
{
  "query": {
    "multi_match": {
      "query": "brown f",
      "type": "bool_prefix",
      "fields": [
        "name",
        "name._2gram",
        "name._3gram"
      ]
    }
  }
}
```

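The above query matches documents that contain the term `brown` or a term starting with `f`. Some documents contain these words next to each other and in order, such as `quick brown fox`. These also match the prefix `brown f` on the `name._2gram` sub-field and are therefore ranked higher.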
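The behavior of `bool_prefix` can be sketched in plain Python. This is a hedged toy model, not Elasticsearch's scoring. It counts satisfied clauses instead of computing BM25 scores, uses hypothetical function names, and covers only the root field and `name._2gram`. In this simplified model, a two-word query such as `brown f` forms no three-word shingle, so `name._3gram` is left out.

```python
def shingles(tokens, size):
    """Word n-grams made of exactly `size` consecutive tokens."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def bool_prefix_clauses_hit(query_terms, doc_terms):
    """Count satisfied clauses of a match_bool_prefix-style query on one field.

    Every query term except the last must match a document term exactly;
    the last query term only has to be a prefix of some document term.
    """
    if not query_terms:
        return 0
    *exact, last = query_terms
    hits = sum(1 for term in exact if term in doc_terms)
    if any(term.startswith(last) for term in doc_terms):
        hits += 1
    return hits


def bool_prefix_search(query, docs):
    """Toy multi_match/bool_prefix over the root field and the _2gram sub-field.

    Clauses are OR-ed, so a document matches if any clause does; the "score"
    is simply the number of satisfied clauses summed across both fields.
    """
    q_tokens = query.lower().split()
    results = []
    for doc_id, text in docs.items():
        d_tokens = text.lower().split()
        score = bool_prefix_clauses_hit(q_tokens, d_tokens)  # root field
        score += bool_prefix_clauses_hit(shingles(q_tokens, 2), shingles(d_tokens, 2))  # _2gram
        if score > 0:
            results.append((doc_id, score))
    return sorted(results, key=lambda r: -r[1])  # stable sort: ties keep insertion order


docs = {1: "quick brown fox", 2: "brown bear", 3: "fast car", 4: "red fox", 5: "green tree"}
print(bool_prefix_search("brown f", docs))
# [(1, 3), (2, 1), (3, 1), (4, 1)]
```

Only `quick brown fox` satisfies the bigram prefix clause `brown f`, which is what lifts it above documents that match a single word.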

3. Completion Suggester Approach

The completion suggester is a type of suggester that provides autocomplete functionality. It uses an in-memory data structure called FST (Finite State Transducer) to hold the possible completions for a prefix, which makes it very fast.
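The FST itself is internal to Lucene. The lookup it enables can still be illustrated with a plain prefix tree: walk down the prefix, then collect the weighted completions stored below it. This is a deliberately simpler substitute, not an FST, because a trie only shares prefixes and keeps weights on nodes. `CompletionTrie` is hypothetical. Lowercasing roughly approximates the `simple` analyzer that `completion` fields use by default, and `size=5` mirrors the suggester's default number of suggestions.

```python
class _Node:
    """One character position in the prefix tree."""

    def __init__(self):
        self.children = {}  # next character -> _Node
        self.text = None    # original input that ends here, if any
        self.weight = 0     # ranking weight of that input


class CompletionTrie:
    """Toy prefix tree mapping a prefix to weighted completions (not an FST)."""

    def __init__(self):
        self.root = _Node()

    def add(self, text, weight=1):
        node = self.root
        for ch in text.lower():  # lowercase, roughly like the default `simple` analyzer
            node = node.children.setdefault(ch, _Node())
        node.text, node.weight = text, weight

    def suggest(self, prefix, size=5):
        # 1. Walk down the tree along the prefix.
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []  # no input starts with this prefix
        # 2. Collect every input stored below that node.
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            if current.text is not None:
                found.append((current.weight, current.text))
            stack.extend(current.children.values())
        # 3. Highest weight first (ties broken alphabetically), at most `size` results.
        found.sort(key=lambda item: (-item[0], item[1]))
        return [text for _, text in found[:size]]


trie = CompletionTrie()
for text, weight in [("Elasticsearch", 10), ("Elastic Stack", 5), ("Elk", 1), ("Kibana", 7)]:
    trie.add(text, weight)

print(trie.suggest("elas"))    # ['Elasticsearch', 'Elastic Stack']
print(trie.suggest("search"))  # []: only prefixes of an input can match
```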

Here are the steps to implement typeahead using the completion suggester approach:

1. Define a field of type `completion` in the mapping.

```
PUT /my_index/_mapping
{
  "properties": {
    "name_suggest": {
      "type": "completion"
    }
  }
}
```

2. Index documents with the completion field.

```
PUT /my_index/_doc/1
{
  "name": "Elasticsearch",
  "name_suggest": "Elasticsearch"
}
```

3. Use the suggest option of the search API to get suggestions.

```
POST /my_index/_search
{
  "suggest": {
    "name_suggestion": {
      "prefix": "elas",
      "completion": {
        "field": "name_suggest"
      }
    }
  }
}
```

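The above request returns suggestions from documents whose `name_suggest` value starts with `elas`. The match is case-insensitive because `completion` fields use the `simple` analyzer by default, which lowercases the input. At most five suggestions are returned unless `size` is set.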

Final thoughts

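The edge n-gram approach is more flexible. Because the field is a regular `text` field, it can match the beginning of any word in the value and can be combined with filters and any other query in the query DSL. However, it requires more storage space and is slower than the completion suggester approach.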

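The search-as-you-type approach is similar to the edge n-gram approach but slightly more advanced. It automatically creates several sub-fields with different analysis chains (shingles, and edge n-grams of the largest shingles). This lets it match multi-word prefixes and rank in-order matches higher without defining a custom analyzer.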

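The completion suggester approach is faster and uses less storage space, but it is less flexible. It only matches prefixes of the indexed inputs, so it cannot handle infix matching the way the edge n-gram approach does (for example, `sea` finding `Deep sea diving`). Its typo tolerance is also limited to the suggester's `fuzzy` option rather than the full query DSL. Finally, it is more demanding on memory, as the FST is completely stored in the heap space.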
