Elasticsearch Joining Two Indexes in Elasticsearch: A Comprehensive Guide

By Opster Team

Updated: Jul 23, 2023

| 2 min read

Introduction 

In Elasticsearch, joining two indexes is not as straightforward as it is in traditional SQL databases. However, it is possible to achieve similar results using certain techniques and features provided by Elasticsearch. This article will delve into the process of joining two indexes in Elasticsearch, focusing on the use of the `terms` query and the `enrich` processor.

Using the Terms Query for Joining Two Indexes

The `terms` query is one of the most effective ways to join two indexes in Elasticsearch. This query is used to retrieve documents that contain one or more exact terms in a specific field. Here’s how you can use it to join two indexes:

  1. First, you need to retrieve the required data from the first index. This can be done using a simple `GET` request.
  1. Once you have the data from the first index, you can use it to query the second index. This is done using the `terms` query, where you specify the field and the values you want to match.

Here is an example:

json
GET /second_index/_search
{
  "query": {
    "terms": {
      "field_in_second_index": ["value1_from_first_index", "value2_from_first_index"]
    }
  }
}

In this example, `field_in_second_index` is the field in the second index that you want to match with the values from the first index. `value1_from_first_index` and `value2_from_first_index` are the values from the first index that you want to match in the second index.

The `terms` query also provides support to perform the two above steps in a single shot using a technique called terms lookup. Elasticsearch will take care of transparently retrieving the values to match from another index. For example, you have a `teams` index containing a list of players:

PUT teams/_doc/team1
{
  "players":   ["john", "bill", "michael"]
}
PUT teams/_doc/team2
{
  "players":   ["aaron", "joe", "donald"]
}

Now, it is possible to query a `people` index for all the people playing in team1, as shown below:

GET people/_search?pretty
{
  "query": {
    "terms": {
        "name" : {
            "index" : "teams",
            "id" : "team1",
            "path" : "players"
        }
    }
  }
}

In the example above, Elasticsearch will transparently retrieve the player names from the `team1` document present in the `teams` index (i.e. “john”, “bill”, “michael”) and find all people documents with a `name` field that contains any of those values. The equivalent SQL query would be:

SELECT p.* FROM people p
INNER JOIN teams t ON p.name = t.players

Using the Enrich Processor for Joining Two Indexes

The `enrich` processor is another powerful tool that can be used to join two indexes in Elasticsearch. This processor enriches the data of incoming documents by adding data from a pre-defined enrich index.

Here’s how you can use the `enrich` processor to join two indexes:

1. First, you need to create an enrich policy. This policy defines which index to use for enrichment and which field to match on. Here is an example:

json
PUT /_enrich/policy/my_enrich_policy
{
  "match": {
    "indices": "first_index",
    "match_field": "field_in_first_index",
    "enrich_fields": ["field_to_enrich"]
  }
}

2. Once the policy is created, you need to execute it:

json
POST /_enrich/policy/my_enrich_policy/_execute

3. After executing the policy, you can use the `enrich` processor in an ingest pipeline to enrich the data of incoming documents:

json
PUT /_ingest/pipeline/my_pipeline
{
  "processors": [
    {
      "enrich": {
        "policy_name": "my_enrich_policy",
        "field": "field_in_second_index",
        "target_field": "enriched_field"
      }
    }
  ]
}

In this example, `field_in_second_index` is the field in the second index that you want to enrich with data from the first index. `enriched_field` is the new field that will contain the enriched data.

One drawback of this approach is that if the data changes in `first_index`, the enrich policy needs to be re-executed as the enriched index is not updated or synchronized automatically from the source index it has been built from. However, if `first_index` is relatively stable, then this approach works great.

Conclusion 

In conclusion, while Elasticsearch does not support traditional join operations, it provides features like the `terms` query and the `enrich` processor that can be used to achieve similar results. It’s important to note that these methods have their limitations and should be used judiciously based on the specific requirements and the nature of the data.

How helpful was this guide?

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?