ElasticSearch: Using output of one query as input to another

Question

My problem requires fetching a document by ID from Elasticsearch and then using it to build a second query. This works, but it forces two round trips to the Elasticsearch cluster. Can I do this in a single request, i.e. run one query and feed its output into another, to avoid the extra round trip?

Please let me know if you don't understand the issue.

You pretty much asked this one more time: http://stackoverflow.com/questions/26977932/elasticsearch-find-documents-by-another-document. And I don't think there is any other option than that or [mlt query](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html). — Andrei Stefan, Nov 18 '14 at 12:04
Have you tried that? I suppose not, since you haven't replied to those answers. If you did and are still asking questions about it, what did you try and why are you not satisfied with mlt? — Andrei Stefan, Nov 18 '14 at 12:05
@AndreiStefan Can more like this use a filter on certain values? As far as I know, it selects the relevant terms and searches on the given field, but how can I force a filter on some fields? — Global Warrior, Nov 18 '14 at 12:32
I don't think it can do that. It seems to be using queries, not filters. — Andrei Stefan, Nov 18 '14 at 12:39
It looks like there is another question like that, which has an answer: https://stackoverflow.com/questions/28734436/what-is-the-elasticsearch-equivalent-for-an-sql-subquery Short answer: no, ES does not have subqueries. — Nikolay Vasiliev, Jun 23 '17 at 18:08

score 3 · Accepted Answer · answered Jun 23 '17 at 18:33

I would like to use this opportunity to advertise a different approach to the given problem. In fact, ElasticSearch: The Definitive Guide does a pretty good job on its own, so I will just quote it:

Four common techniques are used to manage relational data in Elasticsearch:

Application-side joins

Data denormalization

Nested objects

Parent/child relationships

Often the final solution will require a mixture of a few of these techniques.

In practice, data denormalization means storing the data so that a single query does what previously took two consecutive queries.

Here I will unfold the example from the aforementioned book. Suppose you have the following two document types, user and blogpost, in the same index, and you wish to find all blog posts written by any person named John:

PUT /my_index/user/1
{
  "name":     "John Smith",
  "email":    "john@smith.com",
  "dob":      "1970/10/24"
}

PUT /my_index/blogpost/2
{
  "title":    "Relationships",
  "body":     "It's complicated...",
  "userID":     1
}
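
With only a userID on the blog post, finding posts by people named John is an application-side join: one search for the users, then a second search using the IDs it returned. A rough sketch in Python of the two request bodies; the function names and the sample hit are made up for illustration, and no particular client library is assumed:

```python
import json


def users_by_name_query(name):
    """Body for the first round trip: match user documents by name."""
    return {"query": {"match": {"name": name}}}


def posts_by_user_ids_query(user_ids):
    """Body for the second round trip: blog posts whose userID is in the list."""
    return {"query": {"terms": {"userID": list(user_ids)}}}


# First round trip: would be sent to the user search endpoint.
first_body = users_by_name_query("John")

# Hypothetical hits returned by that first search (assumed shape:
# each hit carries the document ID in "_id").
first_response_hits = [{"_id": "1", "_source": {"name": "John Smith"}}]
user_ids = [int(hit["_id"]) for hit in first_response_hits]

# Second round trip: built from the output of the first one.
second_body = posts_by_user_ids_query(user_ids)

print(json.dumps(first_body))
print(json.dumps(second_body))
```

The second body cannot be built until the first response arrives, which is exactly the extra round trip the question wants to avoid.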

With this layout there is no other option but to first fetch the IDs of all Johns in the database. What you could do instead is copy some of the user information onto the blogpost object:

PUT /my_index/user/1
{
  "name":     "John Smith",
  "email":    "john@smith.com",
  "dob":      "1970/10/24"
}

PUT /my_index/blogpost/2
{
  "title":    "Relationships",
  "body":     "It's complicated...",
  "user":     {
    "id":       1,
    "name":     "John Smith" 
  }
}

This enables searching on user.name directly within the blogpost documents.
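
With the author's name copied onto each blog post, the lookup becomes one search request. A minimal sketch in Python that only builds the request body; the function name is hypothetical, and sending the body to the blogpost search endpoint is left to whatever client you use:

```python
import json


def posts_by_author_name_query(author_name, title=None):
    """Single-request body: match blog posts on the denormalized user.name.

    An optional title clause is combined with the author clause in a bool query.
    """
    must = [{"match": {"user.name": author_name}}]
    if title is not None:
        must.append({"match": {"title": title}})
    return {"query": {"bool": {"must": must}}}


body = posts_by_author_name_query("John", title="relationships")
print(json.dumps(body, indent=2))
```

The trade-off is that the name is now duplicated: if a user changes their name, every blog post carrying the old value has to be reindexed.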

Apart from the traditional ElasticSearch methods, you may also consider third-party plugins like Siren Join:

This join is used to filter one document set based on a second document set, hence its name. It is equivalent to the EXISTS() operator in SQL.
