I would like to use this opportunity to advertise a different approach to the given problem. In fact, Elasticsearch: The Definitive Guide does a pretty good job on its own, so I will just quote it:
Four common techniques are used to manage relational data in
Elasticsearch:
- Application-side joins
- Data denormalization
- Nested objects
- Parent/child relationships
Often the final solution will require a mixture of a few of these
techniques.
In practice, data denormalization means storing some data redundantly, so that a single query does the job that previously took two consecutive queries.
Here I will unfold the example from the aforementioned book. Suppose you have the following two document types in one index, and you wish to find all blog posts written by any person named John:
PUT /my_index/user/1
{
  "name": "John Smith",
  "email": "john@smith.com",
  "dob": "1970/10/24"
}

PUT /my_index/blogpost/2
{
  "title": "Relationships",
  "body": "It's complicated...",
  "userID": 1
}
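With the documents stored this way, answering the question might look like the sketch below. It assumes default mappings and the same old-style /index/type URLs as the documents above. The ID list in the second request is not known in advance: the application has to fill it in from the hits of the first request.

```
# Step 1: find the users named John and collect their IDs
GET /my_index/user/_search
{
  "query": {
    "match": { "name": "John" }
  }
}

# Step 2: fetch the posts of those users
# ([1] is a placeholder for the IDs collected in step 1)
GET /my_index/blogpost/_search
{
  "query": {
    "terms": { "userID": [1] }
  }
}
```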
With this layout there is no other option but to first fetch the IDs of all users named John and then look up the blog posts by those IDs. What you could do instead is copy some of the user information onto the blogpost object:
PUT /my_index/user/1
{
  "name": "John Smith",
  "email": "john@smith.com",
  "dob": "1970/10/24"
}

PUT /my_index/blogpost/2
{
  "title": "Relationships",
  "body": "It's complicated...",
  "user": {
    "id": 1,
    "name": "John Smith"
  }
}
This makes it possible to search on user.name directly in the blogpost type, with no second query.
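For illustration, a single request against the denormalized documents could look like this. It is a sketch that assumes default mappings, under which the inner field is addressable as user.name. The match on title is only there to show that post fields and the copied user fields can be combined in one query.

```
# One round trip: post fields and the copied user name are queried together
GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "relationships" } },
        { "match": { "user.name": "John" } }
      ]
    }
  }
}
```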
Apart from the traditional Elasticsearch methods, you may also consider using third-party plugins like Siren Join:
This join is used to filter one document set based on a second
document set, hence its name. It is equivalent to the EXISTS()
operator in SQL.