24

I have two indexes that must be separated:

// index = `order_item`
{
    "ID": 1,
    "Name": "Shoes",
    "Price": 9.99,
    "OrderID": 82
},{
    "ID": 1,
    "Name": "Hat",
    "Price": 19.99,
    "OrderID": 82
}

// index = `order`
{
    "ID": 82,
    "Customer": "John Smith"
}

How would I 'join' these two tables on a search, such that it would return something along the lines of:

results = {
    "ID": 1,
    "Name": "Shoes",
    "Price": 9.99,
    "Order.ID": 82,
    "Customer": "John Smith"
},{
    "ID": 1,
    "Name": "Hat",
    "Price": 19.99,
    "Order.ID": 82,
    "Customer": "John Smith"
}
David542
  • 104,438
  • 178
  • 489
  • 842
  • 2
    Unfortunatly, it can't be done David – eliasah Oct 25 '15 at 21:34
  • 2
    @eliasah -- I see, thank you. What would be your suggestion here then as how we might be able to merge the data. For example, something like a processing job that merges it and then creates a third index. Are there any possibilities here? – David542 Oct 25 '15 at 21:41
  • Have you considered a parent/child relationship between the documents? https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html note that individual indexes aren't tables like in a RDBMS. – John Ament Oct 25 '15 at 22:20
  • 1
    It depends on the size of the data of course. A processing job to perform joins would be suitable in this case to create a third index like you suggest sound like a perfect idea. I usually use Elasticsearch on top of Spark to perform transformations on big data. – eliasah Oct 25 '15 at 22:27
  • @eliasah interesting. Are you available for consulting work regarding es + Spark? If so do you have a link or some way I could connect. Thanks. – David542 Oct 25 '15 at 23:51
  • You can find some resources on the Elasticsearch official website concerning Elasticseach Hadoop. And I have also answered a question were I post parts of code on that topic – eliasah Oct 25 '15 at 23:53
  • Sorry I'm on my phone. It's not easy to navigate on the site here is the link http://stackoverflow.com/q/32667068/3415409 – eliasah Oct 25 '15 at 23:56
  • Possible duplicate of [Join query in ElasticSearch](https://stackoverflow.com/questions/22611049/join-query-in-elasticsearch) – Mišo May 01 '19 at 09:53

1 Answers1

27

As answered in your other question, nothing prevents you from storing the Customer name inside each order_item document at indexing time, while still having a dedicated index orders also containing the Customer data. Remember that it's all about cleverly denormalizing your data so that each of your documents be as "self-contained" as you need.

curl -XPUT localhost:9200/order_items/order_item/1 -d '{
    "ID": 1,
    "Name": "Shoes",
    "Price": 9.99,
    "OrderID": 82,
    "Customer": "John Smith"
}'

curl -XPUT localhost:9200/order_items/order_item/2 -d '{
    "ID": 2,
    "Name": "Hat",
    "Price": 19.99,
    "OrderID": 82,
    "Customer": "John Smith"
}

The advantages of this solution is that each order item is completely self-contained, and you can group/aggregate them on OrderID in order to get all items of a given order.

Also, as @JohnAment mentioned in his comment, the order/order_item use case is also a good candidate for using either

  1. the parent/child relationship
  2. or nested objects.

In the first case, you'd have one order "parent" document...

curl -XPUT localhost:9200/orders/order/82 -d '{
    "ID": 82,
    "Customer": "John Smith"
}'

And several order_item "children" documents that you index using their parent ID:

curl -XPUT localhost:9200/order_items/order_item/1?parent=82 -d '{
     "ID": 1,
     "Name": "Shoes",
     "Price": 9.99
}'
curl -XPUT localhost:9200/order_items/order_item/2?parent=82 -d '{
     "ID": 2,
     "Name": "Hat",
     "Price": 19.99
}'

In the second case, your order document would contain all order items in a nested OrderItems property and would look like this:

curl -XPUT localhost:9200/orders/order/82 -d '{
    "ID": 82,
    "Customer": "John Smith",
    "OrderItems": [
      {
        "ID": 1,
        "Name": "Shoes",
        "Price": 9.99
      },{
        "ID": 2,
        "Name": "Hat",
        "Price": 19.99
      }
    ]
}'
Val
  • 207,596
  • 13
  • 358
  • 360
  • do you have any new updates to this approach, especially since mapping multiple types is being affected in elastic 6 ? – Kalpesh Soni Jan 03 '19 at 19:23
  • @KalpeshSoni parent/child is still possible in ES6 via `join` fields, although you can now only have a single parent/child join per index. – Val Jan 03 '19 at 19:24
  • ok and would you prefer nested objects or join fields when you have lot of filtering and sorting needs just like what SQL where clause and order by does – Kalpesh Soni Jan 03 '19 at 19:36
  • @KalpeshSoni It really depends on the use cases. Feel free to create a new question highlighting your specific needs. – Val Jan 04 '19 at 04:43
  • there is a typo. Use comma (,) after "Customer": "John Smith" – Avi Jul 25 '20 at 02:49
  • for "localhost:9200/order_items/order_item/1?parent=82" elastic response with "request [/order_items/order_item/1] contains unrecognized parameter: [parent]" @Val –  Feb 08 '21 at 08:35
  • @mohammadtwin feel free to create a new question describing your issue – Val Feb 08 '21 at 08:49
  • @Val asked here "https://stackoverflow.com/questions/66098816/parent-child-relationship-didnt-work-in-elasticsearch" –  Feb 08 '21 at 09:21