I am new to Elasticsearch and am trying to understand how ES maintains parent-child relationships. I started with the following article:

https://www.elastic.co/blog/managing-relations-inside-elasticsearch

But that article is based on an old version of ES. I am currently using ES 7.5, whose documentation states:

The _parent field has been removed in favour of the join field.

Now I am currently following this article:

https://www.elastic.co/guide/en/elasticsearch/reference/7.5/parent-join.html

However, I am not able to get the desired result.

I have a scenario with two indices, "person" and "home". Each person can have multiple homes, which is a one-to-many relation. The problem is that when I query for all homes whose parent is a given person, the result is empty.

Below are my index mappings, sample documents, and search queries:

Person Index:

Request URL: http://hostname/person

{
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            },
            "person_home": {
                "type": "join",
                "relations": {
                    "person": "home"
                }
            }
        }
    }
}

Home Index:

Request URL: http://hostname/home

{
    "mappings": {
        "properties": {
            "state": {
                "type": "text"
            },
            "person_home": {
                "type": "join",
                "relations": {
                    "person": "home"
                }
            }
        }
    }
}

Adding data to the person index:

Request URL: http://hostname/person/_doc/1

{
    "name": "shujaat",
    "person_home": {
        "name": "person"
    }
}

Adding data to the home index:

Request URL: http://hostname/home/_doc/2?routing=1&refresh

{
    "state": "ontario",
    "person_home": {
        "name": "home",
        "parent": "1"
    }
}

Query to fetch data (all the records whose parent is the person with id "1"):

Request URL: http://hostname/person/_search

{
    "query": {
        "has_parent": {
            "parent_type": "person",
            "query": {
                "match": {
                    "name": "shujaat"
                }
            }
        }
    }
}

OR

{
    "query": {
        "has_parent": {
            "parent_type": "person",
            "query": {
                "match": {
                    "_id": "1"
                }
            }
        }
    }
}

Response:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

I am unable to understand what I am missing here, or what is wrong with the queries above, since they return no data.

Nikolay Vasiliev
shujaat siddiqui

1 Answer

You should put the parent and child documents in the same index:

The join datatype is a special field that creates parent/child relation within documents of the same index.

So the mapping would look like the following:

PUT http://hostname/person_home
{
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            },
            "state": {
                "type": "text"
            },
            "person_home": {
                "type": "join",
                "relations": {
                    "person": "home"
                }
            }
        }
    }
}

Notice that it has both fields from your original person and home indexes.

The rest of your code should work just fine. Try inserting the person and home documents into the same index, person_home, and run the queries you posted in the question against that index (person_home/_search rather than person/_search).
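As a rough sketch of that single-index flow, the Python snippet below only builds the request paths and JSON bodies (it sends nothing to a cluster). The helper names `index_requests`, `homes_of_person_query` and `homes_by_parent_id_query` are made up for illustration; the index name, field names and ids are the ones from the question. The `parent_id` variant is an alternative to the `_id` match from the question, not something the original code used.

```python
import json

INDEX = "person_home"  # one index holding both parent and child documents


def index_requests():
    """Return (method, path, body) tuples for one person and one home."""
    parent = ("PUT", f"/{INDEX}/_doc/1",
              {"name": "shujaat", "person_home": {"name": "person"}})
    # The child has to live on the parent's shard, hence routing=1.
    child = ("PUT", f"/{INDEX}/_doc/2?routing=1&refresh",
             {"state": "ontario",
              "person_home": {"name": "home", "parent": "1"}})
    return [parent, child]


def homes_of_person_query(person_name):
    """has_parent query: child (home) docs whose parent matches the name."""
    return {"query": {"has_parent": {
        "parent_type": "person",
        "query": {"match": {"name": person_name}}}}}


def homes_by_parent_id_query(parent_id):
    """parent_id query: home docs joined to the parent with this _id."""
    return {"query": {"parent_id": {"type": "home", "id": parent_id}}}


if __name__ == "__main__":
    for method, path, body in index_requests():
        print(method, path, json.dumps(body))
    print("GET", f"/{INDEX}/_search",
          json.dumps(homes_of_person_query("shujaat")))
    print("GET", f"/{INDEX}/_search",
          json.dumps(homes_by_parent_id_query("1")))
```

The printed bodies can be sent with any HTTP client or pasted into Kibana's console.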

What if person and home objects have overlapping field names?

Let's say both object types have a field called name, but we want to index and query them separately. In this case we can come up with a mapping like this:

PUT http://hostname/person_home
{
    "mappings": {
        "properties": {
            "person": {
                "properties": {
                    "name": {
                        "type": "text"
                    }
                }
            },
            "home": {
                "properties": {
                    "name": {
                        "type": "keyword"
                    },
                    "state": {
                        "type": "text"
                    }
                }
            },
            "person_home": {
                "type": "join",
                "relations": {
                    "person": "home"
                }
            }
        }
    }
}

Now, we should change the structure of the objects themselves:

PUT http://hostname/person_home/_doc/1
{
    "person": {
        "name": "shujaat"
    },
    "person_home": {
        "name": "person"
    }
}

PUT http://hostname/person_home/_doc/2?routing=1&refresh
{
    "home": {
        "name": "primary",
        "state": "ontario"
    },
    "person_home": {
        "name": "home",
        "parent": "1"
    }
}
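With the wrapped layout, queries have to use the full field paths. The sketch below builds two such query bodies in Python, assuming the person's fields are indexed under `person` (so the name is `person.name`) and the home's under `home`, as the mapping defines. The function names are hypothetical, and nothing is sent to a cluster.

```python
import json


def homes_of_person_wrapped_query(person_name):
    """Homes whose parent person matches on the nested path person.name."""
    return {"query": {"has_parent": {
        "parent_type": "person",
        "query": {"match": {"person.name": person_name}}}}}


def persons_with_home_wrapped_query(home_name):
    """Persons having at least one home with this exact home.name."""
    # home.name is mapped as keyword, so a term query matches it exactly.
    return {"query": {"has_child": {
        "type": "home",
        "query": {"term": {"home.name": home_name}}}}}


if __name__ == "__main__":
    print(json.dumps(homes_of_person_wrapped_query("shujaat"), indent=2))
    print(json.dumps(persons_with_home_wrapped_query("primary"), indent=2))
```

Both bodies would be posted to person_home/_search.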

If you have to migrate old data from the two old indexes into a new merged one, the reindex API may be of use.
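One possible shape for that migration is two `_reindex` calls, one per old index, each with a small Painless script that nests the document under `person` or `home`. The Python sketch below only builds the two request bodies. It assumes the old documents already carry the `person_home` join field, as in the question, and that document ids do not collide between the two old indexes. The helper name `reindex_body` is made up, and the script is untested, so treat it as a starting point. Reindex keeps each document's routing by default, so children indexed with `routing=1` should stay on their parent's shard.

```python
import json

DEST_INDEX = "person_home"  # merged index from the mapping above
JOIN_FIELD = "person_home"  # join field name used in this thread


def reindex_body(source_index, wrapper):
    """Build a _reindex body that nests each document under `wrapper`."""
    script = (
        # Take the join field out so it stays at the top level ...
        f"def join = ctx._source.remove('{JOIN_FIELD}'); "
        # ... then nest the remaining fields under the wrapper object.
        f"ctx._source = ['{wrapper}': ctx._source, '{JOIN_FIELD}': join];"
    )
    return {
        "source": {"index": source_index},
        "dest": {"index": DEST_INDEX},
        "script": {"lang": "painless", "source": script},
    }


if __name__ == "__main__":
    # Reindex parents first, then children; each body goes to POST /_reindex.
    for source_index, wrapper in [("person", "person"), ("home", "home")]:
        print("POST /_reindex")
        print(json.dumps(reindex_body(source_index, wrapper), indent=2))
```

If the old documents have no join field yet, the script would have to build it instead (including the parent id for children), which depends on how the old data links a home to its person.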

Nikolay Vasiliev
  • Thanks for the reply. My question is basically how to join two different indexes, because since version 7.0 documents with multiple mapping types are obsolete and not recommended, as they can cause issues such as two documents having the same field name. In other scenarios, however, your solution will work well. Thanks. – shujaat siddiqui Jan 23 '20 at 13:14
  • @shujaatsiddiqui You can wrap `person` fields in an outer object, and the same for `home`, and then reindex them into the same index. This will solve the overlapping field names problem. Does it make sense to you? I can provide an example in the answer if needed. – Nikolay Vasiliev Jan 23 '20 at 17:45
  • It would be very helpful if you could share the example. – shujaat siddiqui Jan 24 '20 at 06:42
  • @Nikolay Vasiliev In ES 7.6 the provided example is not working. "reason": "Failed to parse mapping [_doc]: No type specified for field [person]" Do you know how to fix this case with overlapping names? Thanks – Maurizio Lo Bosco Sep 22 '20 at 10:55
  • @MaurizioLoBosco Thanks for reporting, there was some part of the code missing, I updated it, should work now! – Nikolay Vasiliev Sep 24 '20 at 16:52
  • Any idea how to reindex two separate indices into one parent-child index? I can only find reindex API usage for reindexing a single index. Any suggestion would be appreciated. – puppylpg Mar 28 '22 at 04:27
  • @puppylpg I suppose it can be done via 2 different reindex operations, one for the "parent" source index, one for the "child" source index. You just need to set the "join" field properly + the routing for child documents. I found an example script in [this answer](https://stackoverflow.com/a/50607003/5095957) which shows the idea of what you should be doing. – Nikolay Vasiliev Mar 28 '22 at 09:53
  • @NikolayVasiliev Thanks so much for your answer! I also found it after searching for a while. There is only one thing that puzzles me now: I want to index both original indices into the parent-child index as objects under a `person` or `home` field, just like the mappings you posted, which I agree is a good way to handle overlapping fields. But I still don't know how to write that in a script, to add an outer `person` or `home` field around the original objects. Do you have relevant experience? Thanks again for your kindness~ – puppylpg Mar 28 '22 at 15:42
  • OK, I found the answer! Use a script like this in the reindex API: ``` "script": { "source": "ctx._source = ['person': ctx._source]" } ``` The key is to use a [map in Painless grammar](https://www.elastic.co/guide/en/elasticsearch/reference/5.4/modules-scripting-painless-syntax.html#painless-maps) to build an object, as shown in [this answer](https://stackoverflow.com/a/49378001/7676237). – puppylpg Mar 29 '22 at 04:13