Do you know how to use a reverse_nested aggregation to get both the parent and ONLY the nested data inside my top_hits aggregation? The 'ONLY' part is the problem right now. This is my mapping:

{
    "ticket": {
        "mappings": {
            "properties": {
                "name": {
                    "type": "keyword"
                },
                "tasks": {
                    "type": "nested",
                    "properties": {
                        "string_task_name": {
                            "type": "keyword"
                        }
                    }
                }
            }
        }
    }
}

My query uses top_hits and reverse_nested aggregations:

{
    "aggs": {
        "object_tasks": {
            "nested": {
                "path": "object_tasks"
            },
            "aggs": {
                "filter_by_tasks_attribute": {
                    "filter": {
                        "bool": {
                            "must": [
                                {
                                    "wildcard": {
                                        "object_tasks.string_task_name.keyword": "*"
                                    }
                                }
                            ]
                        }
                    },
                    "aggs": {
                        "using_reverse_nested": {
                            "reverse_nested": {
                                "path": "object_tasks"
                            },
                            "aggs": {
                                "names": {
                                    "top_hits": {
                                        "_source": {
                                            "includes": [
                                                "object_tasks.string_task_name",
                                                "string_name"
                                            ]
                                        },
                                        "sort": [
                                            {
                                                "object_tasks.string_task_name.keyword": {
                                                    "order": "desc"
                                                }
                                            }
                                        ],
                                        "from": 0,
                                        "size": 10
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

The response:

{
    "hits": {
        "total": {
            "value": 25,
            "relation": "eq"
        },
        "max_score": null,
        "hits": [
            {
                "_index": "random_index",
                "_type": "_doc",
                "_id": "5",
                "_score": null,
                "_source": {
                    "object_tasks": [  <-- I don't want all these task names, I just want the task name of the current nested object I am in.
                        {
                            "string_task_name": "task1"
                        },
                        {
                            "string_task_name": "task2"
                        },
                        {
                            "string_task_name": "task3"
                        },
                        {
                            "string_task_name": "task4"
                        }
                    ],
                    "string_name": "Dummy Ticket 854"
                },
                "sort": [
                    "seek_a_sme"
                ]
            }
        ]
    }
}

As you can see, the result gives me four task names. What I want is to return only one task name.

The only workaround I have found is to copy the ticket data into the tasks. If I can avoid that, it would be awesome.

misterone

1 Answer

"I don't want all these task names, I just want the task name of the current nested object I am in."

The phrase "of the current nested object I'm in" implies that you are inside a nested context, but you can no longer be in one once you have escaped it through reverse_nested.
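One way to stay inside the nested context is to place top_hits directly under the nested aggregation, with no reverse_nested in between. The sketch below only builds such a request body; the helper name nested_top_hits_body and the inner aggregation name "tasks" are hypothetical, and the field names are taken from the question. As far as I know, Elasticsearch then returns nested hits, each carrying a _nested locator and only that nested object's source, so parent fields such as string_name would not be included.

```python
import json
from typing import Any, Dict


def nested_top_hits_body(path: str, sort_field: str, size: int = 10) -> Dict[str, Any]:
    """Build a search body whose top_hits runs inside the nested context.

    Hypothetical helper: it only assembles a dict and does not call Elasticsearch.
    """
    return {
        "size": 0,
        "aggs": {
            "object_tasks": {
                # Enter the nested documents under the given path.
                "nested": {"path": path},
                "aggs": {
                    "tasks": {
                        # No reverse_nested here, so the hits stay nested.
                        "top_hits": {
                            "sort": [{sort_field: {"order": "desc"}}],
                            "size": size,
                        }
                    }
                },
            }
        },
    }


body = nested_top_hits_body(
    "object_tasks", "object_tasks.string_task_name.keyword"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as a search request body. This trades the parent fields for per-task hits, so it is only a fit if the parent can be looked up separately.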

I'm not sure I fully understood what you're after, but you could run a terms aggregation on object_tasks.string_task_name.keyword. The keys of that aggregation would then function as the individual "current nested objects" you're looking for:

{
  "size": 0,
  "aggs": {
    "object_tasks": {
      "nested": {
        "path": "object_tasks"
      },
      "aggs": {
        "filter_by_tasks_attribute": {
          "filter": {
            "bool": {
              "must": [
                {
                  "wildcard": {
                    "object_tasks.string_task_name.keyword": "*"
                  }
                }
              ]
            }
          },
          "aggs": {
            "by_string_task_name": {
              "terms": {
                "field": "object_tasks.string_task_name.keyword",
                "order": {
                  "_key": "desc"
                }, 
                "size": 10
              },
              "aggs": {
                "using_reverse_nested": {
                  "reverse_nested": {},
                  "aggs": {
                    "names": {
                      "top_hits": {
                        "_source": {
                          "includes": [
                            "string_name"
                          ]
                        },
                        "from": 0,
                        "size": 10
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

yielding

"aggregations" : {
  "object_tasks" : {
    ...
    "filter_by_tasks_attribute" : {
      ...
      "by_string_task_name" : {
        ...
        "buckets" : [
          {
            "key" : "task4",                <--
            ...
            "using_reverse_nested" : {
              ...
              "names" : {
                "hits" : {
                  ...
                  "hits" : [
                    {
                      ...
                      "_source" : {
                        "string_name" : "Dummy Ticket 854"  <--
                      }
                    }
                  ]
                }
              }
            }
          },
          {
            "key" : "task3",                <--
            ...
          },
          {
            "key" : "task2",                <--
            ...
          },
          {
            "key" : "task1",                <--
            ...
          }
        ]
      }
    }
  }
}

Note that the top_hits aggregation no longer needs a sort: object_tasks.string_task_name.keyword is always the same within any given terms bucket. What I did instead was order the terms aggregation by _key, which works the same way as a top_hits sort would have. BTW, yours was missing the nested path parameter.
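To read the result client-side, each bucket key can be paired with the parent documents found under it. This is a minimal sketch, assuming the response has the shape shown above; the function name pair_tasks_with_tickets is hypothetical and the sample dict is hand-written for illustration, not real output.

```python
from typing import Any, Dict, List, Tuple


def pair_tasks_with_tickets(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (task name, ticket name) pairs from the aggregation response."""
    buckets = (
        response["aggregations"]["object_tasks"]
        ["filter_by_tasks_attribute"]["by_string_task_name"]["buckets"]
    )
    pairs: List[Tuple[str, str]] = []
    for bucket in buckets:
        # Each bucket key is one distinct task name.
        parent_hits = bucket["using_reverse_nested"]["names"]["hits"]["hits"]
        for hit in parent_hits:
            # The hit is a parent ticket that contains this task.
            pairs.append((bucket["key"], hit["_source"]["string_name"]))
    return pairs


def _bucket(key: str, ticket: str) -> Dict[str, Any]:
    """Build one illustrative bucket (hand-written, not real output)."""
    hits = {"hits": [{"_source": {"string_name": ticket}}]}
    return {"key": key, "using_reverse_nested": {"names": {"hits": hits}}}


sample = {
    "aggregations": {
        "object_tasks": {
            "filter_by_tasks_attribute": {
                "by_string_task_name": {
                    "buckets": [
                        _bucket("task4", "Dummy Ticket 854"),
                        _bucket("task3", "Dummy Ticket 854"),
                    ]
                }
            }
        }
    }
}

print(pair_tasks_with_tickets(sample))
# prints [('task4', 'Dummy Ticket 854'), ('task3', 'Dummy Ticket 854')]
```

Each pair holds exactly one task name next to its ticket, and the pairs follow the bucket order set by the _key ordering of the terms aggregation.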

Joe - GMapsBook.com
  • Thanks for your answer. The reason I was using top_hits is that I need to filter and sort by other parameters inside tasks. It looks like if I want to do that, I will have to create multiple terms aggregations and sort by them, right? – misterone Feb 16 '21 at 23:22
  • Yes, that’s a good start. Let me know how it goes. – Joe - GMapsBook.com Feb 17 '21 at 23:18
  • It was too painful to do all these terms aggregations, so I preferred to just copy the content of the parent into the child and sort and filter by it. The problem is that now I need to sort by a scripted field inside the top hits, and for some reason it doesn't work. I guess that's a separate topic. – misterone Feb 17 '21 at 23:27
  • Sorting by scripted fields won't work in a sort context because those are two separate APIs. Use a scripted sort like the one here: https://stackoverflow.com/a/61738207/8160318 – Joe - GMapsBook.com Feb 18 '21 at 09:50
  • I see. What I meant was using a scripted sort, not sorting by a scripted field, but I understand your point. I actually tried to implement a scripted sort in a top_hits aggregation, but ran into some issues here: https://stackoverflow.com/questions/67308707/access-a-field-in-the-context-of-scripted-sort-in-top-hits-aggregation Maybe you know what's wrong... – misterone Apr 28 '21 at 22:30