Say I have an Elasticsearch index with a bunch of users' comments:

{ "name": "chris", "date": "2016-01-01", "msg": "hi, foo"}
{ "name": "chris", "date": "2016-01-05", "msg": "bye, bar"}
{ "name": "aaron", "date": "2016-01-10", "msg": "who's bar"}
{ "name": "aaron", "date": "2016-01-15", "msg": "not foo"}

First, I want to find the latest comment for each user. I can do that with the top_hits aggregation:

{
  "aggs": {
    "name": {
      "terms": { "field": "name" },
      "aggs": {
        "latest_comment": {
          "top_hits": {
            "sort": [ { "date": { "order": "desc" } } ],
            "size": 1
          }
        }
      }
    }
  }
}

Which effectively gives me the following:

{ "name": "chris", "date": "2016-01-05", "msg": "bye, bar"}
{ "name": "aaron", "date": "2016-01-15", "msg": "not foo"}

But how can I filter those results now? To be super clear, I want to filter after the top_hits aggregation has picked the latest hits, not before.

Thank you.

cjbottaro
  • What do you want to filter out of the top hits you got? Please explain your use case in a bit more detail. – Val Apr 13 '16 at 02:54
  • For example, I want to say "return the latest comment for each user that has 'foo' in the msg." To be clear, if I filtered on 'foo' then found the top hit, it would be `{ "name": "chris", "date": "2016-01-01", "msg": "hi, foo"}` for user `chris`, which is _not_ his latest comment. – cjbottaro Apr 13 '16 at 18:42
  • @cjbottaro I had the exact same question here: https://stackoverflow.com/questions/51360616/elasticsearch-exclude-top-hit-on-field-value A consultant who visited my company said that this is not possible with what Elasticsearch version 6.2 or below offers. You have to write a custom script to filter the top hits, or filter the results after getting them, either on the client side or on the server side. It's not the best solution, but it is a solution. – Ismail Dec 03 '18 at 09:09

1 Answer

I had the exact same question. After a lot of searching, this is what I found:

If you want to filter the top hits results based on a numeric metric, you can use a pipeline aggregation such as bucket_selector. This is roughly the Elasticsearch equivalent of a SQL HAVING clause. A very helpful answer for this case can be found in "Implementing HAVING in Elasticsearch".
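As a rough sketch of the bucket_selector idea, assuming the index from the question and Painless scripting (Elasticsearch 5.x or later), the request body below keeps only the users whose latest comment date is newer than a cutoff. The helper `build_having_query` and the aggregation names `latest_date` and `recent_only` are made up for illustration; the filter works only because a max aggregation on a date field yields a number (epoch milliseconds).

```python
import json
from datetime import datetime, timezone


def build_having_query(min_latest: datetime) -> dict:
    """Build a request body that drops user buckets whose newest
    comment is not later than `min_latest` (hypothetical helper)."""
    # A max aggregation on a date field returns epoch milliseconds,
    # so the cutoff is converted to the same unit.
    threshold_ms = int(min_latest.timestamp() * 1000)
    return {
        "size": 0,  # only the aggregation results are needed
        "aggs": {
            "name": {
                "terms": {"field": "name"},
                "aggs": {
                    # Numeric metric the selector can read.
                    "latest_date": {"max": {"field": "date"}},
                    # Same top_hits aggregation as in the question.
                    "latest_comment": {
                        "top_hits": {
                            "sort": [{"date": {"order": "desc"}}],
                            "size": 1,
                        }
                    },
                    # Acts like HAVING latest_date > cutoff.
                    "recent_only": {
                        "bucket_selector": {
                            "buckets_path": {"latest": "latest_date"},
                            "script": f"params.latest > {threshold_ms}L",
                        }
                    },
                },
            }
        },
    }


body = build_having_query(datetime(2016, 1, 10, tzinfo=timezone.utc))
print(json.dumps(body, indent=2))
```

With the four sample documents, this should keep the bucket for aaron (latest comment on 2016-01-15) and drop the one for chris (2016-01-05). It cannot express a condition on the text of `msg`, since the selector only sees numeric values.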

But if the value you want to filter on is not numeric, there is no way (at least up to v6.2.4) to do that on the Elasticsearch side.

In this case, as @ismail said, you need to do the filtering on the client side, in your own software.
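A minimal sketch of that client-side step, assuming the response has the usual shape for the terms plus top_hits aggregation from the question. The function name `latest_comments_matching` and the hand-written `sample_response` are illustrative only, not output captured from a real cluster.

```python
def latest_comments_matching(response: dict, needle: str) -> list:
    """Return each user's latest comment, but only if its msg
    contains `needle` (hypothetical helper)."""
    kept = []
    for bucket in response["aggregations"]["name"]["buckets"]:
        hits = bucket["latest_comment"]["hits"]["hits"]
        if not hits:
            continue
        # top_hits was requested with size 1, so the first hit is
        # the user's latest comment.
        source = hits[0]["_source"]
        if needle in source["msg"]:
            kept.append(source)
    return kept


# Hand-written stand-in for the aggregation response.
sample_response = {
    "aggregations": {
        "name": {
            "buckets": [
                {
                    "key": "chris",
                    "latest_comment": {"hits": {"hits": [
                        {"_source": {"name": "chris",
                                     "date": "2016-01-05",
                                     "msg": "bye, bar"}}
                    ]}},
                },
                {
                    "key": "aaron",
                    "latest_comment": {"hits": {"hits": [
                        {"_source": {"name": "aaron",
                                     "date": "2016-01-15",
                                     "msg": "not foo"}}
                    ]}},
                },
            ]
        }
    }
}

print(latest_comments_matching(sample_response, "foo"))
```

Because the filter runs after top_hits has picked each user's latest comment, chris is dropped entirely rather than falling back to his older "hi, foo" comment, which is the behavior asked for in the question.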