All my documents have a uid field with an ID that links the document to a user. There are multiple documents with the same uid.

I want to perform a search over all the documents returning only the highest scoring document per unique uid.

The query selecting the relevant documents is a simple multi_match query.

Florian
TheHippo

2 Answers

23

You need a top_hits aggregation.

And for your specific case:

{
  "query": {
    "multi_match": {
      ...
    }
  },
  "aggs": {
    "top-uids": {
      "terms": {
        "field": "uid"
      },
      "aggs": {
        "top_uids_hits": {
          "top_hits": {
            "sort": [
              {
                "_score": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}

The query above runs your multi_match query and groups the results into buckets by uid. Within each bucket, the documents are sorted by _score in descending order and only the top one is returned.
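To make the shape of the result concrete, here is a minimal Python sketch that pulls the single best hit out of each uid bucket. It assumes the response has already been parsed into a dict and that the aggregation names match the query above ("top-uids" and "top_uids_hits"). The helper name `best_hit_per_uid` and the sample data are made up for illustration.

```python
from typing import Any


def best_hit_per_uid(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the top hit from each uid bucket, highest score first."""
    buckets = response["aggregations"]["top-uids"]["buckets"]
    best = []
    for bucket in buckets:
        hits = bucket["top_uids_hits"]["hits"]["hits"]
        if hits:  # "size": 1 in top_hits means at most one hit per bucket
            best.append(hits[0])
    # Terms buckets are ordered by document count by default,
    # so re-sort the extracted hits by relevance.
    return sorted(best, key=lambda hit: hit["_score"], reverse=True)


# Hand-written sample in the shape of a search response (values are invented).
sample_response = {
    "aggregations": {
        "top-uids": {
            "buckets": [
                {
                    "key": "user-a",
                    "doc_count": 3,
                    "top_uids_hits": {
                        "hits": {"hits": [{"_id": "1", "_score": 1.2}]}
                    },
                },
                {
                    "key": "user-b",
                    "doc_count": 1,
                    "top_uids_hits": {
                        "hits": {"hits": [{"_id": "7", "_score": 2.5}]}
                    },
                },
            ]
        }
    }
}

for hit in best_hit_per_uid(sample_response):
    print(hit["_id"], hit["_score"])
```

Two things to keep in mind: the terms aggregation returns only 10 buckets unless you set its "size", and the buckets are ordered by document count rather than by score, which is why the sketch sorts on the client side.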

Yu Jiaao
Andrei Stefan
  • Is there any good way to paginate through the resulting buckets? – TheHippo Oct 29 '14 at 15:52
  • It seems there's a long discussion about this in [github](https://github.com/elasticsearch/elasticsearch/issues/4915). And it's not the only issue where this was discussed. – Andrei Stefan Oct 30 '14 at 06:43
18

Elasticsearch 5.3 added support for field collapsing. You should be able to do something like:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "this is a test", 
      "fields": [ "subject", "message", "uid" ] 
    }
  },
  "collapse" : {
    "field" : "uid" 
  },
  "size": 20,
  "from": 100
}

The benefit of using field collapsing instead of a top hits aggregation is that you can use pagination with field collapsing.
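As a rough illustration of that pagination, the sketch below builds the request body for a given page in Python. The helper name `collapsed_page_body` and the zero-based page numbering are assumptions for the example. The body mirrors the request above, with "from" computed as page times page size.

```python
from typing import Any


def collapsed_page_body(
    text: str, fields: list[str], page: int, page_size: int = 20
) -> dict[str, Any]:
    """Build a collapsed search body for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "query": {"multi_match": {"query": text, "fields": fields}},
        # One hit per distinct uid: the best-scoring document of each group.
        "collapse": {"field": "uid"},
        "size": page_size,
        "from": page * page_size,
    }


# Page 5 with 20 results per page gives "from": 100, as in the request above.
body = collapsed_page_body("this is a test", ["subject", "message"], page=5)
print(body["from"], body["size"])
```

Note that the total hit count in a collapsed response counts matching documents before collapsing, not distinct uids, so it cannot be used directly to compute the number of pages.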

Chase
  • This is nice, but doesn't work on text fields. – Austin Poulson Jan 17 '23 at 21:06
  • Correct. The query can search text fields, but the field you collapse on can't be one. From the docs: "The field used for collapsing must be a single valued keyword or numeric field with doc_values activated". So it works for something like a UUID or a URL. If it fits your data, you could use the multi-field approach described in https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html so that the field is available as both text and keyword. – Chase Jan 18 '23 at 23:24
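
A minimal sketch of that multi-field approach, written as Python dicts that could be sent as JSON. It assumes a typeless mapping (Elasticsearch 7 or later) and a uid field that is indexed as text. The sub-field name "keyword" is a common convention, not a requirement, and the field list in the query is illustrative.

```python
import json

# Hypothetical index mapping: "uid" stays searchable as analysed text,
# and "uid.keyword" stores the exact value for collapsing.
uid_mapping = {
    "mappings": {
        "properties": {
            "uid": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            }
        }
    }
}

# Search request that collapses on the keyword sub-field, not the text field.
collapse_request = {
    "query": {
        "multi_match": {
            "query": "this is a test",
            "fields": ["subject", "message"],
        }
    },
    "collapse": {"field": "uid.keyword"},
}

print(json.dumps(uid_mapping, indent=2))
print(json.dumps(collapse_request, indent=2))
```

Changing the mapping of an existing field this way only affects documents indexed afterwards, so existing data would generally need to be reindexed before collapsing on the new sub-field.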