Filter results to remove documents with the same field value based on another field value (without aggregation)

Question

Given the following four documents in an Elasticsearch index:

"hits": [
  {
    "_id": "0:0",
    "_source": {
      "id": 0,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "0:1",
    "_source": {
      "id": 0,
      "version": 1,
      "published": false,
      "latest": true
    }
  },
  {
    "_id": "1:0",
    "_source": {
      "id": 1,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "1:1",
    "_source": {
      "id": 1,
      "version": 1,
      "published": true,
      "latest": true
    }
  }
]

I would like to find the documents using these rules:

- only documents with published: true
- no duplicate id
- for documents with the same id, only the one with the highest published version should be returned

So for the above I'd like to get 0:0 and 1:1:

"hits": [
  {
    "_id": "0:0",
    "_source": {
      "id": 0,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "1:1",
    "_source": {
      "id": 1,
      "version": 1,
      "published": true,
      "latest": true
    }
  }
]

I'm aware that I can use top_hits, but I'd like to know if this is possible without it, such that the main hits.hits array will contain these results.

With top_hits, I'd probably do the deduplication as follows:

{
  "query": {...},
  "aggs": {
    "ids": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "dedup": {
          "top_hits": { "size": 1, "sort": { "version": "desc" } }
        }
      }
    }
  }
}

The reason I'm hoping to avoid top_hits is that I'd need to update the result parser in our application. Also, the top-level size parameter would no longer control how many deduplicated results come back.
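For a sense of how small the parser change could be, here is a minimal Python sketch that flattens a terms + top_hits response back into a single hits-style list. The function name flatten_top_hits and the sample response are hypothetical. The response is hand-written in the shape such a request returns, trimmed to the fields used here, and is not output from a real cluster.

```python
def flatten_top_hits(response, terms_agg="ids", top_hits_agg="dedup"):
    """Collect the top hit from each terms bucket into one flat list."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    hits = []
    for bucket in buckets:
        # Each bucket carries its own hits.hits array from the
        # top_hits sub-aggregation (one entry here, since size is 1).
        hits.extend(bucket[top_hits_agg]["hits"]["hits"])
    return hits


# Hypothetical response fragment, assuming the query filters on published: true.
sample_response = {
    "hits": {"total": 3, "hits": []},
    "aggregations": {
        "ids": {
            "buckets": [
                {"key": 0, "doc_count": 1, "dedup": {"hits": {"hits": [
                    {"_id": "0:0",
                     "_source": {"id": 0, "version": 0, "published": True}}
                ]}}},
                {"key": 1, "doc_count": 2, "dedup": {"hits": {"hits": [
                    {"_id": "1:1",
                     "_source": {"id": 1, "version": 1, "published": True,
                                 "latest": True}}
                ]}}},
            ]
        }
    },
}

deduped = flatten_top_hits(sample_response)
print([hit["_id"] for hit in deduped])
```

On the size concern: in this layout the number of deduplicated results is governed by the size parameter of the terms aggregation (the number of buckets), not by the top-level size, so any paging logic would need to move there as well.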

score 0 · Answer 1 · edited May 23 '17 at 12:22

To answer my own question, based on this answer: it's not possible without the top_hits aggregation. I think what I was trying to achieve wasn't the best use of aggregations. Instead, I'm going to adjust the index model by adding latestPublished: true to the relevant documents, so that the query becomes { term: { latestPublished: true } }.
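As a rough illustration of that index-model change, the Python sketch below computes the flag for the four example documents before they are indexed. The helper name mark_latest_published is hypothetical. Writing an explicit false on the other documents (rather than omitting the field) and wrapping the term filter in a "query" request body are both assumptions, not something the answer specifies.

```python
def mark_latest_published(docs):
    """Return copies of docs, flagging the highest published version per id."""
    best = {}  # id -> highest published version seen so far
    for doc in docs:
        if doc.get("published"):
            if doc["id"] not in best or doc["version"] > best[doc["id"]]:
                best[doc["id"]] = doc["version"]

    marked = []
    for doc in docs:
        copy = dict(doc)
        # True only for a published document holding the best version of its id.
        copy["latestPublished"] = (
            bool(doc.get("published")) and best.get(doc["id"]) == doc["version"]
        )
        marked.append(copy)
    return marked


# The four documents from the question (_source only).
docs = [
    {"id": 0, "version": 0, "published": True},
    {"id": 0, "version": 1, "published": False, "latest": True},
    {"id": 1, "version": 0, "published": True},
    {"id": 1, "version": 1, "published": True, "latest": True},
]

marked = mark_latest_published(docs)
selected = ["{}:{}".format(d["id"], d["version"])
            for d in marked if d["latestPublished"]]
print(selected)

# The search then reduces to a single term filter (request body shape assumed).
query = {"query": {"term": {"latestPublished": True}}}
```

The trade-off is that the flag has to be kept current at write time: whenever a newer version of an id is published, the previously flagged document needs its flag cleared.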

answered Jun 09 '16 at 09:37

ed.