Given the following four documents in an Elasticsearch index:
"hits": [
  {
    "_id": "0:0",
    "_source": {
      "id": 0,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "0:1",
    "_source": {
      "id": 0,
      "version": 1,
      "published": false,
      "latest": true
    }
  },
  {
    "_id": "1:0",
    "_source": {
      "id": 1,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "1:1",
    "_source": {
      "id": 1,
      "version": 1,
      "published": true,
      "latest": true
    }
  }
]
I would like to find the documents using these rules:
- only documents with published:true
- no duplicate id
- for documents with the same id, the one with the highest published version should be returned.
So for the above I'd like to get 0:0 and 1:1:
"hits": [
  {
    "_id": "0:0",
    "_source": {
      "id": 0,
      "version": 0,
      "published": true
    }
  },
  {
    "_id": "1:1",
    "_source": {
      "id": 1,
      "version": 1,
      "published": true,
      "latest": true
    }
  }
]
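To make the rules concrete, here is a minimal sketch of the selection in plain Python, run against the four sample documents. It only illustrates the expected result and is not an Elasticsearch query; the function name `select_latest_published` is hypothetical.

```python
# The four sample documents from above, in the same shape as the index hits.
docs = [
    {"_id": "0:0", "_source": {"id": 0, "version": 0, "published": True}},
    {"_id": "0:1", "_source": {"id": 0, "version": 1, "published": False, "latest": True}},
    {"_id": "1:0", "_source": {"id": 1, "version": 0, "published": True}},
    {"_id": "1:1", "_source": {"id": 1, "version": 1, "published": True, "latest": True}},
]


def select_latest_published(hits):
    """Return one hit per id: the published hit with the highest version."""
    best = {}
    for hit in hits:
        src = hit["_source"]
        if not src.get("published"):
            continue  # rule 1: only published documents qualify
        current = best.get(src["id"])
        # rules 2 and 3: keep a single hit per id, preferring the higher version
        if current is None or src["version"] > current["_source"]["version"]:
            best[src["id"]] = hit
    return [best[key] for key in sorted(best)]


print([hit["_id"] for hit in select_latest_published(docs)])
# prints ['0:0', '1:1']
```

Note that 0:1 is skipped even though it has the highest version for id 0, because it is not published.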
I'm aware that I can use top_hits, but I'd like to know if this is possible without it, such that the main hits.hits array will contain these results.
With top_hits, I'd probably do the collapsing as follows:
{
  "query": {...},
  "aggs": {
    "ids": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "dedup": {
          "top_hits": {
            "size": 1,
            "sort": [{ "version": { "order": "desc" } }]
          }
        }
      }
    }
  }
}
The reason I'm hoping to avoid top_hits is that I'd need to update the result parser in our application. Also, the top-level size parameter would no longer work as intended if I did so, because it limits hits.hits rather than the number of aggregation buckets.
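For reference, the parser change I'm trying to avoid would look roughly like the sketch below. It assumes the aggregation names `ids` and `dedup` from the request above; the helper name `flatten_top_hits` and the sample response are hypothetical, with the response hand-written to match the shape of a terms/top_hits aggregation result rather than copied from a real cluster.

```python
def flatten_top_hits(response, terms_agg="ids", top_hits_agg="dedup"):
    """Collect the top hit of every terms bucket into one flat list."""
    buckets = response.get("aggregations", {}).get(terms_agg, {}).get("buckets", [])
    flat = []
    for bucket in buckets:
        # Each bucket carries its own nested hits.hits array.
        flat.extend(bucket[top_hits_agg]["hits"]["hits"])
    return flat


# Hand-written sample in the shape of a terms + top_hits response.
sample_response = {
    "hits": {"hits": []},
    "aggregations": {
        "ids": {
            "buckets": [
                {"key": 0, "doc_count": 1, "dedup": {"hits": {"hits": [
                    {"_id": "0:0", "_source": {"id": 0, "version": 0, "published": True}},
                ]}}},
                {"key": 1, "doc_count": 2, "dedup": {"hits": {"hits": [
                    {"_id": "1:1", "_source": {"id": 1, "version": 1, "published": True, "latest": True}},
                ]}}},
            ]
        }
    },
}

print([hit["_id"] for hit in flatten_top_hits(sample_response)])
# prints ['0:0', '1:1']
```

Even with such a helper, the number of results would be governed by the size of the terms aggregation rather than by the request's top-level size, which is the second problem mentioned above.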