In a certain index documents have a keyword, a rank and a timestamp. The rank for a keyword may differ from time to time. This means the dataset may look like this:
{"keywords": "piano", "rank" 1, "timestamp": 1437642812}
{"keywords": "piano", "rank" 2, "timestamp": 1437642813}
{"keywords": "electric guitar", "rank" 5, "timestamp": 1437644326}
I would like to get the average rank of the top 500 most occuring keywords. But I cannot find out how to do this.
My current try-outs seem to always give the average for the results individually, but I want to get the average for the entire dataset of only the top results of the aggregation.
POST _search
{
"aggs": {
"top_keywords": {
"terms": {
"field": "keywords",
"size": 1
}
},
"avg_rank": {
"avg": {"field": "rank"}
}
},
"size": 0
}
Attempts using top_hits haven't been successful either.
Elsewhere I have read about splitting the query into separate queries, first retrieving a list of top keywords and in a second query filter the documents by the keywords returned from the first query. I would like to feed the query into Kibana, so I hope this is not required.
These related topics don't provide a satisfying answer either.
Can anyone point me in the right direction?