Average of top n terms

Question

In a certain index, documents have a keyword, a rank and a timestamp. The rank for a keyword may change over time, so the dataset may look like this:

{"keywords": "piano", "rank": 1, "timestamp": 1437642812}
{"keywords": "piano", "rank": 2, "timestamp": 1437642813}
{"keywords": "electric guitar", "rank": 5, "timestamp": 1437644326}

I would like to get the average rank of the 500 most frequently occurring keywords, but I cannot figure out how to do this.
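To pin down the target number, here is a plain in-memory Python sketch (no Elasticsearch involved) of the calculation being asked for. It assumes "average rank of the top n keywords" means the mean rank over every document whose keyword is among the n most frequent ones; `avg_rank_of_top_keywords` is a hypothetical name, and timestamps are dropped because they do not affect the result.

```python
from collections import Counter

# Sample documents from the question (timestamps omitted).
docs = [
    {"keywords": "piano", "rank": 1},
    {"keywords": "piano", "rank": 2},
    {"keywords": "electric guitar", "rank": 5},
]

def avg_rank_of_top_keywords(docs, n):
    """Mean rank over all documents whose keyword is among the n most frequent keywords."""
    counts = Counter(doc["keywords"] for doc in docs)
    top = {keyword for keyword, _ in counts.most_common(n)}
    ranks = [doc["rank"] for doc in docs if doc["keywords"] in top]
    return sum(ranks) / len(ranks) if ranks else None

print(avg_rank_of_top_keywords(docs, 1))  # piano only: (1 + 2) / 2 = 1.5
print(avg_rank_of_top_keywords(docs, 2))  # all three documents: 8 / 3
```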

My attempts so far always seem to give the average for each result individually, whereas I want a single average over the entire dataset, restricted to the documents of the top results of the aggregation.

POST _search
{
    "aggs": {
        "top_keywords": {
            "terms": {
                "field": "keywords",
                "size": 1
            }
        },
        "avg_rank": {
            "avg": {"field": "rank"}
        }
    },
    "size": 0
}

Attempts using top_hits haven't been successful either.

Elsewhere I have read about splitting this into two queries: first retrieving the list of top keywords, then filtering the documents by those keywords in a second query. I would like to feed the query into Kibana, so I hope this is not required.
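The two-query workaround can be sketched with plain Python dicts as request bodies. The helper names `top_keywords_request` and `avg_rank_request` and the sample response are hypothetical; only the terms aggregation, the terms query and the avg aggregation come from the Elasticsearch query DSL. Actually sending the requests (with whatever client you use) is left out.

```python
import json

def top_keywords_request(n):
    """First request (hypothetical helper): the n most frequent keywords, without hits."""
    return {
        "size": 0,
        "aggs": {"top_keywords": {"terms": {"field": "keywords", "size": n}}},
    }

def avg_rank_request(first_response):
    """Second request (hypothetical helper): average rank over documents matching the top keywords."""
    buckets = first_response["aggregations"]["top_keywords"]["buckets"]
    keys = [bucket["key"] for bucket in buckets]
    return {
        "size": 0,
        "query": {"terms": {"keywords": keys}},
        "aggs": {"avg_rank": {"avg": {"field": "rank"}}},
    }

# Hand-written stand-in for the first response, not real output.
first_response = {
    "aggregations": {
        "top_keywords": {"buckets": [{"key": "piano", "doc_count": 2}]}
    }
}

print(json.dumps(top_keywords_request(500)))
print(json.dumps(avg_rank_request(first_response)))
```

The `avg_rank` value in the second response is then the average over all documents of the top keywords; the drawback, as the question says, is the extra round trip.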

These related topics don't provide a satisfying answer either.

Can anyone point me in the right direction?

Why don't you just use Kibana to build the aggregation, without writing any code? — Yuvraj Gupta, Jul 23 '15 at 14:17
I cannot use Kibana aggregations because they give the average per keyword, not the average over the top n results. — Roel, Jul 27 '15 at 08:48

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

An Elasticsearch developer told me it is currently not possible:

In the current version this is not possible, but with pipeline aggregations coming in version 2.0 you will be able to use the avg_bucket aggregation to do this: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline-avg-bucket-aggregation.html
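For 2.0 or later, such a request might look like the sketch below, written as a Python dict that serializes to the JSON body for `POST _search`. It assumes the field names from the question and that `keywords` is indexed as a single unanalyzed term (`not_analyzed` in 2.x, a `keyword` field in later versions), so that "electric guitar" forms one bucket; the aggregation names `avg_rank` and `avg_rank_of_top_keywords` and the helper `avg_bucket_request` are illustrative.

```python
import json

def avg_bucket_request(n):
    """Hypothetical helper: top n keywords plus an avg_bucket over their per-keyword averages."""
    return {
        "size": 0,
        "aggs": {
            "top_keywords": {
                "terms": {"field": "keywords", "size": n},
                # Per-keyword average rank, computed inside each bucket.
                "aggs": {"avg_rank": {"avg": {"field": "rank"}}},
            },
            # Sibling pipeline aggregation: averages avg_rank across the top_keywords buckets.
            "avg_rank_of_top_keywords": {
                "avg_bucket": {"buckets_path": "top_keywords>avg_rank"}
            },
        },
    }

print(json.dumps(avg_bucket_request(500), indent=2))
```

Note that `avg_bucket` averages the per-keyword averages, so every keyword counts equally regardless of how many documents it has. For the three sample documents with a size of 2 that gives (1.5 + 5) / 2 = 3.25, whereas the mean over all three documents is 8 / 3 ≈ 2.67; which one you want depends on how "average of the top n" is defined.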

In the meantime, you would need to do an aggregation for the top 500 terms and perform the average calculation on the client side.
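A minimal sketch of that client-side calculation, assuming the top-500 terms aggregation carries a `stats` sub-aggregation (named `rank_stats` here, a hypothetical name) so that each bucket reports the sum and count of `rank`. Summing those across buckets gives the average over all documents of the top keywords. The sample response is hand-written for the three example documents, not real output, and `client_side_avg_rank` is a hypothetical helper.

```python
import json

# Request body: top 500 keywords, each bucket with count/min/max/avg/sum of rank.
request_body = {
    "size": 0,
    "aggs": {
        "top_keywords": {
            "terms": {"field": "keywords", "size": 500},
            "aggs": {"rank_stats": {"stats": {"field": "rank"}}},
        }
    },
}

def client_side_avg_rank(response):
    """Document-weighted average rank across all top_keywords buckets (None if no ranks)."""
    buckets = response["aggregations"]["top_keywords"]["buckets"]
    total = sum(bucket["rank_stats"]["sum"] for bucket in buckets)
    count = sum(bucket["rank_stats"]["count"] for bucket in buckets)
    return total / count if count else None

# Hand-written stand-in for the response on the three sample documents.
sample_response = {
    "aggregations": {
        "top_keywords": {
            "buckets": [
                {"key": "piano", "doc_count": 2,
                 "rank_stats": {"count": 2, "min": 1.0, "max": 2.0, "avg": 1.5, "sum": 3.0}},
                {"key": "electric guitar", "doc_count": 1,
                 "rank_stats": {"count": 1, "min": 5.0, "max": 5.0, "avg": 5.0, "sum": 5.0}},
            ]
        }
    }
}

print(json.dumps(request_body))
print(client_side_avg_rank(sample_response))  # 8.0 / 3, about 2.667
```

Using the `sum` and `count` from `stats` rather than averaging the per-bucket `avg` values keeps keywords with more documents weighted accordingly; averaging the `avg` values instead would reproduce the unweighted per-keyword mean.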

And that makes it currently impossible to show this data in Kibana:

Yes, this would work in 2.0 for requests sent straight to Elasticsearch. However, it will take some time for the functionality to be added to the Kibana interface. It is something the Kibana team is thinking about how to add, though.

Source: https://discuss.elastic.co/t/average-of-top-n-terms/26165
