Limiting aggregation to the top X hits in elasticsearch

Question

ElasticSearch builds the aggregation results based on all the hits of the query, independently of the from and size parameters. This is what we want in most cases, but I have a particular case in which I need to limit the aggregation to the top N hits. The limit filter is not suitable, as it does not fetch the best N items but only the first N matching the query (per shard), independently of their score.

Is there any way to build a query whose hit count has an upper limit N in order to be able to build an aggregation limited to those top N results? And if so how?

Subsidiary question: Limiting the score of matching documents could be an alternative even though in my case I would require a fixed bound. Does the min_score parameter affect aggregation?

How did you end up doing this? I have the exact same issue and would be very glad about any hints how to achieve this. Thank you! — khituras, Dec 25 '14 at 12:32

score 1 · Answer 1 · edited May 23 '17 at 10:30

You are looking for Sampler Aggregation.

I have a similar answer explained here

Optionally, you can use the field or script and max_docs_per_value settings to control the maximum number of documents collected on any one shard which share a common value.
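
As a rough sketch of what such a request could look like, the snippet below builds a search body that wraps a terms aggregation in a sampler aggregation. The field names (`title`, `tags`), the aggregation names and the helper function are made up for illustration and are not from the question. Keep in mind that `shard_size` applies per shard, so on an index with several shards the aggregation can see up to `shard_size` documents from each of them rather than exactly N overall.

```python
import json


def sampled_terms_request(query, shard_size, field):
    """Build a search body whose terms aggregation only sees the
    top-scoring `shard_size` documents collected on each shard.

    Hypothetical helper: the aggregation names are arbitrary.
    """
    return {
        "size": 0,  # only the aggregation result is of interest here
        "query": query,
        "aggs": {
            "top_docs": {
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "values": {"terms": {"field": field}},
                },
            }
        },
    }


# Hypothetical query and field names, for illustration only.
body = sampled_terms_request({"match": {"title": "elasticsearch"}}, 100, "tags")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request.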

answered Mar 16 '16 at 03:33

Rahul

ThomasC · Answer 2 · 2014-08-21T21:43:44.600

0

If you are using an ElasticSearch cluster with version 1.3 or later, you can use the top_hits aggregation by nesting it in your aggregation, ordering on the field you want and setting the size parameter to X.

The related documentation can be found here.
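
A minimal sketch of such a nested request, assuming a terms aggregation as the parent and sorting by `_score`. The field names, aggregation names and helper function are hypothetical. Note that this returns the best X hits inside each bucket; it does not restrict which documents the parent aggregation counts.

```python
import json


def top_hits_per_bucket_request(query, bucket_field, hits_per_bucket):
    """Build a search body that returns the best-scoring hits of each
    terms bucket via a nested top_hits aggregation.

    Hypothetical helper: the aggregation names are arbitrary.
    """
    return {
        "size": 0,
        "query": query,
        "aggs": {
            "buckets": {
                "terms": {"field": bucket_field},
                "aggs": {
                    "best": {
                        "top_hits": {
                            "size": hits_per_bucket,
                            "sort": [{"_score": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


# Hypothetical query and field names, for illustration only.
request = top_hits_per_bucket_request({"match": {"title": "elasticsearch"}}, "tags", 3)
print(json.dumps(request, indent=2))
```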

answered Aug 21 '14 at 15:18

ThomasC

From what I understand from the documentation, this does not really resolve my case. I want to aggregate on the top hits of my scope query, not access the top hits of each bucket (which is what the top_hits aggregation provides). – b_habegger Nov 13 '14 at 08:47
Exactly; if you could then have a sub-aggregation of the top_hits aggregation, it could be working. But for some reason the top_hits aggregation does not accept sub aggregations. – khituras Dec 25 '14 at 12:39

score 0 · Answer 3 · answered Aug 22 '14 at 17:50

I need to limit the aggregation to the top N hits

With nested aggregations, your top bucket can represent those N hits, with nested aggregations operating on that bucket. I would try a filter aggregation for the top level aggregation.

The tricky part is to make use of the _score somehow in the filter and to limit it to exactly N entries... There is a limit filter that works per shard, but I don't think it would work in this context.

BenG

The limit filter will indeed not work, because it simply stops at the first X documents matching the query, independently of their score, which I need to be taken into account. – b_habegger Nov 13 '14 at 08:49
For the filter aggregation I would need a top_hits filter... but that doesn't seem to exist. – b_habegger Nov 13 '14 at 08:51

score 0 · Answer 4 · answered Mar 15 '16 at 20:34

It looks like the Sampler Aggregation can now be used for this purpose. Note that it is only available as of Elasticsearch 2.0.

Matthew Gertner
