Elasticsearch builds aggregation results from all the hits of the query, independently of the from and size parameters. This is what we want in most cases, but I have a particular case in which I need to limit the aggregation to the top N hits. The limit filter is not suitable, as it does not fetch the best N items but only the first X documents matching the query (per shard), independently of their score.

Is there any way to build a query whose hit count has an upper limit N in order to be able to build an aggregation limited to those top N results? And if so how?

Subsidiary question: Limiting the score of matching documents could be an alternative even though in my case I would require a fixed bound. Does the min_score parameter affect aggregation?

Saeed Zhiany
b_habegger
  • How did you end up doing this? I have the exact same issue and would be very glad about any hints how to achieve this. Thank you! – khituras Dec 25 '14 at 12:32

4 Answers

1

You are looking for Sampler Aggregation.

I have a similar answer explained here

Optionally, you can use the field or script and max_docs_per_value settings to control the maximum number of documents collected on any one shard which share a common value.
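As a minimal sketch of what such a request could look like, the snippet below builds a search body in which a sampler aggregation wraps the real aggregation, so that the inner aggregation only sees the best-scoring documents. The field names `title` and `tags`, the aggregation names, and the helper function are hypothetical and only there for illustration. Note that `shard_size` is applied per shard, so on a multi-shard index the total may exceed N.

```python
import json


def build_sampled_aggregation(query, shard_size, sub_aggs):
    """Build a search body whose sub-aggregations only see the
    top-scoring `shard_size` documents collected on each shard."""
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "query": query,
        "aggs": {
            "best_n": {  # hypothetical aggregation name
                "sampler": {"shard_size": shard_size},
                "aggs": sub_aggs,
            }
        },
    }


# Hypothetical fields: "title" for the scoring query, "tags" for the buckets.
body = build_sampled_aggregation(
    query={"match": {"title": "elasticsearch"}},
    shard_size=100,
    sub_aggs={"popular_tags": {"terms": {"field": "tags"}}},
)

print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. With a single shard, `shard_size` behaves like the N from the question.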

Rahul
0

If you are using an Elasticsearch cluster with version 1.3 or later, you can use the top_hits aggregation by nesting it in your aggregation, ordering on the field you want and setting the size parameter to X.

The related documentation can be found here.
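A rough sketch of that nesting, assuming a terms aggregation as the parent: the field names `tags` and `date`, the aggregation names, and the helper function are made up for the example.

```python
import json


def build_top_hits_per_bucket(bucket_field, size, sort_field):
    """Build a search body with a top_hits aggregation nested inside
    a terms aggregation, returning `size` documents per bucket."""
    return {
        "size": 0,  # hits are returned inside the buckets instead
        "aggs": {
            "by_value": {  # hypothetical aggregation name
                "terms": {"field": bucket_field},
                "aggs": {
                    "best_docs": {
                        "top_hits": {
                            "size": size,
                            "sort": [{sort_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


# Hypothetical fields: bucket on "tags", order each bucket's hits by "date".
body = build_top_hits_per_bucket("tags", 5, "date")

print(json.dumps(body, indent=2))
```

This returns the top documents inside each bucket; it does not restrict which documents the parent aggregation counts.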

ThomasC
  • From what I understand from the documentation, this does not really resolve my case. I want to aggregate on the top hits of my scope query, not access the top hits of each bucket (which is what the top_hits aggregation provides). – b_habegger Nov 13 '14 at 08:47
  • Exactly; if you could then have a sub-aggregation of the top_hits aggregation, it could be working. But for some reason the top_hits aggregation does not accept sub aggregations. – khituras Dec 25 '14 at 12:39
0

I need to limit the aggregation to the top N hits

With nested aggregations, your top bucket can represent those N hits, with nested aggregations operating on that bucket. I would try a filter aggregation for the top level aggregation.

The tricky part is to make use of the _score somehow in the filter and to limit it to exactly N entries... There is a limit filter that works per shard, but I don't think it would work in this context.

BenG
  • The limit filter will indeed not work because it only stops at the first X documents matching the query independently of their score which I need to be taken into account. – b_habegger Nov 13 '14 at 08:49
  • For the filter aggregation I would need a top_hits filter... but that doesn't seem to exist. – b_habegger Nov 13 '14 at 08:51
0

It looks like the Sampler Aggregation can now be used for this purpose. Note that it is only available as of Elasticsearch 2.0.

Matthew Gertner