I have a set of 2.8 million docs with sets of tags that I'm querying with ElasticSearch, but many of these docs can be grouped together by one ID. I want to query my data using the tags, and then aggregate them by the ID that repeats. Often my search results have tens of thousands of documents, but I only want to aggregate the top 100 results of the search. How can I constrain an aggregation to only the top 100 results from a query?
Use the sampler aggregation. The Elasticsearch documentation describes it as "a filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents."
{
  "aggs": {
    "bestDocs": {
      "sampler": {
        "shard_size": 100
      },
      "aggs": {
        "bestBuckets": {
          "terms": {
            "field": "id"
          }
        }
      }
    }
  }
}
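This request limits the sub-aggregation to the top-scoring documents of the query and then buckets them by ID. Note that shard_size is applied per shard: on an index with several shards, each shard contributes up to 100 documents, so the sample can contain up to 100 times the number of shards. On a single-shard index it is exactly the top 100.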
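To make the per-shard behaviour concrete, here is a small local Python simulation of sampler plus a terms sub-aggregation. It is not Elasticsearch code: it assumes each hit is a hypothetical (score, group_id) pair, models shards as plain lists, and ignores the terms sub-aggregation's own size limit.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical model for illustration: a hit is (score, group_id).
Hit = Tuple[float, str]


def sampler_terms(shards: List[List[Hit]], shard_size: int) -> Dict[str, int]:
    """Each shard keeps only its `shard_size` top-scoring hits; the terms
    sub-aggregation then counts group IDs over the union of those samples."""
    counts: Counter = Counter()
    for hits in shards:
        # Per-shard sample: the top `shard_size` hits by score on this shard.
        sample = sorted(hits, key=lambda h: h[0], reverse=True)[:shard_size]
        counts.update(group_id for _, group_id in sample)
    return dict(counts)


if __name__ == "__main__":
    shards = [
        [(9.0, "a"), (8.0, "a"), (1.0, "b")],
        [(7.0, "c"), (6.0, "b"), (0.5, "a")],
    ]
    # Two shards with shard_size=2 give a sample of 4 documents, not 2.
    print(sampler_terms(shards, 2))  # {'a': 2, 'c': 1, 'b': 1}
```

If you need a strict global top 100 on a multi-shard index, one option is to lower shard_size (accepting a smaller per-shard sample) or to use a single-shard index; both are trade-offs rather than exact equivalents.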
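Optionally, you can set field (or script) together with max_docs_per_value to control the maximum number of documents collected on any one shard that share a common value. In newer Elasticsearch versions these diversity settings belong to the separate diversified_sampler aggregation rather than to sampler.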
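The diversity constraint can be approximated per shard with a greedy pass: take hits from the highest score down, skip any hit whose field value already has max_docs_per_value hits in the sample, and stop at shard_size. This Python function is a hypothetical local model of that idea, not the engine's actual implementation; hits are assumed to be (score, field_value) pairs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical model: a hit is (score, value of the diversity field).
Hit = Tuple[float, str]


def diversified_sample(hits: List[Hit], shard_size: int,
                       max_docs_per_value: int) -> List[Hit]:
    """Greedy sketch of one shard's diversified sample."""
    taken: Dict[str, int] = defaultdict(int)
    sample: List[Hit] = []
    for score, value in sorted(hits, key=lambda h: h[0], reverse=True):
        if len(sample) == shard_size:
            break  # the per-shard sample is full
        if taken[value] >= max_docs_per_value:
            continue  # this value already has its quota in the sample
        taken[value] += 1
        sample.append((score, value))
    return sample
```

With hits scored 9, 8, 7 for value "a" and 6, 5 for values "b" and "c", shard_size=3 and max_docs_per_value=1 yield one "a", one "b" and one "c" instead of three "a" documents.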

The size parameter can be set to define how many term buckets should be returned out of the overall terms list.
By default, the node coordinating the search process will request each shard to provide its own top size term buckets and once all shards respond, it will reduce the results to the final list that will then be returned to the client. This means that if the number of unique terms is greater than size, the returned list is slightly off and not accurate (it could be that the term counts are slightly off and it could even be that a term that should have been in the top size buckets was not returned).
If set to 0, the size will be set to Integer.MAX_VALUE.
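The shard-level reduce described above can be simulated locally to see why counts can be off. In this Python sketch (a simplified model, not Elasticsearch code) each shard is a hypothetical list of term values; each shard reports only its own top size terms, and the coordinator sums those partial counts.

```python
from collections import Counter
from typing import Dict, List


def distributed_terms(shards: List[List[str]], size: int) -> Dict[str, int]:
    """Each shard returns its local top `size` terms with local counts;
    the coordinating node sums them and keeps the overall top `size`."""
    merged: Counter = Counter()
    for terms in shards:
        local = Counter(terms)
        merged.update(dict(local.most_common(size)))
    return dict(merged.most_common(size))


if __name__ == "__main__":
    shard1 = ["x"] * 5 + ["y"] * 4 + ["z"] * 3
    shard2 = ["z"] * 5 + ["w"] * 4 + ["y"] * 1
    # "z" really occurs 8 times, but shard1 never reported its 3 "z" docs.
    print(distributed_terms([shard1, shard2], 2))
```

Here "z" is reported with a count of 5 although it occurs 8 times. The terms aggregation's shard_size parameter asks each shard for more candidate terms than size, which reduces this error without eliminating it.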
Here is an example that returns the top 100 term buckets:
{
  "aggs": {
    "products": {
      "terms": {
        "field": "product",
        "size": 100
      }
    }
  }
}
See the Elasticsearch terms aggregation documentation for more information.

- This does not answer OP's question. Requirement is to aggregate on top n query results of a search request. `size` parameter only controls how many aggregation buckets are returned. The scope is still all the documents that match the query criteria. – bittusarkar Mar 06 '15 at 11:25
- Thanks, but exactly what @bsarkar said. – Patrick Pan Mar 06 '15 at 22:56
You can use the min_doc_count parameter:
{
  "aggs": {
    "products": {
      "terms": {
        "field": "product",
        "min_doc_count": 100
      }
    }
  }
}
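A local Python sketch makes the semantics of min_doc_count easy to see: it buckets every matching document by term and then drops buckets smaller than the threshold, without looking at scores or ranks. The list of matching IDs is a hypothetical stand-in for a query's hits; this is a simplified model, not Elasticsearch code.

```python
from collections import Counter
from typing import Dict, List


def terms_min_doc_count(matching_ids: List[str],
                        min_doc_count: int) -> Dict[str, int]:
    """Bucket all matching documents by term, then keep only buckets
    holding at least `min_doc_count` documents. Scores play no part."""
    counts = Counter(matching_ids)
    return {term: n for term, n in counts.items() if n >= min_doc_count}
```

With 150 documents for "a", 20 for "b" and 100 for "c", a threshold of 100 keeps the "a" and "c" buckets, which together still cover 250 documents rather than the top 100.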

- This does not answer OP's question. It returns the buckets that have at least 100 entries, but the buckets are not limited to the top 100 results. – Rahul Mar 16 '16 at 03:04