In Elasticsearch there's a limit on how many buckets you can create in an aggregation. If it creates more buckets than the specified limit, you will get a warning message In ES 6.x and an error will be thrown in future versions.
Here's the warning message:
This aggregation creates too many buckets (10001) and will throw an error in future versions. You should update the [search.max_buckets] cluster setting or use the [composite] aggregation to paginate all buckets in multiple requests.
Since ES 7.x, that limit is set to 10000
which can be adjusted, though.
The problem is, I can't actually calculate (or estimate) how many buckets an aggregation is going to create.
Consider the following request:
GET /zone_stats_hourly/_search
{
"aggs":{
"apps":{
"terms":{
"field":"appId",
"size":<NUM_TERM_BUCKETS>,
"min_doc_count":1,
"shard_min_doc_count":0,
"show_term_doc_count_error":false,
"order":[
{
"_count":"desc"
},
{
"_key":"asc"
}
]
},
"aggregations":{
"histogram":{
"days":{
"field":"processTime",
"time_zone":"UTC",
"interval":"1d",
"offset":0,
"order":{
"_key":"asc"
},
"keyed":false,
"min_doc_count":0
},
"aggregations":{
"requests":{
"sum":{
"field":"requests"
}
},
"filled":{
"sum":{
"field":"filledRequests"
}
},
"matched":{
"sum":{
"field":"matchedRequests"
}
},
"imp":{
"sum":{
"field":"impressions"
}
},
"cv":{
"sum":{
"field":"completeViews"
}
},
"clicks":{
"sum":{
"field":"clicks"
}
},
"installs":{
"sum":{
"field":"installs"
}
},
"actions":{
"sum":{
"field":"actions"
}
},
"earningsIRT":{
"sum":{
"field":"earnings.inIRT"
}
},
"earningsUSD":{
"sum":{
"field":"earnings.inUSD"
}
},
"earningsEUR":{
"sum":{
"field":"earnings.inEUR"
}
},
"dealBasedEarnings":{
"nested":{
"path":"dealBasedEarnings"
},
"aggregations":{
"types":{
"terms":{
"field":"dealBasedEarnings.type",
"size":4,
"min_doc_count":1,
"shard_min_doc_count":0,
"show_term_doc_count_error":false,
"order":[
{
"_count":"desc"
},
{
"_key":"asc"
}
]
},
"aggregations":{
"dealBasedEarningsIRT":{
"sum":{
"field":"dealBasedEarnings.amount.inIRT"
}
},
"dealBasedEarningsUSD":{
"sum":{
"field":"dealBasedEarnings.amount.inUSD"
}
},
"dealBasedEarningsEUR":{
"sum":{
"field":"dealBasedEarnings.amount.inEUR"
}
}
}
}
}
}
}
}
}
}
},
"size":0,
"_source":{
"excludes":[]
},
"stored_fields":["*"],
"docvalue_fields":[
{
"field":"eventTime",
"format":"date_time"
},
{
"field":"processTime",
"format":"date_time"
},
{
"field":"postBack.time",
"format":"date_time"
}
],
"query":{
"bool":{
"must":[
{
"range":{
"processTime":{
"from":1565049600000,
"to":1565136000000,
"include_lower":true,
"include_upper":false,
"boost":1.0
}
}
}
],
"adjust_pure_negative":true,
"boost":1.0
}
}
}
If I set <NUM_TERM_BUCKETS>
to 2200
and perform the request, I get the warning message that says I'm creating more than 10000
buckets (how?!).
A sample response from ES:
#! Deprecation: 299 Elasticsearch-6.7.1-2f32220 "This aggregation creates too many buckets (10001) and will throw an error in future versions. You should update the [search.max_buckets] cluster setting or use the [composite] aggregation to paginate all buckets in multiple requests."
{
"took": 6533,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 103456,
"max_score": 0,
"hits": []
},
"aggregations": {
"apps": {
"doc_count_error_upper_bound": 9,
"sum_other_doc_count": 37395,
"buckets":[...]
}
}
}
More interestingly, after decreasing <NUM_TERM_BUCKETS>
to 2100
, I get no warning messages, which means the number of buckets created is below 10000
.
I've had a hard time to find the reason behind that and found NOTHING.
Is there any formula or something to calculate or estimate the number of buckets that an aggregation is going to create before actually performing the request?
I want to know if an aggregation throws error in ES 7.x or later regarding to a specified search.max_buckets
, so that I can decide whether to use the composite
aggregation or not.
UPDATE
I tried a much simpler aggregation containing no nested or sub aggregations on an index having roughly 80000
documents.
Here is the request:
GET /my_index/_search
{
"size":0,
"query":{
"match_all":{}
},
"aggregations":{
"unique":{
"terms":{
"field":"_id",
"size":<NUM_TERM_BUCKETS>
}
}
}
}
If I set the <NUM_TERM_BUCKETS>
to 7000
, I get this error response in ES 7.3:
{
"error":{
"root_cause":[
{
"type":"too_many_buckets_exception",
"reason":"Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
"max_buckets":10000
}
],
"type":"search_phase_execution_exception",
"reason":"all shards failed",
"phase":"query",
"grouped":true,
"failed_shards":[
{
"shard":0,
"index":"my_index",
"node":"XYZ",
"reason":{
"type":"too_many_buckets_exception",
"reason":"Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
"max_buckets":10000
}
}
]
},
"status":503
}
And it runs successfully if I decrease the <NUM_TERM_BUCKETS>
to 6000
.
Seriously, I'm confused. how on earth this aggregation creates more than 10000
buckets? Can anyone answer this?