Show all Elasticsearch aggregation results/buckets and not just 10

Question

I'm trying to list all buckets on an aggregation, but it seems to be showing only the first 10.

My search:

curl -XPOST "http://localhost:9200/imoveis/_search?pretty=1" -d'
{
   "size": 0, 
   "aggregations": {
      "bairro_count": {
         "terms": {
            "field": "bairro.raw"
         }
      }
   }
}'

Returns:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 16920,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "bairro_count" : {
      "buckets" : [ {
        "key" : "Barra da Tijuca",
        "doc_count" : 5812
      }, {
        "key" : "Centro",
        "doc_count" : 1757
      }, {
        "key" : "Recreio dos Bandeirantes",
        "doc_count" : 1027
      }, {
        "key" : "Ipanema",
        "doc_count" : 927
      }, {
        "key" : "Copacabana",
        "doc_count" : 842
      }, {
        "key" : "Leblon",
        "doc_count" : 833
      }, {
        "key" : "Botafogo",
        "doc_count" : 594
      }, {
        "key" : "Campo Grande",
        "doc_count" : 456
      }, {
        "key" : "Tijuca",
        "doc_count" : 361
      }, {
        "key" : "Flamengo",
        "doc_count" : 328
      } ]
    }
  }
}

I have many more than 10 keys for this aggregation. In this example I'd have 145 keys, and I want the count for each of them. Is there some pagination on buckets? Can I get all of them?

I'm using Elasticsearch 1.1.0.

score 265 · Accepted Answer · edited Jan 20 '21 at 07:48

The size parameter has to be set inside the terms aggregation itself, not only at the top level of the request. For example:

curl -XPOST "http://localhost:9200/imoveis/_search?pretty=1" -d'
{
   "size": 0,
   "aggregations": {
      "bairro_count": {
         "terms": {
            "field": "bairro.raw",
             "size": 10000
         }
      }
   }
}'

In Elasticsearch 2.x and earlier, setting size: 0 in the terms aggregation returns all buckets.

Setting size: 0 was deprecated in 2.x and removed in 5.x, because returning every bucket of a high-cardinality field can cause memory problems on your cluster. You can read more about it in the GitHub issue: https://github.com/elastic/elasticsearch/issues/18838

On newer versions, explicitly set size to a reasonable value between 1 and 2147483647.
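As a rough illustration of the two different size settings, here is a minimal JavaScript sketch. The helper names buildTermsBody and isTruncated are hypothetical, and the truncation check assumes a version whose terms response includes the sum_other_doc_count field (the number of documents that fell into buckets not returned).

```javascript
// Hypothetical helper: builds a search body that returns no hits and up to
// `maxBuckets` buckets for a terms aggregation on `field`.
function buildTermsBody(aggName, field, maxBuckets) {
  if (!Number.isInteger(maxBuckets) || maxBuckets < 1) {
    throw new RangeError("maxBuckets must be a positive integer");
  }
  return {
    size: 0, // top-level size: how many documents come back in "hits"
    aggregations: {
      [aggName]: {
        // terms-level size: how many buckets come back
        terms: { field, size: maxBuckets },
      },
    },
  };
}

// Hypothetical helper: true when the terms result reports documents that
// belong to buckets left out of the response (assumes sum_other_doc_count).
function isTruncated(termsResult) {
  return (termsResult.sum_other_doc_count || 0) > 0;
}

const body = buildTermsBody("bairro_count", "bairro.raw", 10000);
console.log(JSON.stringify(body, null, 2));
```

If isTruncated returns true for a response, the chosen size was probably smaller than the number of distinct values, and a larger size (or a paginated approach) would be needed to see every bucket.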

edited Jan 20 '21 at 07:48 by Raj

answered Apr 08 '14 at 03:55 by keety

Note that setting size: 0 is now deprecated, due to memory issues inflicted on your cluster with high-cardinality field values. https://github.com/elastic/elasticsearch/issues/18838. Instead, use a real, reasonable number between 1 and 2147483647. – PhaedrusTheGreek Jul 27 '16 at 16:48
Thanks @PhaedrusTheGreek for pointing this out, I have edited the answer to incorporate your comment. – keety Jul 28 '16 at 02:15
0 is working on 2.5.2. What do you mean by 2.x onward? Do you mean after version 5? I am also curious what kind of memory issues it can cause if I want to return all possible aggs. What would be the difference between setting 0 (max_value) and 10000 (some big upper limit)? – Emil Apr 21 '17 at 12:04
@batmaci it was deprecated in [2.x](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-bucket-terms-aggregation.html#_size) so would still work and was removed from [5.x](https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_50_aggregations_changes.html) – keety Apr 21 '17 at 15:32
@batmaci I believe the use of size: is not less memory intensive but just makes it more explicit to client that there is performance cost. I think that is the reasoning behind deprecating `size:0`. You can read about it more in this github [issue](https://github.com/elastic/elasticsearch/issues/18838) – keety Apr 21 '17 at 15:36
Any suggestion on how to do this on a date_histogram aggregation @keety ? – Eric Hodonsky Feb 01 '18 at 21:41
this nonsense only works if you only have a single shard per index... or else it requires agreement and still drops off extra buckets :/ – Eric Hodonsky Feb 02 '18 at 01:31
If I set the size to 20,000 it doesn't return any result. :( works with small sizes like 20, 30, 50 though :( – Kishan Mehta Mar 16 '18 at 12:00

score 60 · Answer 2 · edited Dec 20 '17 at 14:59

How to show all buckets?

{
  "size": 0,
  "aggs": {
    "aggregation_name": {
      "terms": {
        "field": "your_field",
        "size": 10000
      }
    }
  }
}

Note

"size": 10000 (inside terms): return at most 10000 buckets. The default is 10.
"size": 0 (top level): return no documents in "hits". By default "hits" contains 10 documents, which we don't need here.
By default, the buckets are ordered by doc_count in decreasing order.

Why do I get Fielddata is disabled on text fields by default error?

Because fielddata is disabled on text fields by default. If you have not explicitly chosen a field type mapping, the field gets the default dynamic mapping for string fields: a text field with a keyword sub-field.

So, instead of writing "field": "your_field", you need "field": "your_field.keyword".
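To make the rule concrete, here is a small sketch that picks the field name to aggregate on from a mapping. The helper aggregatableField is hypothetical, and the sample mapping assumes the shape that dynamic mapping produces for strings (a text field with a keyword sub-field); a real mapping may differ.

```javascript
// Hypothetical helper: given the "properties" object of an index mapping,
// returns the field name that can be used in a terms aggregation.
function aggregatableField(properties, fieldName) {
  const mapping = properties[fieldName];
  if (!mapping) {
    throw new Error(`Unknown field: ${fieldName}`);
  }
  if (mapping.type !== "text") {
    return fieldName; // keyword, numeric, date, ... can be aggregated directly
  }
  // Look for a keyword sub-field (dynamic mapping names it "keyword").
  const subFields = mapping.fields || {};
  const sub = Object.keys(subFields).find(
    (name) => subFields[name].type === "keyword"
  );
  if (!sub) {
    throw new Error(`${fieldName} is a text field without a keyword sub-field`);
  }
  return `${fieldName}.${sub}`;
}

// Assumed shape of a dynamically mapped string field, plus a numeric field.
const properties = {
  your_field: {
    type: "text",
    fields: { keyword: { type: "keyword", ignore_above: 256 } },
  },
  price: { type: "long" },
};

console.log(aggregatableField(properties, "your_field")); // your_field.keyword
console.log(aggregatableField(properties, "price")); // price
```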

edited Dec 20 '17 at 14:59

answered Dec 19 '17 at 23:14 by kgf3JfUtW

Does having a bigger size for buckets, affect the performance (time to run query) of elastic search query? – user3522967 Jan 21 '20 at 11:56
How can we add pagination for the buckets? – Amir Afianian Feb 28 '20 at 10:35
@AmirAfianian the [documentation for the `composite` aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html#_pagination) explains that. – sox supports the mods Nov 20 '20 at 17:50

score 19 · Answer 3 · answered Apr 07 '20 at 16:55

If you want to get all unique values without setting a magic number (size: 10000), use the composite aggregation (ES 6.5+).

From the official documentation:

"If you want to retrieve all terms or all combinations of terms in a nested terms aggregation you should use the Composite aggregation which allows to paginate over all possible terms rather than setting a size greater than the cardinality of the field in the terms aggregation. The terms aggregation is meant to return the top terms and does not allow pagination."

Implementation example in JavaScript:

const ITEMS_PER_PAGE = 1000;

// `es` is assumed to be an already-configured Elasticsearch client whose
// search() resolves to the response body; adjust the call for your client version.
async function getUniqueLanguages(es) {
  const body = {
    "size": 0, // Returning only aggregation results: https://www.elastic.co/guide/en/elasticsearch/reference/current/returning-only-agg-results.html
    "aggs": {
      "langs": {
        "composite": {
          "size": ITEMS_PER_PAGE,
          "sources": [
            { "language": { "terms": { "field": "language" } } }
          ]
        }
      }
    }
  };

  const uniqueLanguages = [];

  while (true) {
    const result = await es.search(body);

    // Composite bucket keys are objects keyed by source name, e.g. { language: "en" }
    const currentUniqueLangs = result.aggregations.langs.buckets.map(bucket => bucket.key.language);

    uniqueLanguages.push(...currentUniqueLangs);

    const after = result.aggregations.langs.after_key;

    if (after) {
      // continue paginating unique items from the last key of this page
      body.aggs.langs.composite.after = after;
    } else {
      break;
    }
  }

  return uniqueLanguages;
}

getUniqueLanguages(es).then(console.log);
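Composite buckets come back ordered by key rather than by doc_count, so one option is to collect every page and sort on the client. The sketch below repeats the pagination loop against a stub client so it can run without a cluster; makeStubClient, collectAllBuckets, sortByDocCountDesc and the sample data are all hypothetical, and the stub only imitates the after / after_key behaviour described above.

```javascript
// Stub standing in for a real client: serves composite pages in key order.
function makeStubClient(allBuckets) {
  return {
    async search(body) {
      const { size, after } = body.aggs.langs.composite;
      // Resume just past the bucket whose key matches `after`.
      const start = after
        ? allBuckets.findIndex((b) => b.key.language === after.language) + 1
        : 0;
      const buckets = allBuckets.slice(start, start + size);
      const langs = { buckets };
      if (buckets.length > 0) {
        langs.after_key = buckets[buckets.length - 1].key;
      }
      return { aggregations: { langs } };
    },
  };
}

// Same loop as above, but keeping whole buckets (key and doc_count).
async function collectAllBuckets(es, pageSize) {
  const body = {
    size: 0,
    aggs: {
      langs: {
        composite: {
          size: pageSize,
          sources: [{ language: { terms: { field: "language" } } }],
        },
      },
    },
  };
  const collected = [];
  while (true) {
    const result = await es.search(body);
    const { buckets, after_key: afterKey } = result.aggregations.langs;
    collected.push(...buckets);
    if (!afterKey) break; // no more pages
    body.aggs.langs.composite.after = afterKey;
  }
  return collected;
}

// Client-side ordering by document count, largest first (returns a copy).
function sortByDocCountDesc(buckets) {
  return [...buckets].sort((a, b) => b.doc_count - a.doc_count);
}

const sample = [
  { key: { language: "de" }, doc_count: 3 },
  { key: { language: "en" }, doc_count: 10 },
  { key: { language: "pt" }, doc_count: 7 },
];

collectAllBuckets(makeStubClient(sample), 2).then((all) => {
  console.log(sortByDocCountDesc(all).map((b) => `${b.key.language}: ${b.doc_count}`));
});
```

Sorting on the client means every bucket has to be fetched first, so this trades the memory cost on the cluster for extra round trips and memory in the application.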

I saw this in the docs as well. Can composite aggregations be sorted by doc_count though? It seems like they are sorted alphabetically by default. – Pwnosaurus Jun 21 '22 at 19:32

score 8 · Answer 4 · edited Mar 30 '19 at 20:46

Increase the second size (the one inside the terms aggregation) to 10000 and you will get up to 10000 buckets. By default it is set to 10. If you also want to see search results, set the first (top-level) size to 1 and you will get 1 document back, since ES supports searching and aggregating in the same request.

curl -XPOST "http://localhost:9200/imoveis/_search?pretty=1" -d'
{
   "size": 1,
   "aggregations": {
      "bairro_count": {
         "terms": {
            "field": "bairro.raw",
            "size": 10000
         }
      }
   }
}'
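Since such a request returns documents and buckets together, the response can be read in one pass. The sketch below uses a trimmed, made-up response in the shape shown in the question; readSearchAndAggs is a hypothetical helper and the document contents are invented for illustration.

```javascript
// Trimmed, hypothetical response: one document in "hits" (top-level size 1)
// and the buckets of the terms aggregation.
const sampleResponse = {
  hits: {
    total: 16920,
    hits: [{ _id: "1", _source: { bairro: "Centro" } }],
  },
  aggregations: {
    bairro_count: {
      buckets: [
        { key: "Barra da Tijuca", doc_count: 5812 },
        { key: "Centro", doc_count: 1757 },
      ],
    },
  },
};

// Hypothetical helper: returns the documents and a key -> doc_count map.
function readSearchAndAggs(res, aggName) {
  const agg = res.aggregations ? res.aggregations[aggName] : undefined;
  if (!agg) {
    throw new Error(`No aggregation named ${aggName} in response`);
  }
  return {
    documents: res.hits.hits.map((hit) => hit._source),
    counts: new Map(agg.buckets.map((b) => [b.key, b.doc_count])),
  };
}

const { documents, counts } = readSearchAndAggs(sampleResponse, "bairro_count");
console.log(documents.length, counts.get("Centro"));
```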
