21

I am learning Elasticsearch and would like to count distinct values. So far I can count values, but not distinct ones.

Here is the sample data:

curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 2,
  "RestaurantName": "Restaurant Brian",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'

curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 1,
  "RestaurantName": "Restaurant Cecil",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'

curl http://localhost:9200/store/item/ -XPOST -d '{
  "RestaurantId": 1,
  "RestaurantName": "Restaurant Cecil",
  "DateTime": "2013-08-16T15:13:47.4833748+01:00"
}'

And what I tried so far:

curl -XPOST "http://localhost:9200/store/item/_search" -d '{
  "size": 0,
  "aggs": {
    "item": {
      "terms": {
        "field": "RestaurantName"
      }
    }
  }
}'

Output:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "item": {
      "buckets": [
        {
          "key": "restaurant",
          "doc_count": 3
        },
        {
          "key": "cecil",
          "doc_count": 2
        },
        {
          "key": "brian",
          "doc_count": 1
        }
      ]
    }
  }
}

How can I get the count of cecil as 1 instead of 2?

Saeed Zhiany
Developer

5 Answers

14

You have to use the cardinality aggregation, as mentioned by @coder and described in the docs:

$ curl -XGET "http://localhost:9200/store/item/_search" -d'
{
  "aggs": {
    "restaurant_count": {
      "cardinality": {
        "field": "RestaurantName",
        "precision_threshold": 100,
        "rehash": false
      }
    }
  }
}'

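This worked for me at the time. If your version rejects the rehash option (see the error reported in the comments), drop that line from the request.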
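For reading the result programmatically: a cardinality aggregation returns its estimate under `aggregations.<name>.value`. Below is a minimal Python sketch of building the request body and pulling that number out. The helper names `build_cardinality_query` and `distinct_count` are hypothetical, and `sample_response` is a hand-written stand-in, not real server output.

```python
import json


def build_cardinality_query(field, precision_threshold=100):
    """Build a search body that asks only for a distinct-count estimate."""
    return {
        "size": 0,  # skip the hits; only the aggregation result is needed
        "aggs": {
            "restaurant_count": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


def distinct_count(response, agg_name="restaurant_count"):
    """Extract the estimated distinct count from a search response dict."""
    return response["aggregations"][agg_name]["value"]


body = build_cardinality_query("RestaurantName")
print(json.dumps(body, indent=2))  # JSON to send as the _search request body

# Hand-written sample response (an assumption, not actual server output).
sample_response = {"aggregations": {"restaurant_count": {"value": 2}}}
print(distinct_count(sample_response))
```

The printed JSON can be passed to curl with -d in place of the inline body.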

slm
c24b
  • 4
    As correctly pointed out by @c24b, `cardinality` serves the purpose here, but I would like to point out a few things: 1. The `cardinality` aggregation is an "approximate" algorithm based on the [HyperLogLog++ (HLL)](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/40671.pdf) algorithm. Quoting from the documentation: `HLL works by hashing your input and using the bits from the hash to make probabilistic estimations on the cardinality`. There is a trade-off between "precision" and "memory". – nishant kumar Aug 26 '19 at 13:26
  • For more details read here: https://www.elastic.co/guide/en/elasticsearch/guide/current/cardinality.html My apologies for citing the link as I was not able to explain more due to space constraints. – nishant kumar Aug 26 '19 at 13:39
  • 3
    Got this error with rehash option: `[7:23] [cardinality] rehash doesn't support values of type: VALUE_BOOLEAN` – Jerry Chin Jul 07 '22 at 08:14
5

You could use cardinality here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

coder
3

It is too late to answer this question for the original author, but my answer might help anybody who faces the same issue and ends up here.

Elasticsearch does provide the cardinality aggregation for distinct counts, but its result is approximate rather than exact. When accuracy matters, a different approach is needed. I have written an article on this that might help: Accurate Distinct Count and Values from Elasticsearch.

0

There's no support for exact distinct counting in Elasticsearch, although approximate counting exists. Use a "terms" aggregation and count the buckets in the result. See the question "Count distinct on elastic search".
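The bucket-counting approach can be sketched in Python as follows, using the aggregation part of the response shown in the question. `count_buckets` is a hypothetical helper name. One caveat to keep in mind: a terms aggregation returns only the top buckets (10 by default), so its `size` has to be at least the number of distinct values for this count to be complete.

```python
def count_buckets(response, agg_name):
    """Return how many buckets a terms aggregation produced."""
    return len(response["aggregations"][agg_name]["buckets"])


# Aggregation part of the response from the question.
response = {
    "aggregations": {
        "item": {
            "buckets": [
                {"key": "restaurant", "doc_count": 3},
                {"key": "cecil", "doc_count": 2},
                {"key": "brian", "doc_count": 1},
            ]
        }
    }
}

print(count_buckets(response, "item"))  # prints 3
```

The result here is 3 rather than 2 because the field in the question was analyzed into single-word tokens; counting whole restaurant names requires aggregating on a non-analyzed version of the field.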

asu
0

Use the cardinality aggregation. Docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

Example:

"aggs": {
  "uniqueValues": {
    "cardinality": {
      "field": "ourUniqueId.keyword",
      "precision_threshold": 100
    }
  }
}
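The `.keyword` sub-field matters because an analyzed text field is split into lowercase tokens, which is why the output in the question has separate buckets for `restaurant`, `cecil`, and `brian`. The Python sketch below imitates that difference locally. The `analyze` function is only a rough stand-in (lowercase, split on whitespace) for what an analyzer does, not Elasticsearch code.

```python
# The three RestaurantName values indexed in the question.
names = ["Restaurant Brian", "Restaurant Cecil", "Restaurant Cecil"]


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


# Distinct values as seen through an analyzed field (one entry per token).
token_values = {token for name in names for token in analyze(name)}

# Distinct values as seen through a keyword field (whole string, unchanged).
keyword_values = set(names)

print(sorted(token_values))  # ['brian', 'cecil', 'restaurant']
print(len(keyword_values))   # 2
```

So a distinct count over the analyzed field would see three values, while the keyword field yields the two restaurant names the question is after.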