How to achieve count distinct function on elastic search type using sql4es driver?
Select distinct inv_number , count(1) from invoices;
But it returns the total count of the particular invoice number.
How to achieve count distinct function on elastic search type using sql4es driver?
Select distinct inv_number , count(1) from invoices;
But it returns the total count of the particular invoice number.
{
"size": 0,
"aggs": {
"total_invoices": {
"terms": {
"field": "inv_number"
},
"aggs": {
"unique_invoiceid": {
"cardinality": {
"field": "inv_number"
}
}
}
}
}
This will give you the invoice number as key and distict value in unique_invoiceid
Elasticsearch doesn't support deterministic DISTINCT counts (source). It supports only approximate distinct counters like "cardinality". One way to count distincts in a deterministic way is to aggregate them using "terms" and count buckets from result.
Since, the OP is using sql4es jdbc driver, he is asking for a sql query for his use-case :
SELECT COUNT(DISTINCT inv_number) from invoices;
it returns the number of distinct values of the specified column
This should work to count exact distinct values:
curl -X POST "localhost:9200/invoices/_search?size=0&pretty" -H 'Content-Type: application/json' -d '{
"aggs" : {
"types_count" : {
"value_count" : { "field" : "inv_number" }
},
"group_by_status": {
"terms": {
"field": "inv_number"
}
}
}
}'
Cardinality works if your dataset uniqueness/fragmentation is less than 2000. If you really care for accuracy instead of retrieval speed and memory utilization then you can go for scripted metric which will give you accurate results.
https://pratik-patil.medium.com/accurate-distinct-count-and-values-from-elasticsearch-dadce51ad4a7