
Updated question

In my query I aggregate on date and then on sensor name. Is it possible to calculate a ratio from a nested aggregation and the total document count (or any other aggregation) of the parent bucket? Example query:

{
  "size": 0,
  "aggs": {
    "over_time": {
      "aggs": {
        "by_date": {
          "date_histogram": {
            "field": "date",
            "interval": "1d",
            "min_doc_count": 0
          },
          "aggs": {
            "measure_count": {
              "cardinality": {
                "field": "date"
              }
            },
            "all_count": {
              "value_count": {
                "field": "name"
              }
            },
            "by_name": {
              "terms": {
                "field": "name",
                "size": 0
              },
              "aggs": {
                "count_by_name": {
                  "value_count": {
                    "field": "name"
                  }
                },
                "my ratio": count_by_name / all_count * 100 <-- How to do that?
              }
            }
          }
        }
      }
    }
  }
}

I want a custom metric that gives me the ratio count_by_name / all_count * 100. Is that possible in ES, or do I have to compute that on the client? This seems very simple to me, but I haven't found a way yet.

Old post:

Is there a way to let Elasticsearch consider the overall count of documents (or any other metric) when calculating the average for a bucket?

Example: I have around 100,000 sensors that generate events at different times. Every event is indexed as a document that has a timestamp and a value.

When I calculate a ratio of the value over a date histogram, and some sensors only generated values at certain times, I want Elasticsearch to treat the missing values (documents) for my sensors as 0 instead of null. So when aggregating by day and a sensor has only generated two values, at 10pm (3) and 11pm (5), the aggregate for the day should be (3+5)/24, or formally: SUM(VALUE)/24.

Instead, Elasticsearch calculates the average as (3+5)/2, which is not correct in my case.

There was once a ticket on GitHub (https://github.com/elastic/elasticsearch/issues/9745), but the answer was "handle it in your application". That doesn't work for me, as I would have to generate zillions of zero-value documents for every sensor/time combination to get the average right.

Any ideas on this?

static-max

2 Answers


If this is the case, simply divide the results by 24 on the application side, and when the granularity changes, adjust this divisor accordingly. The number of hours per day is fixed, right?
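A minimal sketch of what that could look like, assuming the value field mentioned in the question is numeric (the value_sum name is just illustrative): aggregate the raw sum per day and let the application divide each bucket's sum by 24.

{
  "size": 0,
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "date",
        "interval": "1d",
        "min_doc_count": 0
      },
      "aggs": {
        "value_sum": {
          "sum": {
            "field": "value"
          }
        }
      }
    }
  }
}

Dividing each bucket's value_sum by 24 on the client gives the SUM(VALUE)/24 figure from the old post; for other histogram intervals the divisor changes accordingly (e.g. 168 for a weekly interval).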

Vineeth Mohan
  • Every document has the required fields, that's not the problem. My problem is calculating the ratio between nested aggregations and their parents. – static-max Nov 06 '15 at 19:52

You can use the Bucket script aggregation to do what you want.

{
    "bucket_script": {
        "buckets_path": {
            "count_by_name": "count_by_name",
            "all_count": "all_count"
        },
        "script": "count_by_name / all_count * 100"
    }
}

It's just an example.

https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-pipeline-bucket-script-aggregation.html
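
For reference, a bucket_script runs once per bucket of the aggregation it is defined in and references sibling metrics of that bucket through buckets_path. A minimal sketch of how that wiring looks at the by_date level of the question's query, reusing its measure_count and all_count metrics (the measure_ratio name is made up for illustration, and inline scripting has to be enabled):

{
  "size": 0,
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "date",
        "interval": "1d"
      },
      "aggs": {
        "measure_count": {
          "cardinality": {
            "field": "date"
          }
        },
        "all_count": {
          "value_count": {
            "field": "name"
          }
        },
        "measure_ratio": {
          "bucket_script": {
            "buckets_path": {
              "measures": "measure_count",
              "total": "all_count"
            },
            "script": "measures / total * 100"
          }
        }
      }
    }
  }
}

Each by_date bucket in the response then carries an extra measure_ratio value alongside measure_count and all_count.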

Jaycreation