Using Elasticsearch Date Histogram Aggregations to Count Dates in Array Properties

Question

I have an elasticsearch index with the following document:

{
  dates: ["2014-01-31","2014-02-01"]
}

I want to count all the instances of all the days in my index separated by year and month. I hoped to do this using a date histogram aggregation (which is successful for counting non-array properties):

{
  "from": 0,
  "size": 0,
  "aggregations": {
    "year": {
      "date_histogram": {
        "field": "dates",
        "interval": "1y",
        "format": "yyyy"
      },
      "aggregations": {
        "month": {
          "date_histogram": {
            "field": "dates",
            "interval": "1M",
            "format": "M"
          },
          "aggregations": {
            "day": {
              "date_histogram": {
                "field": "dates",
                "interval": "1d",
                "format": "d"
              }
            }
          }
        }
      }
    }
  }
}

However, I get the following aggregation results:

"aggregations": {
  "year": {
     "buckets": [
        {
           "key_as_string": "2014",
           "key": 1388534400000,
           "doc_count": 1,
           "month": {
              "buckets": [
                 {
                    "key_as_string": "1",
                    "key": 1388534400000,
                    "doc_count": 1,
                    "day": {
                       "buckets": [
                          {
                             "key_as_string": "31",
                             "key": 1391126400000,
                             "doc_count": 1
                          },
                          {
                             "key_as_string": "1",
                             "key": 1391212800000,
                             "doc_count": 1
                          }
                       ]
                    }
                 },
                 {
                    "key_as_string": "2",
                    "key": 1391212800000,
                    "doc_count": 1,
                    "day": {
                       "buckets": [
                          {
                             "key_as_string": "31",
                             "key": 1391126400000,
                             "doc_count": 1
                          },
                          {
                             "key_as_string": "1",
                             "key": 1391212800000,
                             "doc_count": 1
                          }
                       ]
                    }
                 }
              ]
           }
        }
     ]
  }
}

The "day" aggregation ignores the bucket of its parent "month" aggregation, so it processes both elements of the array in each bucket, counting each date twice. The results indicate that two dates appear in each month (and four total), which is obviously incorrect.

I've tried reducing my aggregation to a single date histogram (and bucketing the results in java based on the key) but the doc_count returns as one instead of the number of elements in the array (two in my example). Adding a value_count brings me back to my original issue in which documents that overlap multiple buckets have their dates double-counted.

Is there a way to add a filter to the date histogram aggregations or otherwise modify them in order to count the elements in my date arrays correctly? Alternatively, does Elasticsearch have an option to unwind arrays like in MongoDB? I want to avoid using scripting due to security concerns.

Thanks,

Thomas

Have you found a solution to this problem? – Dmitry Polushkin Oct 02 '16 at 17:02 — Dmitry Polushkin, Oct 02 '16 at 17:02
No, I have not found a solution. – Thomas Schellenberg Oct 04 '16 at 23:44 — Thomas Schellenberg, Oct 04 '16 at 23:44

Using Elasticsearch Date Histogram Aggregations to Count Dates in Array Properties

0 Answers0