
I have a number of documents representing events with starts_at and ends_at fields. At a given point in time, an event is considered active if that point is after starts_at and before ends_at.

I'm looking for an aggregation that results in a date histogram where each bucket contains the number of active events in that interval.

So far, the best approximation I have found is to create a set of buckets counting the number of starts in each interval, as well as a corresponding set of buckets counting the number of ends, and then postprocessing them by keeping a running total of starts minus ends across the intervals:

{
  "size": "0",
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [
          {
            "term": {
              "_type": "event"
            }
          },
          {
            "range": {
              "starts_at": {
                "gte": "2015-06-14T05:25:03Z",
                "lte": "2015-06-21T05:25:03Z"
              }
            }
          }
        ]
      }
    }
  },
  "aggs": {
    "starts": {
      "date_histogram": {
        "field": "starts_at",
        "interval": "15m",
        "extended_bounds": {
          "max": "2015-06-21T05:25:04Z",
          "min": "2015-06-14T05:25:04Z"
        },
        "min_doc_count": 0
      }
    },
    "ends": {
      "date_histogram": {
        "field": "ends_at",
        "interval": "15m",
        "extended_bounds": {
          "max": "2015-06-21T05:25:04Z",
          "min": "2015-06-14T05:25:04Z"
        },
        "min_doc_count": 0
      }
    }
  }
}
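The postprocessing step described above can be sketched in Python. The bucket values here are made up for illustration; the shape matches the aggregations.starts.buckets and aggregations.ends.buckets arrays in the response, whose keys align because both histograms share the same interval and extended_bounds:

```python
# Hypothetical sample buckets, shaped like the date_histogram response.
starts = [{"key_as_string": "05:15", "doc_count": 2},
          {"key_as_string": "05:30", "doc_count": 1},
          {"key_as_string": "05:45", "doc_count": 0}]
ends = [{"key_as_string": "05:15", "doc_count": 0},
        {"key_as_string": "05:30", "doc_count": 1},
        {"key_as_string": "05:45", "doc_count": 2}]

active = []
running = 0  # events started so far minus events ended so far
for s, e in zip(starts, ends):
    running += s["doc_count"] - e["doc_count"]
    active.append((s["key_as_string"], running))
# active now holds (bucket key, active event count) pairs
```

Note that since the query filters on starts_at only, this running total misses events that were already active before the start of the queried range.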

I'm looking for something like this solution.

Is there a way to achieve that with a single query?

Peter Hübel

1 Answer


I'm not 100% sure, but the upcoming pipeline aggregations might solve this problem in a more elegant way in the near future.

Meanwhile, you could choose the desired time resolution and, at index time, generate an active_at field in addition to the starts_at and ends_at fields. It would be an array of timestamps, and you could use either a terms aggregation (if it is mapped as a not_analyzed string) or a date_histogram aggregation to get the correct "active events count" for each time bucket.
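A sketch of how such an active_at array could be generated at index time. The active_slots helper is hypothetical (not from the answer); it assumes naive UTC datetimes and aligns slots to the resolution grid within the day:

```python
from datetime import datetime, timedelta

def active_slots(starts_at, ends_at, resolution=timedelta(minutes=15)):
    """Return ISO timestamps of every resolution-aligned slot the event overlaps."""
    # Floor starts_at onto the resolution grid (relative to midnight).
    midnight = starts_at.replace(hour=0, minute=0, second=0, microsecond=0)
    offset = (starts_at - midnight).total_seconds()
    step = resolution.total_seconds()
    slot = midnight + timedelta(seconds=(offset // step) * step)
    slots = []
    while slot < ends_at:
        slots.append(slot.isoformat() + "Z")
        slot += resolution
    return slots
```

The resulting array would be stored alongside starts_at and ends_at in each event document, so a single date_histogram over active_at yields the per-bucket active counts directly.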

The downside is inflated storage requirements and possibly worse performance, since there are more field values to aggregate over. Still, it shouldn't be too bad as long as you don't choose too fine a time resolution, such as 1 minute.

NikoNyrh