Elasticsearch: Aggregate documents based on date range

Question

I have a set of documents in ElasticSearch 5.5 with two date fields: start_date and end_date. I want to aggregate them into date histogram buckets (ex: weekly) such that if the start_date < week X < end_date, then document would be in "week X" bucket. This means that a single document might be in multiple buckets.

Consider the following concrete example: I have a set of documents describing company employees, and for each employee you have hire date and (optionally) termination date. I want to build date histogram of number of active employees for trailing twelve months.

Sample doc content:

{
   "start_date": "2013-01-12T00:00:00.000Z",
   "end_date": "2016-12-08T00:00:00.000Z",
   "id": "123123123"
}

Is there a way to do this in ES?

always support your questions with proper json to show example to usecases. thx — user3775217, Aug 02 '17 at 09:43
Added one, although it doesn't feel like it's adding a lot of value — Tofig Hasanov, Aug 02 '17 at 09:47
its not about value, its about when i read the post i should be able to visualize your problem better and obviously pushing a formatted json does't hurt much. — user3775217, Aug 02 '17 at 13:08
I doubt that one document can appear in multiple buckets of one aggregation. I can suggest checking out [scripting for date aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/_use_of_a_script_to_aggregate_by_day_of_the_week.html), using [pipeline aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html) and to denormalize data, e.g. precompute all the weeks between `start_date` and `end_date` and store them as a list in a field `active_weeks`. — Nikolay Vasiliev, Aug 03 '17 at 10:19
@NikolayVasiliev I have found one way to do that (check my answer), but I am still not happy about this. Perhaps scripting is a better way forward — Tofig Hasanov, Aug 03 '17 at 10:30

score 3 · Answer 1 · answered Aug 03 '17 at 10:29

I have found one way to do this, using filter aggregations ( https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-filter-aggregation.html). If I need, say, 12 trailing months report, then I would create 12 buckets, where each bucket defines filter conditions, such as:

"bool":{
 "must":[{
  "range":{
   "start_date":{
    "lte":"2016-01-01T00:00:00.000Z"
   }
  }
 },{
 {
  "range":{
   "end_date":{
    "gt":"2016-02-01T00:00:00.000Z"
   }
  }
 }]
}

However, I feel that it would be nice if there was an easier way to do this, since if I want say trailing 365 days, that means I have to create 365 bucket filters, which makes resultant query very large.

score -1 · Answer 2 · answered Jun 11 '19 at 11:08

-1

I know this question is quite old but as it's still open I am sharing my knowledge on this. Also this question does not clearly explains that what kind of output is expected but still I think this can be achieved using the "Date Histogram Aggregation" and "Bucket Script Aggregation".

Here are the documentation links for both of these aggregations.

https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-bucket-datehistogram-aggregation.html

https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-pipeline-bucket-script-aggregation.html

answered Jun 11 '19 at 11:08

Surbhi Harsh

69
9

Please add summery of your links content to your answer – mastisa Jun 11 '19 at 11:28
The links should be able to describe you everything. And if you want me to provide some sample query around it then I would need the expected result format. – Surbhi Harsh Jun 11 '19 at 13:36

Elasticsearch: Aggregate documents based on date range

2 Answers2