3

I have a set of documents in ElasticSearch 5.5 with two date fields: start_date and end_date. I want to aggregate them into date histogram buckets (ex: weekly) such that if the start_date < week X < end_date, then document would be in "week X" bucket. This means that a single document might be in multiple buckets.

Consider the following concrete example: I have a set of documents describing company employees, and for each employee you have hire date and (optionally) termination date. I want to build date histogram of number of active employees for trailing twelve months.

Sample doc content:

{
   "start_date": "2013-01-12T00:00:00.000Z",
   "end_date": "2016-12-08T00:00:00.000Z",
   "id": "123123123"
}

Is there a way to do this in ES?

user3775217
  • 4,675
  • 1
  • 22
  • 33
Tofig Hasanov
  • 3,303
  • 10
  • 51
  • 81
  • always support your questions with proper json to show example to usecases. thx – user3775217 Aug 02 '17 at 09:43
  • Added one, although it doesn't feel like it's adding a lot of value – Tofig Hasanov Aug 02 '17 at 09:47
  • 1
    its not about value, its about when i read the post i should be able to visualize your problem better and obviously pushing a formatted json does't hurt much. – user3775217 Aug 02 '17 at 13:08
  • I doubt that one document can appear in multiple buckets of one aggregation. I can suggest checking out [scripting for date aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/_use_of_a_script_to_aggregate_by_day_of_the_week.html), using [pipeline aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html) and to denormalize data, e.g. precompute all the weeks between `start_date` and `end_date` and store them as a list in a field `active_weeks`. – Nikolay Vasiliev Aug 03 '17 at 10:19
  • @NikolayVasiliev I have found one way to do that (check my answer), but I am still not happy about this. Perhaps scripting is a better way forward – Tofig Hasanov Aug 03 '17 at 10:30

2 Answers2

3

I have found one way to do this, using filter aggregations ( https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-filter-aggregation.html). If I need, say, 12 trailing months report, then I would create 12 buckets, where each bucket defines filter conditions, such as:

"bool":{
 "must":[{
  "range":{
   "start_date":{
    "lte":"2016-01-01T00:00:00.000Z"
   }
  }
 },{
 {
  "range":{
   "end_date":{
    "gt":"2016-02-01T00:00:00.000Z"
   }
  }
 }]
}

However, I feel that it would be nice if there was an easier way to do this, since if I want say trailing 365 days, that means I have to create 365 bucket filters, which makes resultant query very large.

Tofig Hasanov
  • 3,303
  • 10
  • 51
  • 81
-1

I know this question is quite old but as it's still open I am sharing my knowledge on this. Also this question does not clearly explains that what kind of output is expected but still I think this can be achieved using the "Date Histogram Aggregation" and "Bucket Script Aggregation".

Here are the documentation links for both of these aggregations.

https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-bucket-datehistogram-aggregation.html

https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-pipeline-bucket-script-aggregation.html

  • Please add summery of your links content to your answer – mastisa Jun 11 '19 at 11:28
  • The links should be able to describe you everything. And if you want me to provide some sample query around it then I would need the expected result format. – Surbhi Harsh Jun 11 '19 at 13:36