6

I need to do get some data grouped by day of week and hour, for example

curl -XGET http://localhost:9200/testing/hello/_search?pretty=true -d '
{
        "size": 0,
        "aggs": {
          "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "hour",
                "format": "E - k"
            }
          }
        }
}
'

Gives me this:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2857,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "articles_over_time" : {
      "buckets" : [ {
        "key_as_string" : "Fri - 17",
        "key" : 1391792400000,
        "doc_count" : 6
      },
     ...
      {
        "key_as_string" : "Wed - 22",
        "key" : 1411596000000,
        "doc_count" : 1
      }, {
        "key_as_string" : "Wed - 22",
        "key" : 1411632000000,
        "doc_count" : 1
      } ]
    }
  }
}

Now I need to summarize doc counts by this value "Wed - 22", how can I do this? Maybe some another approach?

MindMincer
  • 81
  • 1
  • 6

3 Answers3

1

The same kind of problem has been solved in this thread.

Adapting the solution to your problem, we need to make a script to convert the date into the hour of day and day of week:

Date date = new Date(doc['date'].value) ; 
java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('EEE, HH');
format.format(date)

And use it in a query:

{
    "aggs": {
        "perWeekDay": {
            "terms": {
                "script": "Date date = new Date(doc['date'].value) ;java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('EEE, HH');format.format(date)"
            }
        }
    }
}
Community
  • 1
  • 1
Heschoon
  • 2,915
  • 9
  • 26
  • 55
0

You can try doing terms aggregation on "key_as_string" field from the aggregation results using sub aggregation.

Hope that helps.

Natzy8
  • 15
  • 3
0

This is because you are using an interval of 'hour', but, the date format is 'day' (E - k).

Change your interval to be 'day', and you'll no longer get separate buckets for 'Weds - 22'.

Or, if you do want per hour, then change your format to include the hour field.

RichS
  • 3,097
  • 30
  • 26
  • One drawback of the non-script solution is that you get multiple values per hour if the query spans multiple days, my current solution is to combine this on the client-side as I would like to avoid having to open up scripting just for this. – centic Apr 27 '16 at 12:20
  • Not sure I follow your comment :-). The interval and the format are related - the interval controls the bucketing, the format controls the names of the bucket. If you want the bucket names to all be unique, you have to set the format appropriately. ?! Alternatively, ignore the 'key_as_string' and use the 'key' instead ;-). – RichS Apr 27 '16 at 14:29
  • My goal is to get a "hour of the day" historgram for a longer period of time. It should show how many things happened at 1am, at 2am, at 3am. Now if I set interval: "hour" and format: "HH", I get the expected hourly buckets with key_as_string "01", "02", ..., however they are repeated for every day, so I get multiple "01", and multiple "02", ... – centic Apr 27 '16 at 18:50
  • Ok, so, if you want to consolidate all 2am (across multiple days) into the same bucket, then you can get that from my other answer ;-) - http://stackoverflow.com/questions/29002152/how-to-show-day-names-using-date-histogram-aggregation-in-elascticsearch/31851896#31851896 – RichS Apr 29 '16 at 13:26
  • Yes, thanks, I saw that. But without script would be better as I want to run this on rather large sets of data and also would like to avoid the security implications of allowing scripts... For now I am doing the summing-up on the client-side which seems to work reasonably well. – centic Apr 29 '16 at 13:30
  • 1
    Ah, ok. I understand. Neither is amazing, but at least your approach reduces the cost of the query. Good luck :-). – RichS Apr 29 '16 at 13:39