
I'm receiving requests/events from a large number of client applications. I'd like to use Elasticsearch to find out when my highest traffic point is.

One thing I've tried is a filter aggregation with a nested date histogram and then a nested "terms" aggregation that gets the distinct hour of the day via a script. The following is my attempt, and it performs terribly (as I'd expect, since I'm executing a script per document).

{
  "aggs": {
    "sites_within_range": {
      "filter" : { 
        "range" : { 
          "occurred" : { 
            "gt" : "now-1M"
          }
        } 
      },

      "aggs": {
        "sites_over_time": {
          "date_histogram": {
            "field": "occurred",
            "interval": "week"
          },
          "aggs":{
            "site_names": {
              "terms": {
                "script": "doc['occurred'].date.getHourOfDay()",
                "size": 10000
              }
            }
          }
        }
      }

    }
  }
}

I've also considered storing the date elements I want to query as distinct parts of the document, e.g.:

{
    "date": "actual datetime",
    "day": "monday",
    "hour": 8,
    "minute": 37
}

this also smells like the wrong answer to me.


Edit: after some investigation, it looks like I might be interested in the new cardinality / percentiles aggregations coming in 1.1?

Joshua Evensen

1 Answer

The same kind of problem has been solved in this thread.

Adapting the solution to your problem, we need to make a script to convert the date into the hour of day:

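// 'occurred' is the date field from the question; its value is epoch milliseconds
Date date = new Date(doc['occurred'].value);
// 'HH' formats the hour of day as a two-digit string, "00" to "23"
java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('HH');
format.format(date)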

And use it in a query:

{
    "aggs": {
        "perWeekDay": {
            "filter" : {
                "range" : {
                    "occurred" : {
                        "gt" : "now-1M"
                    }
                }
            },
            "aggs": {
                "hourOfDay": {
                    "terms": {
                        "script": "Date date = new Date(doc['occurred'].value); java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('HH'); format.format(date)"
                    }
                }
            }
        }
    }
}

And you have the traffic by hour of day.
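To get from the per-hour buckets to the single busiest hour, the response can be post-processed on the client. The following is only a sketch: it assumes the usual terms-aggregation response shape (`aggregations`, then the filter aggregation, then the terms aggregation with its `buckets`), and the aggregation names `perWeekDay` and `hourOfDay` as well as the sample numbers are placeholders, not real data.

```python
def busiest_hour(response, filter_agg="perWeekDay", terms_agg="hourOfDay"):
    """Return (hour_key, doc_count) of the fullest bucket, or None if there are none.

    The aggregation names are assumptions; pass whatever names the query used.
    """
    buckets = response["aggregations"][filter_agg][terms_agg]["buckets"]
    if not buckets:
        return None
    top = max(buckets, key=lambda bucket: bucket["doc_count"])
    return top["key"], top["doc_count"]


# Hand-written sample in the expected response shape (made-up counts).
sample_response = {
    "aggregations": {
        "perWeekDay": {
            "doc_count": 13392,
            "hourOfDay": {
                "buckets": [
                    {"key": "09", "doc_count": 4872},
                    {"key": "14", "doc_count": 5210},
                    {"key": "20", "doc_count": 3310},
                ]
            },
        }
    }
}

print(busiest_hour(sample_response))
```

A terms aggregation orders buckets by document count (descending) by default, so the first bucket is normally the peak already; taking `max` just avoids relying on the ordering.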

Nota bene: Storing the hours/days/minutes in your document is the most efficient way of doing that kind of aggregation. My answer assumes you don't want to store that information. Scripts usually aren't über efficient.
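For comparison, here is a rough sketch of the stored-fields route. It assumes events carry an ISO-style `occurred` timestamp without a timezone, and that `day`, `hour` and `minute` are added on the client before indexing; the helper name `with_date_parts` and the aggregation names are made up for illustration.

```python
from datetime import datetime

DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]


def with_date_parts(event, field="occurred"):
    """Return a copy of the event with day/hour/minute derived from its timestamp."""
    ts = datetime.strptime(event[field], "%Y-%m-%dT%H:%M:%S")
    enriched = dict(event)
    enriched["day"] = DAY_NAMES[ts.weekday()]  # weekday() is 0 for Monday
    enriched["hour"] = ts.hour
    enriched["minute"] = ts.minute
    return enriched


# Request body: same date filter as above, but the terms aggregation reads
# the stored "hour" field, so no script runs per document.
hour_query = {
    "aggs": {
        "traffic_by_hour": {
            "filter": {"range": {"occurred": {"gt": "now-1M"}}},
            "aggs": {
                "hours": {"terms": {"field": "hour", "size": 24}}
            },
        }
    }
}

print(with_date_parts({"occurred": "2014-03-10T08:37:00"}))
```

The trade-off is a slightly larger document and some work at index time in exchange for a plain field aggregation at query time.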

Heschoon
  • Doesn't look very efficient to do that for each document for each aggregation. I'd go for storing the hour as dedicated field. – Sebastian Apr 15 '16 at 14:14
  • Hi @Sebastian! Storing the hours/days/minutes is indeed the most efficient way of doing that kind of aggregation. My answer assumes you don't want to store that information into your document but I'll edit the answer to reflect this assumption. – Heschoon Apr 15 '16 at 15:19