1

I have file containing data from ubereats orders. I want to know the average orders per hour per day. The file contains order data from 52 weeks. I managed to get the data per hour per mondays, tuesdays etc. But did not manage yet to get the average order per hour on monday, tuesday etc.

search_body = {
    "size": 0,
    "aggs": {
        "total_orders_per_day_of_week": {
            "terms": {
                "script": {
                    "lang": "painless",
                    "source": "doc['dateTime'].value.dayOfWeek"
                }
            },
        "aggs": {
            "countPerHour": {
                "terms": {
                    "size": 24,
                    "script": "return doc['dateTime'].value.getHour();"
                }
            }
         }
      }
    }
}

Maybe I need to do something like I did to calculate the average order per day?

search_body = {
    "size" : 0,
    "aggs" : {
        "orders_per_day" : {
            "terms" : {
                "script" : {
                    "lang": "painless",
                    "source": "doc['dateTime'].value.dayOfWeek"
                }
            },
            "aggs" : {
                "numberOfOccurrences" : {
                    "cardinality": {
                        "script" : {
                            "lang": "painless",
                            "source": "doc['dateTime'].value.dayOfYear"
                        }
                    }
                }
            },
        }
    }
}
for day in result["aggregations"]["orders_per_day"]["buckets"]:
    print (day["key"], day["doc_count"], day["numberOfOccurrences"]["value"],
        day["doc_count"]/day["numberOfOccurrences"]["value"])
Daniel Walker
  • 6,380
  • 5
  • 22
  • 45
GdGd
  • 11
  • 2
  • Have you tried Date histogram aggregation? It can aggregate by calendar intervals. https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html#calendar_intervals – Frank Oct 23 '20 at 19:06
  • Yes I did, this is one of the options from Date histogram aggregation, see at the end of the page from your link. Maybe I did not explain it well, when I do the normal date histogram aggr. I get it per day from al the seprate days or all the hours per day. But I would like to have the average of monday, tuesday. So that why I have first it summed together and now need some dividing over the 52 weeks – GdGd Oct 23 '20 at 19:52

1 Answers1

1

If I understand correctly this time, you want to get average of every hour of Monday, Tuesday... So there should be 24x7 results. If so, do this:

{
  "size": 0,
  "aggs": {
    "orders_per_hour_of_day": {
      "terms": {
        "script" : {
            "lang": "painless",
            "source": "doc['dateTime'].value.getHour()"
        },
        "size": 24
      },
      "aggs": {
        "total_orders_per_day_of_week": {
          "terms": {
            "script": {
              "lang": "painless",
              "source": "doc['dateTime'].value.getDayOfWeekEnum()"
            }
          },
          "aggs": {
            "number_of_weeks": {
              "date_histogram": {
                "field": "dateTime",
                "calendar_interval": "week"
              }
            },
            "average_orders_per_hour_day_of_week": {
              "bucket_script": {
                "buckets_path": {
                  "doc_count": "_count",
                  "number_of_weeks": "number_of_weeks._bucket_count"
                },
                "script": "params.doc_count / params.number_of_weeks"
              }
            }
          }
        }
      }
    }
  }
}

The code adapted from here: https://discuss.elastic.co/t/average-per-day-of-week-aggregation/124132


I'm not sure if I understand your question clearly. It sounds like you want to get the average orders of Monday, Tuesday, ... Sunday. If that's the question, you could just divide 24 hours of each bucket.

{
  "size": 0,
  "aggs": {
    "orders_per_day": {
      "terms": {
        "script": {
          "lang": "painless",
          "source": "doc['dateTime'].value.dayOfWeek"
        }
      },
      "aggs": {
        "avg_per_hour": {
          "bucket_script": {
            "buckets_path": {
              "count": "_count"
            },
            "script": "params.count / 24"
          }
        }
      }
    }
  }
}

Bucket aggregation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-bucket-script-aggregation.html
Path of bucket doc_count:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html#_special_paths

Frank
  • 1,215
  • 12
  • 24
  • I think i have the same assignment, at the same university (but i don't now this fellow student? in person...). There are multiple order per day, and each order as a new row assigned to it. As only have date-time in a single colmn, formatted like yyyy-mm-dd hh:mm:ss, and we dont have a dedicated column for the day of the week, i do nog think that bucket aggregations will work. I'm concidering to use the ".weekday() " function to get the day of the week. but not sure if this can be combined also see: https://stackoverflow.com/questions/9847213/how-do-i-get-the-day-of-week-given-a-date – Cornelis Oct 24 '20 at 21:15
  • Yes, I think we have the same assignment. I did manage to get it per mondays, tuesdays etc per hour. However, this is the total of all mondays between 9-10, 10-11. I also now that there are 53 mondays and the rest 52. But how to get this better, I have no idea. Maybe we can help eachother? – GdGd Oct 24 '20 at 21:29
  • I just updated the code. Hope I understand it this time.. – Frank Oct 25 '20 at 05:16