I have data records for different entities; for each entity, a count is recorded at specific hours of the day throughout a whole month. For example (a reproducible snippet follows the sample):
    entity_id                 time  counts
 0        175  2019-03-01 05:00:00       3
 1        175  2019-03-01 06:00:00       4
 2        175  2019-03-01 07:00:00       6
 3        175  2019-03-01 08:00:00       6
 4        175  2019-03-01 09:00:00       7
 5        178  2019-03-01 05:00:00       8
 6        178  2019-03-01 06:00:00       4
 7        178  2019-03-01 07:00:00       5
 8        178  2019-03-01 08:00:00       6
 9        200  2019-03-01 05:00:00       7
10        200  2019-03-01 08:00:00       3
11        175  2019-03-03 05:00:00       3
12        175  2019-03-03 07:00:00       6
13        175  2019-03-03 08:00:00       6
14        175  2019-03-03 09:00:00       7
15        178  2019-03-03 05:00:00       8
16        178  2019-03-03 06:00:00       4
17        178  2019-03-03 07:00:00       5
18        178  2019-03-03 08:00:00       6
19        200  2019-03-03 05:00:00       7
20        200  2019-03-03 08:00:00       3
21        200  2019-03-03 09:00:00       7
...
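In case a reproducible example helps, the first day of the sample can be built with something like this (column names as in the printout; the real data covers the whole month):

import pandas as pd

# Reconstruction of the first day of the sample shown above
df = pd.DataFrame(
    {
        "entity_id": [175, 175, 175, 175, 175, 178, 178, 178, 178, 200, 200],
        "time": pd.to_datetime(
            [
                "2019-03-01 05:00:00",
                "2019-03-01 06:00:00",
                "2019-03-01 07:00:00",
                "2019-03-01 08:00:00",
                "2019-03-01 09:00:00",
                "2019-03-01 05:00:00",
                "2019-03-01 06:00:00",
                "2019-03-01 07:00:00",
                "2019-03-01 08:00:00",
                "2019-03-01 05:00:00",
                "2019-03-01 08:00:00",
            ]
        ),
        "counts": [3, 4, 6, 6, 7, 8, 4, 5, 6, 7, 3],
    }
)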
For each entity, I want to aggregate the mean of the counts over several ranges of hours, on different sets of weekdays, across the whole month (one way to spell these buckets out follows the list). E.g.:
- The mean for Morning (6-10AM) on Sundays
- The mean for Morning (6-10AM) on Sundays-Thursdays
- The mean for Noon (11AM-1PM) on Sundays-Thursdays
- The mean for Noon (11AM-1PM) on Fri-Sat
- The mean for Evening (6PM-9PM) on Fri
- etc.
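To make the buckets explicit, this is how I would write them down as plain mappings (the names and the inclusive hour boundaries are just my own convention, and the day codes follow pandas' dayofweek numbering, Monday=0 ... Sunday=6):

# Bucket definitions; hour upper bounds treated as inclusive
TIME_BUCKETS = {
    "morning": list(range(6, 11)),   # 6 AM - 10 AM
    "noon": list(range(11, 14)),     # 11 AM - 1 PM
    "eve": list(range(18, 22)),      # 6 PM - 9 PM
}
# pandas dayofweek convention: Monday=0 ... Sunday=6
DAY_BUCKETS = {
    "sun": [6],
    "sun-thu": [6, 0, 1, 2, 3],
    "fri-sat": [4, 5],
    "fri": [4],
    "sat": [5],
}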
So I wish to get a df like this (partial example):
   entity_id day_in_week time_in_day  counts_mean
0        175         sun         eve            5
1        175     sun-thu        noon            6
2        178         sun         eve            5
3        178         sat         eve            5
4        200     sun-thu     morning            2
...
I managed to get this partially done by iterating over the data, slicing, and extracting the different elements (roughly as in the sketch below), but I assume there's a much more efficient way.
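Schematically, my current attempt looks something like this (a simplified sketch using the mappings above, not the exact code):

# Loop-heavy version: filter the frame once per entity/day-bucket/hour-bucket
rows = []
for entity_id, entity_df in df.groupby("entity_id"):
    for day_name, days in DAY_BUCKETS.items():
        day_df = entity_df[entity_df["time"].dt.dayofweek.isin(days)]
        for bucket_name, hours in TIME_BUCKETS.items():
            bucket_df = day_df[day_df["time"].dt.hour.isin(hours)]
            if not bucket_df.empty:
                rows.append(
                    {
                        "entity_id": entity_id,
                        "day_in_week": day_name,
                        "time_in_day": bucket_name,
                        "counts_mean": bucket_df["counts"].mean(),
                    }
                )
result = pd.DataFrame(rows)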
I started with this issue, but I still ended up with too many for loops. Any ideas on how to optimize the performance?