Context: WiFi DHCP leases recorded in syslog.
A DHCP lease lasts 24 hours, and each record represents a successful WiFi "session". Some devices, such as phones, initiate multiple sessions per day, so we need to count only unique MAC addresses per 24-hour period, while keeping the timestamp of each device's first connection. Finally, we need to be able to sum the counts by hour and by day.
TL;DR: I need to show daily unique MAC addresses broken down by hour. Not unique per hour, but unique per day, then broken down by the hour of first connection and summed.
Sample data frame:
branch timestamp mac
0 branch_a 2020-09-01 00:00:00 48:c7:96:1d:91:af
1 branch_a 2020-09-01 00:08:00 48:c7:96:1d:91:bx
2 branch_b 2020-09-01 00:36:07 48:c7:96:1d:80:ff
3 branch_b 2020-09-01 00:41:24 48:c7:96:1d:86:ff
4 branch_c 2020-09-01 00:44:33 48:c7:96:1d:76:bv
Steps:
- group by branch
- First occurrence of each unique MAC address per day
- Count MAC addresses by hour
This returns the same MACs as the raw data, because grouping on the full timestamp makes (almost) every group a single row, so .first() just gives back each row's own mac:
branch_daily = wifi.groupby(['branch','month', 'timestamp'])['mac'].first()
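What I think is needed instead is the earliest record of each MAC per branch per calendar day. A minimal sketch, assuming the frame is named wifi as in the sample and pandas is imported as pd (first_seen and date are helper names I'm introducing):
wifi['timestamp'] = pd.to_datetime(wifi['timestamp'])
wifi['date'] = wifi['timestamp'].dt.date
# earliest connection of every (branch, day, mac) combination
first_seen = wifi.groupby(['branch', 'date', 'mac'], as_index=False)['timestamp'].min()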
Intended result:
branch timestamp mac
0 branch_a 2020-09-01 00:00:00 5
1 branch_a 2020-09-01 00:01:00 10
2 branch_a 2020-09-01 00:02:00 3
3 branch_a 2020-09-01 00:03:00 4
4 branch_a 2020-09-01 00:04:00 11
Where the mac column is the count of unique MAC addresses for that hour.
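A sketch of building a frame in that shape, assuming a deduplicated frame like first_seen above: floor each first-connection time to the hour, then count rows per branch per hour (hourly is another helper name):
first_seen['hour'] = first_seen['timestamp'].dt.floor('h')
# count of unique devices per branch per hour of first connection
hourly = (first_seen.groupby(['branch', 'hour'], as_index=False)['mac']
                    .count()
                    .rename(columns={'hour': 'timestamp'}))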
# parse timestamps (the sample above parses without an explicit format; pass format= only if the raw syslog strings need it)
wifi['timestamp'] = pd.to_datetime(wifi['timestamp'])
wifi['month'] = wifi['timestamp'].dt.month
wifi['day'] = wifi['timestamp'].dt.day
wifi['hour'] = wifi['timestamp'].dt.hour
# one row per MAC per branch per day, keeping its first (earliest) record
# assumes wifi is already in timestamp order, as syslog usually is; otherwise sort_values('timestamp') first
uniq_per_day = wifi.drop_duplicates(subset=['branch', 'month', 'day', 'mac'], keep='first')
# Hourly: unique devices per day, broken down by the hour of first connection
hourly = uniq_per_day.groupby(['branch', 'month', 'day', 'hour']).agg({'mac': 'count'})
# Daily: unique devices per day
daily = uniq_per_day.groupby(['branch', 'month', 'day']).agg({'mac': 'count'})
#...etc.
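For reference, a self-contained sketch of the whole pipeline using only the five sample rows above (the counts are tiny, but it shows the shapes end to end; first_seen, hourly and daily are names I'm introducing):
import pandas as pd

wifi = pd.DataFrame({
    'branch': ['branch_a', 'branch_a', 'branch_b', 'branch_b', 'branch_c'],
    'timestamp': ['2020-09-01 00:00:00', '2020-09-01 00:08:00', '2020-09-01 00:36:07',
                  '2020-09-01 00:41:24', '2020-09-01 00:44:33'],
    'mac': ['48:c7:96:1d:91:af', '48:c7:96:1d:91:bx', '48:c7:96:1d:80:ff',
            '48:c7:96:1d:86:ff', '48:c7:96:1d:76:bv'],
})
wifi['timestamp'] = pd.to_datetime(wifi['timestamp'])
wifi['date'] = wifi['timestamp'].dt.date

# one row per MAC per branch per day, keeping the earliest connection
first_seen = (wifi.sort_values('timestamp')
                  .drop_duplicates(subset=['branch', 'date', 'mac'], keep='first'))

# unique devices per branch, broken down by the hour of first connection
hourly = first_seen.groupby(['branch', first_seen['timestamp'].dt.floor('h')])['mac'].count()

# unique devices per branch per day
daily = first_seen.groupby(['branch', 'date'])['mac'].count()

print(hourly)
print(daily)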