I have a line by line item list that has a start and an end date. I want to count that item while the date is in between the start and end date (ends inclusive), grouped by category.
So this is my input dataset:
>> df
Key Count Start Count End Category
A Jan 1 2019 Jan 5 2019 Red
B Jan 1 2019 Jan 7 2019 Blue
C Jan 3 2019 Jan 5 2019 Red
D Jan 2 2019 Jan 8 2019 Red
E Jan 4 2019 Jan 10 2019 Yellow
F Jan 3 2019 Jan 6 2019 Blue
G Jan 5 2019 Jan 8 2019 Red
H Jan 6 2019 Jan 10 2019 Yellow
I Jan 1 2019 Jan 4 2019 Yellow
J Jan 2 2019 Jan 7 2019 Red
And I want my output dataset to be something like this:
>> DailyCount
Date Category Count
Jan 1 2019 Red 1
Jan 1 2019 Blue 1
Jan 1 2019 Yellow 1
Jan 2 2019 Red 3
Jan 2 2019 Blue 1
Jan 2 2019 Yellow 1
Jan 3 2019 Red 4
Jan 3 2019 Blue 2
Jan 3 2019 Yellow 1
Jan 4 2019 Red 4
Jan 4 2019 Blue 2
Jan 4 2019 Yellow 2
Jan 5 2019 Red 5
Jan 5 2019 Blue 2
Jan 5 2019 Yellow 1
Jan 6 2019 Red 3
Jan 6 2019 Blue 2
Jan 6 2019 Yellow 2
Jan 7 2019 Red 3
Jan 7 2019 Blue 1
Jan 7 2019 Yellow 2
Jan 8 2019 Red 2
Jan 8 2019 Blue 0
Jan 8 2019 Yellow 2
Jan 9 2019 Red 0
Jan 9 2019 Blue 0
Jan 9 2019 Yellow 2
Jan 10 2019 Red 0
Jan 10 2019 Blue 0
Jan 10 2019 Yellow 2
I used Counter()
to count the occurrences per day, but I'm not sure how to incorporate the grouping by category:
Count = Counter()
for index, row in df.iterrows():
delta = row['Count End'] - row['Count Start']
for i in range(delta.days + 1):
time = row['Count Start'] + timedelta(i)
Count[str(time.date())] += 1
DailyCount = DataFrame.from_dict(Count,orient='index').reset_index().rename(columns={'index':'Date', 0:'Count'}).sort_values(by=['Date'])
>> DailyCount
Date Count
Jan 1 2019 3
Jan 2 2019 5
Jan 3 2019 7
Jan 4 2019 8
Jan 5 2019 8
Jan 6 2019 7
Jan 7 2019 6
Jan 8 2019 4
Jan 9 2019 2
Jan 10 2019 2
Any idea how I can partition this code by category?