2

I have a line by line item list that has a start and an end date. I want to count that item while the date is in between the start and end date (ends inclusive), grouped by category.

So this is my input dataset:

>> df
Key     Count Start     Count End     Category
A       Jan 1 2019      Jan 5 2019    Red
B       Jan 1 2019      Jan 7 2019    Blue
C       Jan 3 2019      Jan 5 2019    Red
D       Jan 2 2019      Jan 8 2019    Red
E       Jan 4 2019      Jan 10 2019   Yellow
F       Jan 3 2019      Jan 6 2019    Blue
G       Jan 5 2019      Jan 8 2019    Red
H       Jan 6 2019      Jan 10 2019   Yellow
I       Jan 1 2019      Jan 4 2019    Yellow
J       Jan 2 2019      Jan 7 2019    Red

And I want my output dataset to be something like this:

>> DailyCount
Date          Category          Count
Jan 1 2019    Red               1
Jan 1 2019    Blue              1
Jan 1 2019    Yellow            1
Jan 2 2019    Red               3
Jan 2 2019    Blue              1
Jan 2 2019    Yellow            1
Jan 3 2019    Red               4
Jan 3 2019    Blue              2
Jan 3 2019    Yellow            1
Jan 4 2019    Red               4
Jan 4 2019    Blue              2
Jan 4 2019    Yellow            2
Jan 5 2019    Red               5
Jan 5 2019    Blue              2
Jan 5 2019    Yellow            1
Jan 6 2019    Red               3
Jan 6 2019    Blue              2
Jan 6 2019    Yellow            2
Jan 7 2019    Red               3
Jan 7 2019    Blue              1
Jan 7 2019    Yellow            2
Jan 8 2019    Red               2
Jan 8 2019    Blue              0
Jan 8 2019    Yellow            2
Jan 9 2019    Red               0
Jan 9 2019    Blue              0
Jan 9 2019    Yellow            2
Jan 10 2019   Red               0
Jan 10 2019   Blue              0
Jan 10 2019   Yellow            2

I used Counter() to count the occurrences per day, but I'm not sure how to incorporate the grouping by category:

Count = Counter()

for index, row in df.iterrows():
  delta = row['Count End'] - row['Count Start']
  for i in range(delta.days + 1):
    time = row['Count Start'] + timedelta(i)
    Count[str(time.date())] += 1

DailyCount = DataFrame.from_dict(Count,orient='index').reset_index().rename(columns={'index':'Date', 0:'Count'}).sort_values(by=['Date'])

>> DailyCount
Date          Count
Jan 1 2019    3
Jan 2 2019    5
Jan 3 2019    7
Jan 4 2019    8
Jan 5 2019    8
Jan 6 2019    7
Jan 7 2019    6
Jan 8 2019    4
Jan 9 2019    2
Jan 10 2019   2

Any idea how I can partition this code by category?

ChrisGPT was on strike
  • 127,765
  • 105
  • 273
  • 257
nclmteo
  • 37
  • 1
  • 3

2 Answers2

0

Using unnesting after create the list of dates

df['Count Start']=pd.to_datetime(df['Count Start'])
df['Count End']=pd.to_datetime(df['Count End'])

df['Date']=[pd.date_range(x,y) for x , y in zip(df['Count Start'],df['Count End'])]
#Here we just need combine with `groupby` with `size` and adjust by using `unstack` and `stack`
Yourdf=unnesting(df,['Date']).groupby(['Date','Category']).size().unstack(fill_value=0).stack()
Yourdf

         Date Category  0
0  2019-01-01     Blue  1
1  2019-01-01      Red  1
2  2019-01-01   Yellow  1
3  2019-01-02     Blue  1
4  2019-01-02      Red  3
5  2019-01-02   Yellow  1
6  2019-01-03     Blue  2
7  2019-01-03      Red  4
8  2019-01-03   Yellow  1
9  2019-01-04     Blue  2
10 2019-01-04      Red  4
11 2019-01-04   Yellow  2
12 2019-01-05     Blue  2
13 2019-01-05      Red  5
14 2019-01-05   Yellow  1
15 2019-01-06     Blue  2
16 2019-01-06      Red  3
17 2019-01-06   Yellow  2
18 2019-01-07     Blue  1
19 2019-01-07      Red  3
20 2019-01-07   Yellow  2
21 2019-01-08     Blue  0
22 2019-01-08      Red  2
23 2019-01-08   Yellow  2
24 2019-01-09     Blue  0
25 2019-01-09      Red  0
26 2019-01-09   Yellow  2
27 2019-01-10     Blue  0
28 2019-01-10      Red  0
29 2019-01-10   Yellow  2

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(explode, 1), how='left')
BENY
  • 317,841
  • 20
  • 164
  • 234
0

You can use the:

pandas.DataFrame.groupby()

function, tha should work for you.

You can check more on this function: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html