
My dataframe has 3 columns: Id, DayType (Monday = 0, Sunday = 6), and Qty.

I want to group by in such a way that for each 'Id' I get 2 rows: one with the aggregate quantity for weekdays (Mon-Fri) and one for the weekend.

Id  Qty  DayType
B07 581  0
B07 1096 1
B07 1245 2
B07 1080 3
B07 856  4
B07 581  5
B07 1096 6
B08 1345 0
B08 1580 3
B08 816  4

Expected output

Id  Qty  DayType
B07 4858 0
B07 1677 1
B08 3741 0

In the output, 0 in DayType indicates a weekday and 1 indicates the weekend.

Also, if you can suggest how to pivot it afterwards, that would be even more helpful:

id weekday weekend
B07 4858    1677
B08 3741    0

I found a few solutions for grouping by a single condition, but here there are two conditions and I cannot figure out how to implement them. Thank you.
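For reference, here is the sample data above as a dataframe (a minimal sketch, just for reproducibility; column names taken from the table):

import pandas as pd

df = pd.DataFrame({
    'Id':      ['B07', 'B07', 'B07', 'B07', 'B07', 'B07', 'B07', 'B08', 'B08', 'B08'],
    'Qty':     [581, 1096, 1245, 1080, 856, 581, 1096, 1345, 1580, 816],
    'DayType': [0, 1, 2, 3, 4, 5, 6, 0, 3, 4],
})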


Based on the answer suggested by Ansev, I am getting the following output:

Id  Weekdays  Weekend
B07     20817     3607
C03     22289     6569
F05     22807     6474

That is correct.

But when I try the non-pivot format based on Ansev's suggestion in the comment under his answer, I get this:

id        Qty        DayType
B07       20817       43
B07        3607       44
C03       22289       43
C03        6569       44
F05       22807       43

As you can see, it is summing up DayType too. I expect 0 and 1 there to represent weekday and weekend. Any suggestions?

San

2 Answers


The ideal approach is to use DataFrame.groupby. To group by the type of day of the week, we check whether df['DayType'] is greater than or equal to 5, and we also differentiate by Id. Since df['DayType'].ge(5) returns False or True, we can later use np.where or Series.replace / Series.map to turn those flags into Weekdays and Weekend labels.

aggregate_df = (df.groupby(['Id',df['DayType'].ge(5).astype(int)])['Qty']
                  .sum().reset_index().reindex(columns = ['Id','Qty','DayType']))
print(aggregate_df)

    Id   Qty  DayType
0  B07  4858        0
1  B07  1677        1
2  B08  3741        0

Finally, we use DataFrame.pivot_table:

final_result = (aggregate_df.assign(DayType = aggregate_df['DayType'].map({0:'Weekdays', 
                                                                           1:'Weekend'}))
                            .pivot_table(columns='DayType',values = 'Qty',index='Id',fill_value=0)
                            .reset_index()
                            .rename_axis(columns=None))
print(final_result)

    Id  Weekdays  Weekend
0  B07      4858     1677
1  B08      3741        0
ansev
  • Thank you. I liked your solution but is there any way to modify your code to output the non-pivot format which I have mentioned in my post? – San Jan 15 '20 at 18:11
  • I think you want `aggregate_df.reset_index(level = [0,'Id'],drop=0)`. Tell me if you would consider accepting my answer – ansev Jan 15 '20 at 18:15
  • I have accepted your answer. However, what you have suggested in the comment is not working out. It is aggregating the 'DayType' column values as well. How do I keep them 0 and 1 for weekday and weekend respectively? – San Jan 15 '20 at 18:26
  • I think `aggregate_df.reset_index(level = [0,'Id'],drop=0)` or `aggregate_df.reset_index(level = 'Id').reset_index(drop=True)` should work well. If this second method does not work, you could edit the question with the data frame you will get with this code and I can suggest a new solution. Thanks for accepting:) – ansev Jan 15 '20 at 18:32
  • I have edited the post. The given suggestion is not giving me the desired output. Please have a look. – San Jan 15 '20 at 18:46
  • I'm sorry, now it's clear what you meant; I edited the answer – ansev Jan 15 '20 at 19:04
  • The updated solution worked. Thanks Man. How do you people know so much? I must die. – San Jan 15 '20 at 20:08
# Map day numbers to a weekday/weekend flag: 0 for Mon-Fri, 1 for Sat-Sun
df['DayType'] = df['DayType'].apply(lambda x: 0 if x < 5 else 1)
# Sum quantities per Id and flag
df = df.groupby(['Id', 'DayType'], as_index=False)['Qty'].sum()
# Replace the flag with readable labels before pivoting
df['DayType'] = df['DayType'].apply(lambda x: 'Weekday' if x == 0 else 'Weekend')
df.pivot(index='Id', columns='DayType', values='Qty')

Output

    DayType Weekday Weekend
Id      
B07         4858.0  1677.0
B08         3741.0  NaN
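
As a variant, the apply calls could be replaced with np.where, as suggested in the comments below; a rough sketch under the same assumptions (starting from the original df in the question):

import numpy as np

# Label rows directly, then aggregate and pivot as above
df['DayType'] = np.where(df['DayType'] < 5, 'Weekday', 'Weekend')
df = df.groupby(['Id', 'DayType'], as_index=False)['Qty'].sum()
df.pivot(index='Id', columns='DayType', values='Qty')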
Chris
  • https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – ansev Jan 15 '20 at 17:25
  • The suggestion by ansev is more elegant and better performing by far, just showing how you might navigate through this problem with a more basic understanding of pandas – Chris Jan 15 '20 at 17:26
  • Try using `np.where(...)` instead of `apply(...)`; it won't change much in terms of the premise of the basic-pandas solution ;) – Grzegorz Skibinski Jan 15 '20 at 17:33