
My dataframe has 3 columns: Id, DayType (Monday = 0, Sunday = 6), and Qty.

I want to group by in such a way that for each 'Id' I get 2 rows: one with the aggregate quantity for weekdays (Mon-Fri) and one for the weekend.

Id  Qty  DayType
B07 581  0
B07 1096 1
B07 1245 2
B07 1080 3
B07 856  4
B07 581  5
B07 1096 6
B08 1345 0
B08 1580 3
B08 816  4

Expected output

Id  Qty  DayType
B07 4858 0
B07 1677 1
B08 3741 0

In the output, 0 in DayType indicates a weekday and 1 indicates the weekend.

Also, if you can suggest how to pivot it afterwards, that would be even more helpful:

id weekday weekend
B07 4858    1677
B08 3741    0

I found a few solutions for grouping by a single condition, but here there are two conditions and I cannot figure out how to implement them. Thank you.
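For reference, here is the sample data above as a dataframe (a minimal sketch, just for reproducibility; column names taken from the table):

import pandas as pd

df = pd.DataFrame({
    'Id':      ['B07', 'B07', 'B07', 'B07', 'B07', 'B07', 'B07', 'B08', 'B08', 'B08'],
    'Qty':     [581, 1096, 1245, 1080, 856, 581, 1096, 1345, 1580, 816],
    'DayType': [0, 1, 2, 3, 4, 5, 6, 0, 3, 4],
})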


Based on the answer suggested by Ansev, I am getting the following output:

Id  Weekdays  Weekend
B07     20817     3607
C03     22289     6569
F05     22807     6474

That is correct.

But when I try the non-pivot format based on Ansev's suggestion in the comment under his answer, I get this:

id        Qty        DayType
B07       20817       43
B07        3607       44
C03       22289       43
C03        6569       44
F05       22807       43

As you can see, it is summing up DayType too. I expect 0 and 1 there to represent weekday and weekend. Any suggestions?

San

2 Answers


The ideal approach is to use DataFrame.groupby. To group by the type of day of the week, we check whether df['DayType'] is greater than or equal to 5, and we also differentiate by Id. Since df['DayType'].ge(5) returns False or True, we can later use np.where or Series.replace / Series.map to turn those flags into Weekdays and Weekend labels.

aggregate_df = (df.groupby(['Id',df['DayType'].ge(5).astype(int)])['Qty']
                  .sum().reset_index().reindex(columns = ['Id','Qty','DayType']))
print(aggregate_df)

    Id   Qty  DayType
0  B07  4858        0
1  B07  1677        1
2  B08  3741        0

Finally, we use DataFrame.pivot_table:

final_result = (aggregate_df.assign(DayType = aggregate_df['DayType'].map({0:'Weekdays', 
                                                                           1:'Weekend'}))
                            .pivot_table(columns='DayType',values = 'Qty',index='Id',fill_value=0)
                            .reset_index()
                            .rename_axis(columns=None))
print(final_result)

    Id  Weekdays  Weekend
0  B07      4858     1677
1  B08      3741        0
ansev
  • Thank you. I liked your solution but is there any way to modify your code to output the non-pivot format which I have mentioned in my post? – San Jan 15 '20 at 18:11
  • I think you want `aggregate_df.reset_index(level = [0,'Id'],drop=0)`. Tell me if you would consider accepting my answer – ansev Jan 15 '20 at 18:15
  • I have accepted your answer. However, what you have suggested in the comment is not working out. It is aggregating the 'DayType' column values as well. How do I keep them 0 and 1 for weekday and weekend respectively? – San Jan 15 '20 at 18:26
  • I think `aggregate_df.reset_index(level = [0,'Id'],drop=0)` or `aggregate_df.reset_index(level = 'Id').reset_index(drop=True)` should work well. If this second method does not work, you could edit the question with the data frame you will get with this code and I can suggest a new solution. Thanks for accepting:) – ansev Jan 15 '20 at 18:32
  • I have edited the post. The given suggestion is not giving me the desired output. Please have a look. – San Jan 15 '20 at 18:46
  • I'm sorry, now it's clear what you meant; I edited the answer – ansev Jan 15 '20 at 19:04
  • The updated solution worked. Thanks Man. How do you people know so much? I must die. – San Jan 15 '20 at 20:08
# Map day numbers to a weekday/weekend flag: 0 for Mon-Fri, 1 for Sat-Sun
df['DayType'] = df['DayType'].apply(lambda x: 0 if x < 5 else 1)
# Sum quantities per Id and flag
df = df.groupby(['Id', 'DayType'], as_index=False)['Qty'].sum()
# Replace the flag with readable labels before pivoting
df['DayType'] = df['DayType'].apply(lambda x: 'Weekday' if x == 0 else 'Weekend')
df.pivot(index='Id', columns='DayType', values='Qty')

Output

    DayType Weekday Weekend
Id      
B07         4858.0  1677.0
B08         3741.0  NaN
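
As a variant, the apply calls could be replaced with np.where, as suggested in the comments below; a rough sketch under the same assumptions (starting from the original df in the question):

import numpy as np

# Label rows directly, then aggregate and pivot as above
df['DayType'] = np.where(df['DayType'] < 5, 'Weekday', 'Weekend')
df = df.groupby(['Id', 'DayType'], as_index=False)['Qty'].sum()
df.pivot(index='Id', columns='DayType', values='Qty')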
Chris
  • https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – ansev Jan 15 '20 at 17:25
  • The suggestion by ansev is more elegant and better performing by far, just showing how you might navigate through this problem with a more basic understanding of pandas – Chris Jan 15 '20 at 17:26
  • Try using `np.where(...)` instead of `apply(...)`; it won't change much in terms of the premise of the basic-pandas solution ;) – Grzegorz Skibinski Jan 15 '20 at 17:33