0

I have a dataframe with sales orders and I want to get the count of the orderlines per order in every row:

Order   Orderline   
1       0             
1       1             
1       2             
2       0             
3       0             
3       1             

What I would like to obtain is

Order   Orderline   Count   
1       0           3  
1       1           3 
1       2           3
2       0           1  
3       0           2  
3       1           2  

I tried using transform('count') as I noticed it being used in How to add a new column and fill it up with a specific value depending on another column's series? but that didn't work out. It flattened down my table instead.

Any ideas on how to accomplish this?

Sal_H
  • 75
  • 1
  • 11
  • 1
    `df['Count'] = df.groupby("Order")['Orderline'].transform('count')` works for me – anky Jun 24 '20 at 17:14
  • Thank you @anky, that worked out for me. Please make it an answer so I can mark it as such. – Sal_H Jun 24 '20 at 17:29

2 Answers2

0

Just groupby and then transform:

df['Count'] = df.groupby(by=['Order'], as_index=False).transform('count')
print(df)

   Order  Orderline  Count
0      1          0      3
1      1          1      3
2      1          2      3
3      2          0      1
4      3          0      2
5      3          1      2
NYC Coder
  • 7,424
  • 2
  • 11
  • 24
  • Unfortunately I can't upvote your answer as I don't have enough reputation yet. Thank you for the effort. – Sal_H Jun 24 '20 at 17:30
0

Try this

df['counts'] = df.apply(lambda x: (df['Order'] == x['Order']).sum(), axis=1)
dberezin
  • 46
  • 5
  • Thank you for the answer. It will work I'm sure. I read that apply should be avoided and transform is way more performant. I'm having a dataset with millions of rows so performance is also important to me. – Sal_H Jun 24 '20 at 17:31