How to add a calculated column that gives aggregation of rows but maintains the shape of the dataframe in Pandas

Question

I have a dataframe with sales orders and I want to get the count of the orderlines per order in every row:

Order   Orderline   
1       0             
1       1             
1       2             
2       0             
3       0             
3       1

What I would like to obtain is

Order   Orderline   Count   
1       0           3  
1       1           3 
1       2           3
2       0           1  
3       0           2  
3       1           2

I tried using transform('count') as I noticed it being used in How to add a new column and fill it up with a specific value depending on another column's series? but that didn't work out. It flattened down my table instead.

Any ideas on how to accomplish this?

`df['Count'] = df.groupby("Order")['Orderline'].transform('count')` works for me — anky, Jun 24 '20 at 17:14
Thank you @anky, that worked out for me. Please make it an answer so I can mark it as such. — Sal_H, Jun 24 '20 at 17:29

score 0 · Accepted Answer · answered Jun 24 '20 at 17:19

0

Just groupby and then transform:

df['Count'] = df.groupby(by=['Order'], as_index=False).transform('count')
print(df)

   Order  Orderline  Count
0      1          0      3
1      1          1      3
2      1          2      3
3      2          0      1
4      3          0      2
5      3          1      2

answered Jun 24 '20 at 17:19

NYC Coder

7,424
2
11
24

Unfortunately I can't upvote your answer as I don't have enough reputation yet. Thank you for the effort. – Sal_H Jun 24 '20 at 17:30

score 0 · Answer 2 · answered Jun 24 '20 at 17:20

0

Try this

df['counts'] = df.apply(lambda x: (df['Order'] == x['Order']).sum(), axis=1)

answered Jun 24 '20 at 17:20

dberezin

46
5

Thank you for the answer. It will work I'm sure. I read that apply should be avoided and transform is way more performant. I'm having a dataset with millions of rows so performance is also important to me. – Sal_H Jun 24 '20 at 17:31

How to add a calculated column that gives aggregation of rows but maintains the shape of the dataframe in Pandas

2 Answers2