I want to summarize a column based on each row's fields and append the result to each row as a new column. For example, I want to count how many values in a column are equal to each row's value in that column (the 'condition' column is a simplification; in practice it is a value calculated from each row's fields):
Original dataframe:
   condition
0       True
1      False
2       True
3       True
4      False
5       True
6       True
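For anyone who wants to reproduce this, the example frame above can be rebuilt with a short sketch like the following. The values are copied from the table; in my real data 'condition' is computed from other fields, so the hard-coded list is only a stand-in.

```python
import pandas as pd

# Rebuild the example frame; the values are copied from the table above.
# In the real data 'condition' is derived from other columns (not shown here).
df = pd.DataFrame({"condition": [True, False, True, True, False, True, True]})
print(df)
```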
Result:
   condition  Count
0       True    5.0
1      False    2.0
2       True    5.0
3       True    5.0
4      False    2.0
5       True    5.0
6       True    5.0
I can only think of using iterrows to do this iteratively, although that is not the idiomatic pandas way:
import pandas as pd

result = pd.DataFrame(index=df.index)
for i, r in df.iterrows():
    # Select the rows whose 'condition' equals this row's 'condition'
    df2 = df.loc[df['condition'] == r['condition']]
    result.loc[i, 'condition'] = r['condition']
    result.loc[i, 'Count'] = df2.shape[0]  # How many items are the same as that row's 'condition' field
Is there a typical vectorized way to do this? Thank you for any help you can provide.
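One candidate I can sketch is `groupby(...).transform("size")`, which should broadcast each group's row count back onto the original index. This assumes the per-row value can first be stored as a column (here it is simply 'condition'), and I am not sure it is the preferred approach. Note that it produces integer counts, whereas the loop above ends up with floats.

```python
import pandas as pd

# Example frame with the same values as in the question.
df = pd.DataFrame({"condition": [True, False, True, True, False, True, True]})

# For each row, count how many rows share its 'condition' value.
# transform("size") returns a Series aligned with df's index, so it can be
# assigned directly as a new column without an explicit loop.
result = df.copy()
result["Count"] = df.groupby("condition")["condition"].transform("size")
print(result)
```

Mapping each value through `df["condition"].value_counts()` with `Series.map` looks like another option that gives the same counts.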