
I want to summarize a column based on each row's fields and append the result to each row as a new column. For example, I want to count how many values in a column equal each row's value in that column. (The 'condition' column is a simplification; it is actually a value calculated from each row's fields.)

Original dataframe:

    condition
0   True
1   False
2   True
3   True
4   False
5   True
6   True

Desired result:

    condition   Count
0   True        5.0
1   False       2.0
2   True        5.0
3   True        5.0
4   False       2.0
5   True        5.0
6   True        5.0

The only approach I can think of is iterating with iterrows, which is not idiomatic pandas:

import pandas as pd

result = pd.DataFrame(index=df.index)

for i, r in df.iterrows():
    # Rows whose 'condition' matches this row's 'condition'
    df2 = df.loc[df['condition'] == r['condition']]
    result.loc[i, 'condition'] = r['condition']
    result.loc[i, 'Count'] = df2.shape[0]  # number of rows sharing this row's 'condition'

Is there a standard vectorized way to do this? Thank you for any help you can provide.

2 Answers


You can do it with value_counts and map methods:

tally = df['condition'].value_counts()
df['Count'] = df['condition'].map(tally)
print(df)

Output:

   condition  Count
0       True      5
1      False      2
2       True      5
3       True      5
4      False      2
5       True      5
6       True      5
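
Since the question says 'condition' is really computed from each row's fields, here is a minimal sketch of the same idea applied to a derived condition. The columns `a` and `b` and the rule `a > b` are hypothetical stand-ins, not taken from the question. It relies on value_counts returning a Series indexed by the distinct values, so map can look up each row's value in that index.

```python
import pandas as pd

# Hypothetical data: 'a' and 'b' stand in for the real row fields.
df = pd.DataFrame({"a": [5, 1, 4, 6, 2, 7, 9], "b": [1, 3, 2, 1, 5, 2, 3]})

# Compute the per-row condition in one vectorized step (assumed rule: a > b).
condition = df["a"] > df["b"]

# value_counts() yields a Series indexed by each distinct value.
tally = condition.value_counts()

# map() looks up every row's value in that index and returns its count.
df["Count"] = condition.map(tally)
print(df)
```

The intermediate `condition` Series does not have to be stored in the dataframe; it only needs to share the dataframe's index so the mapped counts line up with the rows.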
Bill

You can use groupby with transform, which computes one value per group and broadcasts it back to every row of that group. Here is an example:

df = pd.DataFrame({'condition': [True, False, True, True, False, True, True]})
df['Count'] = df.groupby('condition')['condition'].transform('count')

This will give you the desired result:

   condition  Count
0       True      5
1      False      2
2       True      5
3       True      5
4      False      2
5       True      5
6       True      5
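
One detail that may matter on real data: 'count' counts only non-null entries in each group, while 'size' counts every row in the group. With no missing values the two agree. The sketch below uses the sample data from this answer and checks that transform('size') gives the same numbers as the value_counts and map approach; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"condition": [True, False, True, True, False, True, True]})

# Group sizes, broadcast back onto the original rows.
by_transform = df.groupby("condition")["condition"].transform("size")

# The same numbers via value_counts + map.
by_map = df["condition"].map(df["condition"].value_counts())

print(by_transform.tolist())  # [5, 2, 5, 5, 2, 5, 5]
print(by_transform.tolist() == by_map.tolist())  # True
```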