I have a dataframe, df1, that looks like this:

col1   col2
 A      2
 A      3
 A      4
 B      4
 B      8

Now I want to express each value in col2 as a fraction of the total of col2 within its col1 group. So the result should be:

col1   col2
 A      0.22
 A      0.33
 A      0.44
 B      0.33
 B      0.67

That is, the values in col2 should sum to 1 within each unique value of col1. Does anyone know how to do this without using for loops?

baqm

2 Answers

Use GroupBy.transform to get the per-group sums as a Series aligned with the original rows, then divide the original column col2 by it:

df['col2'] /= df.groupby('col1')['col2'].transform('sum')
# equivalent to:
# df['col2'] = df['col2'] / df.groupby('col1')['col2'].transform('sum')
print(df)
  col1      col2
0    A  0.222222
1    A  0.333333
2    A  0.444444
3    B  0.333333
4    B  0.666667
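
For anyone who wants to run this end to end, here is a minimal sketch that assumes the sample data from the question. The names group_totals, share and share_sums are only illustrative and are not part of the original answer; the result goes into a new column so that col2 stays intact for comparison.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({"col1": ["A", "A", "A", "B", "B"],
                   "col2": [2, 3, 4, 4, 8]})

# transform('sum') returns a Series aligned to the original rows,
# holding each row's group total: 9, 9, 9, 12, 12.
group_totals = df.groupby("col1")["col2"].transform("sum")

# Element-wise division gives each row's share of its group.
df["share"] = df["col2"] / group_totals

# Sanity check: the shares within each group should add up to 1.
share_sums = df.groupby("col1")["share"].sum()
print(df)
print(share_sums)
```

Because transform keeps the original row order and index, no merge or loop is needed to line the totals up with the rows.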
jezrael

Another way, though more limiting, since it sets col1 as the index (which you then need to reset) and is possibly less efficient than using transform:

df = df.set_index('col1')

# df.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df.div(df.groupby(level=0).sum()).reset_index()

  col1      col2
0    A  0.222222
1    A  0.333333
2    A  0.444444
3    B  0.333333
4    B  0.666667
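
As a rough check that the index-based route and the transform-based route agree, here is a small sketch on the question's sample data. It assumes a pandas version where groupby(level=0) is available; the names indexed, by_index, by_transform and max_diff are made up for illustration.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({"col1": ["A", "A", "A", "B", "B"],
                   "col2": [2, 3, 4, 4, 8]})

# Index-based route: move col1 into the index, then divide by the
# per-label totals; div aligns the two frames on the index labels.
indexed = df.set_index("col1")
by_index = indexed.div(indexed.groupby(level=0).sum()).reset_index()

# transform-based route, written without modifying df in place.
by_transform = df.assign(
    col2=df["col2"] / df.groupby("col1")["col2"].transform("sum")
)

# Largest absolute difference between the two results.
max_diff = (by_index["col2"] - by_transform["col2"]).abs().max()
print(max_diff)
```

Working on a copy (indexed) rather than reassigning df avoids the side effect of losing col1 as a regular column.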
sammywemmy