Calculate the percentage of each distinct value

Question

I have a data frame as given below:

 cust_id            state           city     buy_times

 1. 123             delhi           xyz           2

 2. 234             haryana         ggm           4
 3. 345             delhi           abc           7
 4. 456             AP              asd           3

So on and so forth.

I need to calculate the percentage of sales by each city and state.

df = data.groupby(['state','city'])['buy_times].sum()

Using this command, I have got the total number of sales by each city and state. But I am unable to calculate the percentage of the same.

Possible duplicate of [Pandas percentage of total with groupby](https://stackoverflow.com/questions/23377108/pandas-percentage-of-total-with-groupby) — kvk30, Dec 13 '17 at 07:03

jezrael · Accepted Answer · 2017-12-13T07:02:13.197

I think you need transform if need new columns of percentage to original df:

print (data)
   cust_id    state city  buy_times
0      123    delhi  xyz          2
1      345    delhi  abc          7
2      234  haryana  ggm          4
3      345    delhi  xyz          7
4      456       AP  asd          3

sum1 = data.groupby(['state','city'])['buy_times'].transform('sum')
data['new'] = data['buy_times'].div(sum1)
print (data)
   cust_id    state city  buy_times       new
0      123    delhi  xyz          2  0.222222
1      345    delhi  abc          7  1.000000
2      234  haryana  ggm          4  1.000000
3      345    delhi  xyz          7  0.777778
4      456       AP  asd          3  1.000000

If need percentage of aggregate values:

df1 = data.groupby(['state','city'])['buy_times'].sum()
df = df1.div(df1.groupby('state').transform('sum')).reset_index(name='perc')
print (df)
     state city    perc
0       AP  asd  1.0000
1    delhi  abc  0.4375
2    delhi  xyz  0.5625
3  haryana  ggm  1.0000

Calculate the percentage of each distinct value

1 Answers1