
This is easier to explain with an example. Say I have a dataframe with the columns year, cc_rating and number_x:

import pandas as pd
df = pd.DataFrame({"year":{"0":2005,"1":2005,"2":2005,"3":2006,"4":2006,"5":2006,"6":2007,"7":2007,"8":2007},"cc_rating":{"0":"2","1":"2a","2":"2b","3":"2","4":"2a","5":"2b","6":"2","7":"2a","8":"2b"},"number_x":{"0":9368,"1":21643,"2":107577,"3":10069,"4":21486,"5":110326,"6":10834,"7":21566,"8":111082}})

df 

   year cc_rating  number_x
0  2005         2      9368
1  2005        2a     21643
2  2005        2b    107577
3  2006         2     10069
4  2006        2a     21486
5  2006        2b    110326
6  2007         2     10834
7  2007        2a     21566
8  2007        2b    111082

Problem

How can I get each row's number_x as a percentage of the total number_x for its year, i.e. a new column where the rows of each year add up to 100?

Straight division won't work, because year is not unique and so can't be set as the index of the original df.
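A possible way around that alignment issue, offered here only as a sketch: the per-year totals produced by `groupby(...).sum()` are indexed by year, and that index *is* unique, so each row's total can be looked up with `Series.map` instead of making year the index of the original df. For brevity the sketch rebuilds the same data from plain lists (so df gets a default integer index rather than the string keys "0".."8"); the column name `%` follows the question.

```python
import pandas as pd

# Same values as the example df, built from lists for brevity
df = pd.DataFrame({
    "year": [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007],
    "cc_rating": ["2", "2a", "2b"] * 3,
    "number_x": [9368, 21643, 107577, 10069, 21486, 110326, 10834, 21566, 111082],
})

# One total per year; this Series is indexed by year, and that index is unique
year_totals = df.groupby("year")["number_x"].sum()

# map replaces each row's year with that year's total, giving a denominator
# that lines up row-for-row with df, with no merge and no set_index needed
df["%"] = (df["number_x"] / df["year"].map(year_totals) * 100).round(2)
print(df)
```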

Right now I'm doing the following, but it's quite inefficient and I'm sure there's a better way:

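# Sum only number_x and name the totals, so the merge doesn't suffix the overlapping columns
df = pd.merge(df, df.groupby('year')['number_x'].sum().rename('number_y'), left_on='year', right_index=True)
df['%'] = round((df['number_x'] / df['number_y']) * 100, 2)
df = df.drop('number_y', axis=1)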

Thanks!

Wboy

1 Answer


A possible solution:

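# transform('sum') returns each year's total repeated on every row of that year.
# assign returns a new DataFrame; use df = df.assign(...) to keep the column.
(df.assign(
    perc = (100*df.number_x.div(df.groupby('year').number_x.transform('sum')))
    .round(2)))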

Output:

   year cc_rating  number_x   perc
0  2005         2      9368   6.76
1  2005        2a     21643  15.62
2  2005        2b    107577  77.62
3  2006         2     10069   7.10
4  2006        2a     21486  15.14
5  2006        2b    110326  77.76
6  2007         2     10834   7.55
7  2007        2a     21566  15.03
8  2007        2b    111082  77.42
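To make the key step of the answer more visible, the sketch below prints what `groupby('year').number_x.transform('sum')` produces on its own: each year's total, broadcast back so it has the same length and index as df. That is why a plain element-wise division lines up without any merge. The helper `pct_of_group` is a hypothetical wrapper for illustration, not a pandas function, and the data is rebuilt from lists with the same values as the question.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007],
    "cc_rating": ["2", "2a", "2b"] * 3,
    "number_x": [9368, 21643, 107577, 10069, 21486, 110326, 10834, 21566, 111082],
})

# Unlike .sum(), transform('sum') keeps one value per original row
year_total = df.groupby("year")["number_x"].transform("sum")
print(year_total.tolist())
# [138588, 138588, 138588, 141881, 141881, 141881, 143482, 143482, 143482]


def pct_of_group(frame, group_col, value_col, decimals=2):
    """Hypothetical helper: value_col as a percentage of its group's total."""
    totals = frame.groupby(group_col)[value_col].transform("sum")
    return (100 * frame[value_col] / totals).round(decimals)


out = df.assign(perc=pct_of_group(df, "year", "number_x"))
# Sanity check: within each year the rounded percentages add up to about 100
print(out.groupby("year")["perc"].sum())
```

Compared with the merge in the question, this avoids building and joining a separate totals frame and the suffix handling that comes with it; the totals simply share df's index.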
PaulS