Easier to explain with an example: say I have an example dataframe here with year, cc_rating and number_x.
import pandas as pd
df = pd.DataFrame({"year":{"0":2005,"1":2005,"2":2005,"3":2006,"4":2006,"5":2006,"6":2007,"7":2007,"8":2007},"cc_rating":{"0":"2","1":"2a","2":"2b","3":"2","4":"2a","5":"2b","6":"2","7":"2a","8":"2b"},"number_x":{"0":9368,"1":21643,"2":107577,"3":10069,"4":21486,"5":110326,"6":10834,"7":21566,"8":111082}})
df
year cc_rating number_x
0 2005 2 9368
1 2005 2a 21643
2 2005 2b 107577
3 2006 2 10069
4 2006 2a 21486
5 2006 2b 110326
6 2007 2 10834
7 2007 2a 21566
8 2007 2b 111082
Problem
How can I get the % of number_x per year? Meaning each row's number_x divided by the sum of number_x for that row's year, times 100.
Straight division won't work, as year can't be set as the index in the original df because it is not unique.
Right now I'm doing the following, but it's quite inefficient and I'm sure there's a better way.
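# Sum only number_x per year and call it number_y, so the merged column does not clash with number_x
totals = df.groupby('year')['number_x'].sum().to_frame('number_y')
df = pd.merge(df, totals, left_on='year', right_index=True)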
df['%'] = round((df['number_x'] / df['number_y'])*100 , 2)
df = df.drop('number_y', axis=1)
Thanks!
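For reference, one possible way to skip the merge is groupby with transform, which returns the per-year total aligned row by row with the original frame, so year never has to be a unique index. This is only a minimal sketch, not necessarily the most efficient option: it rebuilds the example data with a default integer index instead of the string keys above, and year_total is just a name made up for the illustration.

```python
import pandas as pd

# Same example data as above, rebuilt with a default integer index (assumption for brevity)
df = pd.DataFrame({
    "year": [2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007],
    "cc_rating": ["2", "2a", "2b", "2", "2a", "2b", "2", "2a", "2b"],
    "number_x": [9368, 21643, 107577, 10069, 21486, 110326, 10834, 21566, 111082],
})

# transform("sum") broadcasts each year's total back onto every row of that year
year_total = df.groupby("year")["number_x"].transform("sum")

# Row-wise share of the year's total, as a percentage rounded to 2 decimals
df["%"] = (df["number_x"] / year_total * 100).round(2)

print(df)
```

Because the result of transform has the same index as df, the division lines up row by row and no temporary number_y column needs to be dropped afterwards.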