Python dataframe group by column and create new column with percentage

Question

I have a scenario simulating to a dataframe which looks something like below:

    Month  Amount  
1   Jan     260
2   Feb    179
3   Mar    153
4   Apr    142
5   May    128
6   Jun    116
7   Jul    71
8   Aug    56
9   Sep    49
10  Oct    17
11  Nov    0
12  Dec    0

I'm trying to get new column by calculating percentage for each row using dataframe group by and use lambda function as below:

 df = pd.DataFrame(mylistofdict)
 df = df.groupby('Month')["Amount"].apply(lambda x: x / x.sum()*100)

But I'm not getting the expected result below only 2 columns:

    Month   Percentage
1   Jan         22%
2   Feb         15%
3   Mar         13%
4   Apr         12%
5   May         11%
6   Jun         10%
7   Jul         6%
8   Aug         5%
9   Sep         4%
10  Oct         1%
11  Nov         0
12  Dec         0

How do i modify my code or is there anything better than use dataframe.

@jezrael depends might be might not be depends on some previous calculations. At times there might be 2 months with same count. — Rachel, Jan 28 '18 at 17:55
@jezrael had another query . I'm using pd.options.display.float_format = '${:,.2f}'.format to get only 2 decimal values. But once i convert the dataframe to dictionary as df.T.to_dict().values() again all the decimal values are taken in dictionary. How can i prevent this — Rachel, Jan 28 '18 at 18:48
@Rachel that is another question, so the thing to do at Stack Overflow is to open another question not ask it in the comments. Also, this is not a duplicate of the other question not sure why this was marked as such. — eric, Dec 27 '19 at 06:19

jezrael · Accepted Answer · 2018-01-28T17:41:01.120

If values of Month are unique use:

df['perc'] = df["Amount"] / df["Amount"].sum() * 100
print (df)
   Month  Amount       perc
1    Jan     260  22.203245
2    Feb     179  15.286080
3    Mar     153  13.065756
4    Apr     142  12.126388
5    May     128  10.930828
6    Jun     116   9.906063
7    Jul      71   6.063194
8    Aug      56   4.782237
9    Sep      49   4.184458
10   Oct      17   1.451751
11   Nov       0   0.000000
12   Dec       0   0.000000

If values of Month are duplicated I believe is possible use:

print (df)
   Month  Amount
1    Jan     260
1    Jan     100
3    Mar     153
4    Apr     142
5    May     128
6    Jun     116
7    Jul      71
8    Aug      56
9    Sep      49
10   Oct      17
11   Nov       0
12   Dec       0


df = df.groupby('Month', as_index=False, sort=False)["Amount"].sum()
df['perc'] = df["Amount"] / df["Amount"].sum() * 100
print (df)
   Month  Amount       perc
0    Jan     360  32.967033
1    Mar     153  14.010989
2    Apr     142  13.003663
3    May     128  11.721612
4    Jun     116  10.622711
5    Jul      71   6.501832
6    Aug      56   5.128205
7    Sep      49   4.487179
8    Oct      17   1.556777
9    Nov       0   0.000000
10   Dec       0   0.000000

had another query . I'm using pd.options.display.float_format = '${:,.2f}'.format to get only 2 decimal values. But once i convert the dataframe to dictionary as df.T.to_dict().values() again all the decimal values are taken in dictionary. How can i prevent this — Rachel, Jan 28 '18 at 18:47

Python dataframe group by column and create new column with percentage

1 Answers1