
I am just trying to calculate the percentage of one column against another's total, but I am unsure how to do this in Pandas so the calculation gets added into a new column.

Let's say, for argument's sake, my data frame has two attributes:

  • Number of Green Marbles
  • Total Number of Marbles

Now, how would I calculate the percentage of the Number of Green Marbles out of the Total Number of Marbles in Pandas?

Obviously, I know that the calculation will be something like this:

  • (Number of Green Marbles / Total Number of Marbles) * 100

Thanks - any help is much appreciated!

3 Answers


By default, arithmetic operations on pandas DataFrames and their columns are element-wise, so this is as simple as it can be:

>>> import pandas as pd
>>> d = pd.DataFrame()
>>> d['green'] = [3,5,10,12]
>>> d['total'] = [8,8,20,20]
>>> d
   green  total
0      3      8
1      5      8
2     10     20
3     12     20
>>> d['percent_green'] = d['green'] / d['total'] * 100
>>> d
   green  total  percent_green
0      3      8           37.5
1      5      8           62.5
2     10     20           50.0
3     12     20           60.0
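One case the session above does not cover is a total of zero. pandas does not raise an error there; the division gives inf (or NaN for 0 / 0). The following is a minimal sketch of one way to guard against that, assuming the same green and total column names and an extra made-up row with a zero total:

```python
import numpy as np
import pandas as pd

# Same columns as above, plus one illustrative row whose total is zero.
d = pd.DataFrame({"green": [3, 5, 10, 0], "total": [8, 8, 20, 0]})

# Turn zero totals into NaN first, so the percentage for those rows is NaN
# rather than inf or a misleading number.
safe_total = d["total"].replace(0, np.nan)
d["percent_green"] = d["green"] / safe_total * 100

print(d)
```

Whether NaN is the right result for an empty total depends on the data; filling it with 0 afterwards via fillna(0) is another option.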

Stef
    Thank you very much - that is simpler than I thought. I did have something like that in mind, but I was unsure. Have a nice day. –  Dec 06 '20 at 15:54

df['percentage columns'] = df['Number of Green Marbles'] / df['Total Number of Marbles'] * 100
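The line above assumes a DataFrame named df already exists. As a self-contained sketch, here is the same calculation on made-up marble counts, written with assign so that the original frame is left unchanged. The data values and the "Percent Green" column name are illustrative assumptions, not part of the question:

```python
import pandas as pd

# Made-up sample data using the column names from the question.
df = pd.DataFrame({
    "Number of Green Marbles": [3, 5, 10, 12],
    "Total Number of Marbles": [8, 8, 20, 20],
})

# assign() returns a new DataFrame with the extra column; df is not modified.
# The dict unpacking is needed because the new column name contains a space.
result = df.assign(**{
    "Percent Green": lambda x: x["Number of Green Marbles"]
    / x["Total Number of Marbles"] * 100
})

print(result)
```

Plain assignment with df['...'] = ... is just as valid; assign is mainly convenient when chaining several steps.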

Janneman

Here is my timing comparison of the arithmetic operators versus the .div() method (both are vectorized):

%timeit us_consum['Commercial_%ofUS'] = us_consum['Commercial_MWhrs']*100/us_consum['Total US consumption (MWhr)']
351 µs ± 22.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit us_consum['Commercial_%ofUS'] = (us_consum['Commercial_MWhrs'].div(us_consum['Total US consumption (MWhr)']))*100
337 µs ± 60.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The two means differ by less than the reported standard deviations, so neither form is measurably faster.
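The %timeit magic only runs in IPython or Jupyter, and us_consum is this answer's own data. As a rough sketch for reproducing the comparison in a plain Python script, the standard timeit module can be used on synthetic data. The frame, its part and total column names, and the row count below are all assumptions for illustration:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 10,000 rows, totals always >= 1.
rng = np.random.default_rng(0)
total = rng.integers(1, 1_000, size=10_000)
part = (total * rng.random(10_000)).astype(int)
frame = pd.DataFrame({"part": part, "total": total})


def with_operators():
    # Percentage using the * and / operators.
    return frame["part"] * 100 / frame["total"]


def with_div():
    # Percentage using the Series.div() method.
    return frame["part"].div(frame["total"]) * 100


# Total seconds for 200 calls of each function.
t_ops = timeit.timeit(with_operators, number=200)
t_div = timeit.timeit(with_div, number=200)

print(f"operators: {t_ops / 200 * 1e6:.1f} µs per call")
print(f".div():    {t_div / 200 * 1e6:.1f} µs per call")
```

Absolute numbers will vary by machine and data size, so only the relative difference between the two lines is worth reading.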
Mainland