How to compute weighted average

Question

Country  life_expectancy   population 

Germany     70               3000000
France      75               450000
USA         70               350000
India       65               4000000
Pakistan    60               560000
Belgium     68               230000

I want to calculate the weighted average life expectancy according to the formula below:

∑ (l_i × p_i) / ∑ p_i

where l_i = life expectancy of country i
      p_i = population of country i

NOTE: The weighted average life expectancy is the sum of the products of each country's life expectancy and its population, divided by the sum of the populations of all countries.

Can anyone please tell me how to solve this using a for loop?

The answer is here: https://stackoverflow.com/questions/26205922/calculate-weighted-average-using-a-pandas-dataframe and here: https://stackoverflow.com/questions/33657809/calculate-weighted-average-with-pandas-dataframe — SKPS, Feb 07 '20 at 22:35
I want to implement it using for loop and my formula is also different — peeps, Feb 07 '20 at 22:42
Any specific reasons why you would like to use the `for` loop, when it can be done without it and much faster? — SKPS, Feb 07 '20 at 22:46

score 1 · Answer 1 · answered Feb 07 '20 at 22:53

Using numpy.average(..., weights=...):

Ref: https://docs.scipy.org/doc/numpy/reference/generated/numpy.average.html

import numpy as np

res = np.average(df["life_expectancy"], weights=df["population"])

Outputs:

67.22817229336438
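
For a self-contained check, here is a minimal sketch that rebuilds the question's table as a DataFrame and applies `np.average` to it. The name `df` follows the answer above; how the asker actually loaded the data is not shown, so the constructor call is an assumption.

```python
import numpy as np
import pandas as pd

# Rebuild the table from the question (construction is an assumption;
# the asker's real loading code is not shown).
df = pd.DataFrame({
    "Country": ["Germany", "France", "USA", "India", "Pakistan", "Belgium"],
    "life_expectancy": [70, 75, 70, 65, 60, 68],
    "population": [3000000, 450000, 350000, 4000000, 560000, 230000],
})

# Weighted mean: sum(life_expectancy * population) / sum(population)
res = np.average(df["life_expectancy"], weights=df["population"])
print(round(res, 4))  # 67.2282
```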

answered Feb 07 '20 at 22:53

Grzegorz Skibinski

score 0 · Answer 2 · answered Feb 07 '20 at 22:53

With a for loop:

numerator, denominator = 0, 0
for i in df.index:
    numerator += df.loc[i, 'life_expectancy'] * df.loc[i, 'population']
    denominator += df.loc[i, 'population']
weighted_average = numerator / denominator

Or use pandas to do everything faster and in an easier-to-read way (this is my recommended solution):

weighted_average = (df['life_expectancy']*df['population']).sum() / df['population'].sum()
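
If the data is not in a DataFrame at all, the same loop can be sketched in plain Python. This is only an illustration of the formula from the question; the list names `life_expectancy` and `population` are assumptions chosen to mirror the column names.

```python
# Values copied from the table in the question.
life_expectancy = [70, 75, 70, 65, 60, 68]
population = [3000000, 450000, 350000, 4000000, 560000, 230000]

numerator = 0    # running sum of life_expectancy * population
denominator = 0  # running sum of population

# zip pairs each country's life expectancy with its population.
for life, pop in zip(life_expectancy, population):
    numerator += life * pop
    denominator += pop

weighted_average = numerator / denominator
print(round(weighted_average, 4))  # 67.2282
```

The loop accumulates exactly the two sums in the formula, so it should agree with the vectorized pandas expression above.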

answered Feb 07 '20 at 22:53

Quinn

Thanks for the solution. One more question: if I add another column, continents, how can I group by it and calculate the weighted average per continent? – peeps Feb 08 '20 at 00:37
I tried df.groupby('continents').apply(lambda x : (df['life_expectancy']*df['population']).sum() / df['population'].sum()), but it gives me the same value for all the continents. – peeps Feb 08 '20 at 00:41
Your lambda function is wrong: it uses df (the whole frame) instead of x (the current group), so every group returns the overall average. It should be df.groupby('continents').apply(lambda x : (x['life_expectancy']*x['population']).sum() / x['population'].sum()) – Quinn Feb 10 '20 at 00:17
@peeps, you should accept the best answer once you have seen it – Quinn Feb 11 '20 at 03:52
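
The grouped calculation discussed in these comments can be sketched as follows. The `continents` column and its values are assumptions added for illustration (the question's table has no such column), and `weighted_life_expectancy` is a hypothetical helper name. Selecting the two numeric columns before `apply` is a design choice so the function only sees the data it needs.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Germany", "France", "USA", "India", "Pakistan", "Belgium"],
    "life_expectancy": [70, 75, 70, 65, 60, 68],
    "population": [3000000, 450000, 350000, 4000000, 560000, 230000],
    # Hypothetical column, not part of the original table.
    "continents": ["Europe", "Europe", "North America", "Asia", "Asia", "Europe"],
})

def weighted_life_expectancy(group):
    """Population-weighted mean life expectancy for one group of rows."""
    # Use the group passed in, not the full df, so each continent
    # gets its own result.
    return (group["life_expectancy"] * group["population"]).sum() / group["population"].sum()

result = (
    df.groupby("continents")[["life_expectancy", "population"]]
      .apply(weighted_life_expectancy)
)
print(result)
```

`result` is a Series indexed by continent, with one weighted average per group.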

score 0 · Answer 3 · edited Oct 03 '20 at 22:50

Actually, a for loop is not required here; you can calculate it directly:

life_exp = (countries_df.life_expectancy*countries_df.population).sum()/countries_df.population.sum()

edited Oct 03 '20 at 22:50

jottbe

answered Oct 03 '20 at 22:04

siddharth
