Country  life_expectancy   population 

Germany     70               3000000
France      75               450000
USA         70               350000
India       65               4000000
Pakistan    60               560000
Belgium     68               230000

I want to calculate the weighted average life expectancy according to the formula below:

$$\frac{\sum_i (l_i \times p_i)}{\sum_i p_i}$$

where $l_i$ = life expectancy of country $i$
      $p_i$ = population of country $i$

NOTE: The weighted average life expectancy is the sum of each country's life expectancy multiplied by its population, divided by the sum of the populations of all countries.

Can anyone please tell me how to solve this using a for loop?

SKPS
peeps
  • The answer is here: https://stackoverflow.com/questions/26205922/calculate-weighted-average-using-a-pandas-dataframe and here: https://stackoverflow.com/questions/33657809/calculate-weighted-average-with-pandas-dataframe – SKPS Feb 07 '20 at 22:35
  • I want to implement it using a for loop, and my formula is also different – peeps Feb 07 '20 at 22:42
  • Any specific reason why you would like to use a `for` loop, when it can be done without one and much faster? – SKPS Feb 07 '20 at 22:46

3 Answers


Using numpy.average(..., weights=...):

Ref: https://docs.scipy.org/doc/numpy/reference/generated/numpy.average.html

import numpy as np

# df is the DataFrame holding the table from the question
res = np.average(df["life_expectancy"], weights=df["population"])
print(res)

Outputs:

67.22817229336438
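
For a self-contained check, here is a minimal sketch that rebuilds the question's table as a DataFrame and applies `np.average`. How `df` is actually loaded is not shown in the question, so building it from a dictionary is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: the table from the question, typed in by hand.
df = pd.DataFrame({
    "Country": ["Germany", "France", "USA", "India", "Pakistan", "Belgium"],
    "life_expectancy": [70, 75, 70, 65, 60, 68],
    "population": [3000000, 450000, 350000, 4000000, 560000, 230000],
})

# Weighted mean: sum(l_i * p_i) / sum(p_i)
res = np.average(df["life_expectancy"], weights=df["population"])
print(res)
```

By hand, the numerator is 577,490,000 and the denominator is 8,590,000, which agrees with the output above.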
Grzegorz Skibinski

With a for loop:

numerator, denominator = 0, 0
for i in df.index:
    numerator += df.loc[i, 'life_expectancy'] * df.loc[i, 'population']
    denominator += df.loc[i, 'population']
weighted_average = numerator / denominator

Or use pandas to do everything faster and in an easier-to-read way (this is my recommended solution):

weighted_average = (df['life_expectancy']*df['population']).sum() / df['population'].sum()
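
If an explicit loop is still wanted, one possible variant iterates over the two columns with `zip` instead of looking up each cell with `df.loc`. This is only a sketch; the DataFrame construction is an assumption, since the question does not show how `df` was created.

```python
import pandas as pd

# Assumption: the table from the question, typed in by hand.
df = pd.DataFrame({
    "Country": ["Germany", "France", "USA", "India", "Pakistan", "Belgium"],
    "life_expectancy": [70, 75, 70, 65, 60, 68],
    "population": [3000000, 450000, 350000, 4000000, 560000, 230000],
})

numerator = 0
denominator = 0
# Walk both columns in parallel, one country per iteration.
for life_exp, pop in zip(df["life_expectancy"], df["population"]):
    numerator += life_exp * pop   # running sum of l_i * p_i
    denominator += pop            # running sum of p_i

weighted_average = numerator / denominator
print(weighted_average)
```

The result should match the vectorized one-liner; the loop only makes the two running sums explicit.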
Quinn
  • Thanks for the solution. I have one more question: if I add another column, continents, how can I group by it and calculate the weighted average? – peeps Feb 08 '20 at 00:37
  • I tried df.groupby('continents').apply(lambda x : (df['life_expectancy']*df['population']).sum() / df['population'].sum()) but it's giving me the same value for all the continents – peeps Feb 08 '20 at 00:41
  • Your lambda function is wrong: it uses the whole df instead of the group x. It should be df.groupby('continents').apply(lambda x : (x['life_expectancy']*x['population']).sum() / x['population'].sum()) – Quinn Feb 10 '20 at 00:17
  • @peeps, you should accept the best answer once you have seen it – Quinn Feb 11 '20 at 03:52
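
The grouped calculation discussed in the comments above can be sketched as follows. The `continents` column and its values are assumptions added for illustration (the question's table has no such column), and the two numeric columns are selected before `apply` so the lambda only sees the data it needs.

```python
import pandas as pd

# Assumption: the question's table plus a hypothetical 'continents' column.
df = pd.DataFrame({
    "Country": ["Germany", "France", "USA", "India", "Pakistan", "Belgium"],
    "continents": ["Europe", "Europe", "North America", "Asia", "Asia", "Europe"],
    "life_expectancy": [70, 75, 70, 65, 60, 68],
    "population": [3000000, 450000, 350000, 4000000, 560000, 230000],
})

# x is the sub-DataFrame for one continent, so the weights stay within the group.
result = (
    df.groupby("continents")[["life_expectancy", "population"]]
      .apply(lambda x: (x["life_expectancy"] * x["population"]).sum()
                       / x["population"].sum())
)
print(result)
```

Using `x` rather than `df` inside the lambda is what makes each continent get its own value.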

Actually, a for loop is not required here; you can calculate it directly:

life_exp = (countries_df.life_expectancy*countries_df.population).sum()/countries_df.population.sum()
jottbe