0

I am calculating the weighted mean on the following data. I get a positional indexers are out of bounds error.

dfpa:

PA  pa_pop  pa_mean_ea_kwh
A   30      42431.52608
B   1596    177765.6662
C   193     284501.2339
D   0   
E   84      316868.3264

this is my code:

wmea = lambda x: np.ma.average(x, weights=(dfpa.loc[x.index, 'pa_mean_ea_kwh'] * dfpa.loc[x.index, 'pa_pop']))
dfsw = dfpa.agg(
    sw_pop = ('pa_pop', 'sum'),
    sw_mean_ea_kwh = ('pa_mean_ea_kwh', wmea))

In the past I've successfully used similar code, but my aggregation is coupled with a groupby. So it would be

dfsw  = dfpa.groupby('PA').agg(
    sw_pop = ('pa_pop', 'sum'),
    sw_mean_ea_kwh = ('pa_mean_ea_kwh', wmea))

But in this case I just need a single output for all the data. Is the np.ma.average lambda function incorrect to achieve this? Why won't it just do a sumproduct and then divide by the sum?

adding desired output:

sw_pop  sw_mean_ea_kwh
1903    192597.2814

where sw_mean_ea_kwh is calculated as SUMPRODUCT(pa_pop,pa_mean_ea_kwh)/SUM(pa_pop)

ps: there's an alternative solution here to compute weighted average, but it again uses groupby

SModi
  • 125
  • 14

2 Answers2

1

This works similar to the sumproduct and dividing by the sum approach:

(dfpa['pa_pop']*dfpa['pa_mean_ea_kwh']).sum()/(dfpa['pa_pop'].sum())
jlb_gouveia
  • 603
  • 3
  • 11
0

This is what I finally did. I am not proud of it - it's not elegant and I couldn't compute the weighted average within the aggregate function as I originally wanted. But it works with the solution provided by @jlb_gouveia:

dfsw = dfpa.agg({'pa_pop':sum}, axis = 0)

#convert list to dataframe because the above creates a list
dfsw = dfsw.to_frame() 

#transpose so that index becomes column headers
dfsw = dfsw.transpose() 

#renames column headers to new names
dfsw.columns = dfsw.columns.str.replace('pa','sw') 

#add weighted means
dfsw['sw_mean_ea_kwh'] = (dfpa['pa_pop']*dfpa['pa_mean_ea_kwh']).sum()/(dfpa['pa_pop'].sum())

this is what my final output looks like:

    sw_pop  sw_n    sw_ncmplt...sw_mean_ea_kwh...   sw_mean_ep_kwh  
0   1903.0  140.0   140.0   ....192597.28147843637  206253.99375475512

(I'm doing many sums and weighted means in the above code, I've removed code for the additional columns to keep things simple)

If someone has a more elegant way to achieve this would like to improve.

SModi
  • 125
  • 14