Replicating case in numerator and denominator of weighted average calculation in pandas

Question

Pandas newbie trying to replicate sql to python. Referencing the below post, I could use a simple function to calculate the weighted average of a column in a pandas dataframe.

Calculate weighted average using a pandas/dataframe

  Date        ID      wt      value   
  01/01/2012  100     0.50    60      
  01/01/2012  101     0.75    
  01/01/2012  102     1.00    100
  01/02/2012  201     0.50    
  01/02/2012  202     1.00    80

However, if I had conditions in both numerator and denominator and to get an aggregate of the weighted average, I would do the below in sql:

 SELECT
    date
    , id
    , SUM(CASE WHEN value IS NOT NULL THEN value * wt ELSE 0 END) /
        NULLIF(SUM(CASE WHEN value > 0 THEN wt ELSE 0 END), 0)
    AS wt_avg

 FROM table
 GROUP BY date, id

How would we replicate this in Pandas?

Thanks in advance.

Are you sure about this formula? You do not want zero in the denominator. — Parfait, Jun 21 '18 at 20:36

score 1 · Accepted Answer · answered Jun 21 '18 at 21:03

Consider using calculated, helper columns according to specified logic with np.where() replacing the CASE statements and Series.fillna() as counterpart to NULLIF.

df['numer'] = np.where(pd.notnull(df['value']), df['value'] * df['wt'], 0)
df['denom'] = pd.Series(np.where(df['value'] > 0, df['wt'], 0)).fillna(0)

df['wt_avg'] = (df.groupby(['Date', 'ID'])['numer'].transform(sum) /  
                    df.groupby(['Date', 'ID'])['denom'].transform(sum))

print(df)
# print(df.drop(columns=['numer', 'denom']))      # DROP HELPER COLUMNS

#          Date   ID    wt  value  numer  denom  wt_avg
# 0  01/01/2012  100  0.50   60.0   30.0    0.5    60.0
# 1  01/01/2012  101  0.75    NaN    0.0    0.0     NaN
# 2  01/01/2012  102  1.00  100.0  100.0    1.0   100.0
# 3  01/02/2012  201  0.50    NaN    0.0    0.0     NaN
# 4  01/02/2012  202  1.00   80.0   80.0    1.0    80.0

Thank you @Parfait - I will try the above. So, if we have multiple columns, we would just extend the same right? So, it would be wt_avg1, wt_avg2 and so on.. — Lucky, Jun 21 '18 at 21:44

Replicating case in numerator and denominator of weighted average calculation in pandas

1 Answers1