0

I have a dataframe:

             Type Weights Value  ....
    0         W     0.5    15  
    1         C     1.2    19  
    2         W     12     25  
    3         C     7.1    15    .....
    .......
    .......

I want to group on type and then calculate weighted mean and weighted standard deviation.

There seem to be solution available for weighted mean (groupby weighted average and sum in pandas dataframe) but none for weighted standard deviation.

Is there a simple way to do it.

PKL
  • 55
  • 5

2 Answers2

1

I have used the weighted standard deviation formula from the following link: https://doc-archives.microstrategy.com/producthelp/10.7/FunctionsRef/Content/FuncRef/WeightedStDev__weighted_standard_deviation_of_a_sa.htm

However you can modify for a different formula

import numpy as np
def weighted_sd(input_df):
    weights = input_df['Weights']
    vals = input_df['Value']
    numer = np.sum(weights * (vals - vals.mean())**2)
    denom = ((vals.count()-1)/vals.count())*np.sum(weights)
    return np.sqrt(numer/denom)

print(df.groupby('Type').apply(weighted_sd))
Muhammad Rasel
  • 704
  • 4
  • 9
0

Minor correction to the weighted standard deviation formula from the previous answer.

import numpy as np
def weighted_sd(input_df):
    weights = input_df['Weights']
    vals = input_df['Value']

    weighted_avg = np.average(vals, weights=weights)
    
    numer = np.sum(weights * (vals - weighted_avg)**2)
    denom = ((vals.count()-1)/vals.count())*np.sum(weights)
    
    return np.sqrt(numer/denom)

print(df.groupby('Type').apply(weighted_sd))
Hellen
  • 11
  • 1