
I want to fit a normal distribution to some data points, but I can't find a function that accepts a weight for each data point. scipy.stats.norm.fit only accepts the data, plus optional loc and scale arguments for the mean and standard deviation.

The weights of my data are floating-point numbers, so I can't use the solution described in "Fit normal distribution to weighted list", which needs integer weights.

values = [0, 1, 2, 3, ..., 44, 52]
weights = [0.06537925227866273, 0.9735569357920033, 3.1333312174908325, 5.558819116316957, ..., 0.0070813375592937555, 0.040237487324237445]

For me, multiplying the weights by 100 and then using round() is not a good solution, because some weights can be too small to survive the rounding.

  • Do you want to "fit some data points to a normal distribution", or fit parameters of a normal distribution to data points? Have you tried [scipy.optimize.minimize](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html)? – rpoleski Apr 26 '20 at 08:23
  • Indeed, I have data points, and I want to fit a normal distribution to it. I haven't tried scipy.optimize.minimize, I'm not sure how to use it: Do I have to implement a function that uses Chi-square, 2 parameters (avg and stdev) and my values and weights as fixed arguments? – Jasper Derbaix Apr 26 '20 at 15:09
  • Your function should have 2 fitted parameters (mean & sigma). Your `values` and `weights` must also be passed in. The function calculates the probability for `values` and returns a diagnostic of how well these match the normalized `weights`. This diagnostic can be chi2. – rpoleski Apr 26 '20 at 17:47
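A minimal sketch of the approach suggested in the comments above, not a definitive implementation. It assumes the values are evenly spaced one unit apart, so the normalized weights can be compared directly with the normal pdf. The unweighted sum of squared residuals stands in for a true chi2, since no uncertainties are given. The data, the helper name `chi2`, and the choice of the Nelder-Mead method are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def chi2(params, values, weights):
    """Sum of squared differences between normalized weights and the normal pdf."""
    mean, sigma = params
    if sigma <= 0:
        return np.inf  # reject non-positive widths
    observed = weights / weights.sum()
    expected = norm.pdf(values, loc=mean, scale=sigma)
    return np.sum((observed - expected) ** 2)


# Made-up illustrative data (not the data from the question):
# float weights that follow a normal curve with mean 8 and sigma 2.5.
values = np.arange(0, 21, dtype=float)
weights = 10.0 * norm.pdf(values, loc=8.0, scale=2.5)

# Starting guess: weighted mean and weighted standard deviation.
mean0 = np.average(values, weights=weights)
sigma0 = np.sqrt(np.average((values - mean0) ** 2, weights=weights))

result = minimize(chi2, x0=[mean0, sigma0], args=(values, weights),
                  method="Nelder-Mead")
fit_mean, fit_sigma = result.x
print(fit_mean, fit_sigma)
```

For values that are not evenly spaced, the comparison between normalized weights and the pdf would need to account for the spacing, which this sketch does not attempt.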

1 Answer


You can fit the weighted data with a normal distribution by taking weighted averages over the data and the squared errors:

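import numpy as np

def fit_normal(values, weights):
    # convert inputs to float arrays so plain lists work too
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # weighted mean
    weights_sum = weights.sum()
    mean = (values * weights).sum() / weights_sum

    # weighted variance: weighted average of the squared deviations
    errors = (values - mean) ** 2
    variance = (errors * weights).sum() / weights_sum

    # note: this is the variance; take np.sqrt(variance) for the standard deviation
    return (mean, variance)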
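As a cross-check, the same weighted moments can be written with `np.average`, which accepts a `weights` argument. The sketch below uses small made-up numbers (not the data from the question). It also shows that rescaling the weights leaves the result unchanged, so float weights need no rounding. The square root of the variance is what would be passed as `scale` to scipy.stats.norm.

```python
import numpy as np

# Made-up illustrative data with float weights.
values = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
weights = np.array([0.5, 1.5, 3.0, 1.5, 0.5])

# Weighted mean and weighted variance.
mean = np.average(values, weights=weights)
variance = np.average((values - mean) ** 2, weights=weights)
sigma = np.sqrt(variance)  # standard deviation

# Multiplying every weight by a constant does not change the estimates.
mean_scaled = np.average(values, weights=100 * weights)

print(mean, variance, sigma)  # 2.0 1.0 1.0
```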
rpoleski