You could expand the values to the full data set with numpy.repeat, and then use scipy.stats.norm.fit:
In [54]: import numpy as np
In [55]: from scipy.stats import norm
In [56]: values = [1, 2, 3, 4, 5]
In [57]: counts = [4, 3, 6, 1, 3]
In [58]: full_values = np.repeat(values, counts)
In [59]: full_values
Out[59]: array([1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5])
In [60]: norm.fit(full_values) # Estimate mean and std. dev.
Out[60]: (2.7647058823529411, 1.3516617991854185)
scipy.stats.norm.fit computes the maximum likelihood estimates of the parameters. For the normal distribution, these are just the sample mean and the square root of the (biased) sample variance.
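That equivalence is easy to check directly with plain numpy (a quick sketch, assuming scipy is available):

```python
import numpy as np
from scipy.stats import norm

values = [1, 2, 3, 4, 5]
counts = [4, 3, 6, 1, 3]
full_values = np.repeat(values, counts)

mu, sigma = norm.fit(full_values)

# The MLE of the mean is the sample mean, and the MLE of the standard
# deviation is the biased sample standard deviation (np.std uses ddof=0
# by default).
print(np.isclose(mu, full_values.mean()))   # True
print(np.isclose(sigma, full_values.std())) # True
```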
As far as I know, the only relevant weighted statistical function in numpy or scipy is numpy.average. You can do the computation yourself with numpy.average, using counts as the weights argument.
In [62]: sample_mean = np.average(values, weights=counts)
In [63]: sample_mean
Out[63]: 2.7647058823529411
In [64]: sample_var = np.average((values - sample_mean)**2, weights=counts)
In [65]: sample_var
Out[65]: 1.8269896193771626
In [66]: sample_std = np.sqrt(sample_var)
In [67]: sample_std
Out[67]: 1.3516617991854185
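If you need this in more than one place, the two np.average calls can be wrapped in a small helper; weighted_mean_std below is a hypothetical name for illustration, not a numpy function:

```python
import numpy as np

def weighted_mean_std(values, weights):
    """Weighted mean and biased (ddof=0) standard deviation.

    Equivalent to np.mean and np.std of the data set in which
    values[i] is repeated weights[i] times.
    """
    values = np.asarray(values, dtype=float)
    mean = np.average(values, weights=weights)
    var = np.average((values - mean)**2, weights=weights)
    return mean, np.sqrt(var)

mean, std = weighted_mean_std([1, 2, 3, 4, 5], [4, 3, 6, 1, 3])
print(mean, std)  # 2.7647058823529411 1.3516617991854185
```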
Note that statistics.stdev is based on the unbiased sample variance. If that's what you want, you can adjust the scaling by multiplying the biased sample variance by sum(counts)/(sum(counts) - 1):
In [79]: n = sum(counts)
In [80]: sample_var = n/(n-1)*np.average((values - sample_mean)**2, weights=counts)
In [81]: sample_var
Out[81]: 1.9411764705882353
In [82]: sample_std = np.sqrt(sample_var)
In [83]: sample_std
Out[83]: 1.3932610920384718
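As a sanity check, the rescaled weighted computation can be compared against statistics.stdev applied to the expanded data set:

```python
import statistics
import numpy as np

values = [1, 2, 3, 4, 5]
counts = [4, 3, 6, 1, 3]
n = sum(counts)

sample_mean = np.average(values, weights=counts)
# Unbiased variance: rescale the biased weighted variance by n/(n-1)
sample_var = n/(n - 1) * np.average((np.asarray(values) - sample_mean)**2,
                                    weights=counts)
sample_std = np.sqrt(sample_var)

# statistics.stdev on the expanded data gives the same value
expanded = np.repeat(values, counts).tolist()
print(np.isclose(sample_std, statistics.stdev(expanded)))  # True
```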