
I know how to fit the data going into a histogram with a normal distribution using the SciPy library (Fitting a histogram with python), but how could I do the same if, on top of the data, I have an array of weights with the same dimensions? Is there a proper function for that, or should I create a second array fed by the data and weight it myself?

Cheers.

Edit:

This is pretty much already answered here:

Weighted standard deviation in NumPy?
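The approach in that linked question comes down to `numpy.average` with its `weights=` argument. A minimal sketch, assuming `data` and `weights` are 1-D arrays of equal length with non-negative weights; `weighted_mean_std` is a hypothetical helper name, and the standard deviation divides by the sum of the weights (no bias correction):

```python
import numpy as np

def weighted_mean_std(data, weights):
    """Hypothetical helper: weighted mean and weighted (population) standard deviation."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(data, weights=weights)
    # Weighted average of the squared deviations, normalised by sum(weights)
    variance = np.average((data - mean) ** 2, weights=weights)
    return mean, np.sqrt(variance)

mean, std = weighted_mean_std([1.0, 2.0, 3.0], [1, 1, 2])
print(mean, std)
```

These two numbers are the parameters of the best-fitting (maximum-likelihood) normal distribution for the weighted data.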

Liam

2 Answers


Just use the weights parameter of scipy.stats.histogram and pass in your array of weights:

scipy.stats.histogram(a, numbins=10, defaultlimits=None, weights=None, printextras=False)

From the docs:

weights : array_like, optional
    The weights for each value in a. Default is None, which gives each value a weight of 1.0.


NOTE: as of v1.0, SciPy no longer has the histogram function, but NumPy (as of v1.11) provides numpy.histogram, with a similar (but not identical) call signature that includes the weights= argument:

numpy.histogram(a, bins=10, range=None, normed=False, weights=None, density=None)
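A short sketch of the NumPy route, with synthetic example data and weights (the arrays and the random seed are assumptions for illustration). Each bin's height is the sum of the weights of the samples that land in it rather than a plain count:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # synthetic example data
weights = rng.uniform(0.5, 1.5, size=data.size)   # example weights, same shape as data

# Each bin holds the sum of the weights of the samples that fall into it
counts, edges = np.histogram(data, bins=10, weights=weights)

# density=True rescales the weighted histogram so that it integrates to 1
density, _ = np.histogram(data, bins=edges, weights=weights, density=True)
print(counts)
```

Note that this only bins the weighted data; it does not fit a distribution to it.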
Bonlenfum
agconti
  • Hmm, I've been using it to plot weighted data, but I'd like to fit the data taking the weights into account too. That doesn't seem possible with scipy.stats.norm.fit(), which doesn't take a weights parameter. – Liam Jan 11 '14 at 15:22
  • If you're looking to have your weights generated programmatically, maybe take a look at this http://stackoverflow.com/questions/11373192/generating-discrete-random-variables-with-various-weights-using-scipy-or-numpy – agconti Jan 11 '14 at 15:25
  • Er, again that doesn't seem to be what I'm looking for: I want an estimate of the mean and variance of a weighted histogram (and also how much it looks like a normal distribution), not to generate samples from it. – Liam Jan 11 '14 at 15:30
  • "How much it looks like a normal distribution" doesn't really make any sense. You check the mean of your sample and its variance to see if it approximates a normal distribution, but this has nothing to do with a histogram. The histogram is just a visual representation. You can use a [Kernel Density Estimation](http://glowingpython.blogspot.com/2012/08/kernel-density-estimation-with-scipy.html) if you want an easier way to compare your sample to a normal distribution visually. – agconti Jan 11 '14 at 15:42
  • As for calculating the weighted mean and variance, which seems to be the crux of your question, I'm sure you can work out how to do it in Python; there isn't a function for such a basic calculation. Then just use a [normality test](http://en.wikipedia.org/wiki/Normality_test). – agconti Jan 11 '14 at 15:45
  • @agconti "How much it looks like a normal distribution" makes **A LOT** of sense: http://en.wikipedia.org/wiki/Normality_test – Jaime Jan 11 '14 at 18:14
  • Obviously I'm speaking in a programmatic sense, i.e. writing an algorithm to graphically compare your histogram with one based on a normal distribution wouldn't be a sensible approach compared to simply analyzing the numbers computationally. Even in the visual case, it's the data points themselves you're after, so when you have all the power of SciPy and NumPy at your fingertips, why resort to only a visual abstraction of the data? Either way, this is just some advice, and whichever way you prefer is ultimately better for you. – agconti Jan 11 '14 at 19:39
  • Also, your solution is one that I suggested in my comments, so I think you know what I meant when I said that "How much it looks like a normal distribution" isn't the best approach. – agconti Jan 11 '14 at 19:43
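On the normality-test suggestion: SciPy's tests such as `scipy.stats.normaltest` take no weights argument. One workaround, assuming the weights are integer frequency counts, is to expand the sample with `numpy.repeat` before testing. A sketch under that assumption, with synthetic values and counts; for non-integer weights this shortcut does not apply, and if the weights are not true counts the inflated sample size makes the test look more certain than it should:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
values = rng.normal(loc=0.0, scale=1.0, size=200)  # synthetic example values
counts = rng.integers(1, 5, size=values.size)      # assumed integer frequency weights (1..4)

# Expand each value according to its integer weight
expanded = np.repeat(values, counts)

# D'Agostino-Pearson test: a small p-value is evidence against normality
statistic, p_value = stats.normaltest(expanded)
print(f"statistic={statistic:.3f}, p={p_value:.3f}")
```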

If you are looking for a normal distribution N(mu, sigma), you can compute mu and sigma exactly from the input data.

For example, if X = x1, ..., xN are the values and W = w1, ..., wN their weights:

import numpy as np
# X, W: NumPy arrays of the values and their weights
mu = np.sum(X * W) / np.sum(W)
sigma = np.sqrt(np.sum(W * (X - mu) ** 2) / np.sum(W))
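A self-contained check of these formulas, with small made-up arrays. With integer weights, the weighted estimates should equal an ordinary `scipy.stats.norm.fit` on the data repeated according to the weights, since both are the maximum-likelihood estimates (sigma divides by the total weight, not by total weight minus one):

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0])  # example values
W = np.array([1, 3, 2, 1])          # example integer weights

# Weighted estimates from the formulas above
mu = np.sum(X * W) / np.sum(W)
sigma = np.sqrt(np.sum(W * (X - mu) ** 2) / np.sum(W))

# Reference: unweighted normal fit on the data repeated according to the weights
mu_ref, sigma_ref = stats.norm.fit(np.repeat(X, W))
print(mu, sigma)
print(mu_ref, sigma_ref)
```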

If you need to fit another kind of distribution, I suggested an answer here using the OpenTURNS library.
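As an alternative that uses only SciPy (a swapped-in technique, not the OpenTURNS approach), you can maximise the weighted log-likelihood sum(w_i * log f(x_i; theta)) numerically. A sketch using a gamma distribution with `loc` fixed at 0 as an arbitrary example; `weighted_gamma_fit` is a hypothetical helper, and the data and weights are synthetic:

```python
import numpy as np
from scipy import optimize, stats

def weighted_gamma_fit(x, w):
    """Hypothetical helper: weighted maximum-likelihood fit of gamma(a, scale) with loc=0."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)

    def neg_log_likelihood(log_params):
        a, scale = np.exp(log_params)  # optimise on the log scale so both stay positive
        return -np.sum(w * stats.gamma.logpdf(x, a, loc=0, scale=scale))

    # Start from method-of-moments estimates built on the weighted mean and variance
    mean = np.average(x, weights=w)
    var = np.average((x - mean) ** 2, weights=w)
    start = np.log([mean ** 2 / var, var / mean])
    result = optimize.minimize(neg_log_likelihood, start, method="Nelder-Mead")
    return np.exp(result.x)

rng = np.random.default_rng(2)
x = rng.gamma(shape=2.0, scale=1.5, size=300)  # synthetic data
w = rng.integers(1, 4, size=x.size)            # synthetic integer weights

a_hat, scale_hat = weighted_gamma_fit(x, w)
# With integer weights, this should closely match an unweighted fit on repeated data
a_ref, _, scale_ref = stats.gamma.fit(np.repeat(x, w), floc=0)
print(a_hat, scale_hat, a_ref, scale_ref)
```

The same pattern works for other `scipy.stats` distributions by swapping in their `logpdf` and a sensible starting point.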

Jean A.