How can I maximize the Poissonian likelihood of a histogram given a fit curve with scipy/numpy?

Question

I have data in a python/numpy/scipy environment that needs to be fit to a probability density function. A way to do this is to create a histogram of the data and then fit a curve to this histogram. The method scipy.optimize.leastsq does this by minimizing the sum of (y - f(x))**2, where (x,y) would in this case be the histogram's bin centers and bin contents.

In statistical terms, this least-squares fit maximizes the likelihood of obtaining that histogram if each bin count is sampled from a Gaussian centered on the fit function at that bin's position. You can easily see this: each term (y-f(x))**2 is, up to a constant factor and offset, -log(gauss(y|mean=f(x))) for a Gaussian of fixed width, and the sum is the negative logarithm of the product of the Gaussian likelihoods of all the bins.
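
Written out, and assuming every bin shares one fixed Gaussian width \sigma (my assumption; the least-squares sum itself carries no explicit width), with N bins (x_i, y_i), the correspondence would read:

```latex
-\log \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left(-\frac{(y_i - f(x_i))^2}{2\sigma^2}\right)
  = \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - f(x_i))^2
    + \frac{N}{2} \log\!\left(2\pi\sigma^2\right)
```

The second term does not depend on the fit parameters, so minimizing the sum of squares and maximizing this Gaussian likelihood pick out the same curve.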

That's not always accurate, however: for the type of statistical data I'm looking at, each bin count is the result of a Poissonian process, so I want to maximize the product over all the bins (x,y) of poisson(y|mean=f(x)), or equivalently minimize its negative logarithm. The Poissonian comes very close to the Gaussian distribution for large values of f(x), but if my histogram has low statistics, the difference becomes relevant and influences the fit.
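
To make the target concrete, here is a minimal sketch of that objective, not a definitive implementation. The names `model` and `poisson_nll`, the Gaussian-shaped fit curve, the synthetic data, and the choice of `scipy.optimize.minimize` with Nelder-Mead are all assumptions made for illustration; any curve f(x) that returns a positive expected count per bin could take the place of `model`.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def model(x, params):
    """Hypothetical fit curve f(x): expected count per bin (Gaussian-shaped peak)."""
    amplitude, center, width = params
    return amplitude * np.exp(-0.5 * ((x - center) / width) ** 2)


def poisson_nll(params, x, y):
    """Negative log of the product over bins of poisson(y | mean=f(x))."""
    mu = model(x, params)
    # A Poisson mean must be positive; clip to keep log(mu) finite.
    mu = np.clip(mu, 1e-12, None)
    # log poisson(y | mu) = y*log(mu) - mu - log(y!), with log(y!) = gammaln(y + 1)
    return np.sum(mu - y * np.log(mu) + gammaln(y + 1))


# Synthetic low-statistics data (assumed, only for demonstration)
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=200)
counts, edges = np.histogram(data, bins=20, range=(-4.0, 8.0))
centers = 0.5 * (edges[:-1] + edges[1:])

# Rough starting guess taken from the histogram itself
start = np.array([counts.max(), centers[np.argmax(counts)], 1.0])
result = minimize(poisson_nll, start, args=(centers, counts), method="Nelder-Mead")
fit_amplitude, fit_center, fit_width = result.x
```

The `gammaln(y + 1)` term does not depend on the parameters, so it could be dropped without changing where the minimum lies; it is kept here so that the returned value is the full negative log-likelihood.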

Not really my field, but can you reformulate the problem so that it could be solved by `scipy.optimize.fsolve`? — deinonychusaur, Aug 05 '14 at 17:44

score 0 · Answer 1 · edited May 23 '17 at 12:20

If I understood correctly, you have data and want to see whether or not some probability distribution fits your data.

If that's the case, you need a QQ-plot; take a look at this StackOverflow question and answer. That one is about the normal distribution, though, and you need code for the Poisson distribution. All you need to do is create some random data according to the Poisson distribution and test your samples against it. Here you can find an example of a QQ-plot for the Poisson distribution. Here's the code from that website:

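    #!/usr/bin/env python

    import matplotlib.pyplot as plt
    import numpy as np

    # Draw a Poisson sample and a normal sample with the same mean and spread
    p = np.random.poisson(lam=10, size=4000)
    m = np.mean(p)
    s = np.std(p)
    n = np.random.normal(loc=m, scale=s, size=p.shape)

    # Plot range: four standard deviations either side of the mean
    a = m - 4*s
    b = m + 4*s

    # QQ-plot: sorted normal sample against sorted Poisson sample
    plt.figure()
    plt.plot(np.sort(n), np.sort(p), 'o', color='0.85')
    plt.plot([a, b], [a, b], 'k-')  # reference line y = x
    plt.xlim(a, b)
    plt.ylim(a, b)
    plt.xlabel('Normal Distribution')
    plt.ylabel(r'Poisson Distribution with $\lambda=10$')
    plt.grid(True)
    plt.savefig('qq.pdf')
    plt.show()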
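
The script above compares a Poisson sample with a normal one. To test samples against the Poisson distribution itself, one option is to pair the sorted sample with theoretical Poisson quantiles. The following is only a sketch: the synthetic `sample`, the plotting positions (i - 0.5)/n, and estimating the rate from the sample mean are my assumptions, not part of the linked example.

```python
import numpy as np
from scipy.stats import poisson

# Synthetic sample standing in for the real data (assumed for illustration)
rng = np.random.default_rng(1)
sample = rng.poisson(lam=10, size=4000)

# Plotting positions (i - 0.5) / n for i = 1..n
n = sample.size
probs = (np.arange(1, n + 1) - 0.5) / n

# Estimate the Poisson rate from the sample mean, then get matching quantiles
lam_hat = sample.mean()
theoretical = poisson.ppf(probs, lam_hat)
observed = np.sort(sample)

# Points close to the line y = x indicate agreement with the Poisson model
qq_correlation = np.corrcoef(observed, theoretical)[0, 1]
```

Passing `theoretical` and `observed` to `plot` in place of the two sorted samples in the script above would give the corresponding QQ-plot.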
