6

I've been surfing but haven't found the correct method to do the following.

I have a histogram done with matplotlib:

hist, bins, patches = plt.hist(distance, bins=100, normed='True')

From the plot, I can see that the distribution is more or less an exponential (Poisson distribution). How can I do the best fitting, taking into account my hist and bins arrays?

UPDATE

I am using the following approach:

x = np.float64(bins) # Had some troubles with data types float128 and float64
hist = np.float64(hist)
myexp=lambda x,l,A:A*np.exp(-l*x)
popt,pcov=opt.curve_fit(myexp,(x[1:]+x[:-1])/2,hist)

But I get

---> 41 plt.plot(stats.expon.pdf(np.arange(len(hist)),popt),'-')

ValueError: operands could not be broadcast together with shapes (100,) (2,)
user2820579
  • 3,261
  • 7
  • 30
  • 45
  • Possible duplicate of http://stackoverflow.com/questions/25828184/fitting-to-poisson-histogram or this one http://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python ? – Ed Smith Nov 19 '15 at 18:35
  • I am not sure why you would want to use the histogram for that. All common distributions can be fixed by a couple of shape/location parameters. These can usually be estimated very efficiently from the data itself. – cel Nov 19 '15 at 18:36
  • I answered a similar question [yesterday](http://stackoverflow.com/questions/33767491/fitting-a-distribution-given-the-histogram-using-scipy/33768053#33768053), you can even find there how to use a fitting model of your own. You should take as input `x=(bins[1:]+bins[:-1])/2; y=hist` for the fitting procedure. @cel: for noisy data, a least-squares fit can be much more reliable than the raw estimate of the moments of the distribution.([citation needed](https://en.wikipedia.org/wiki/Wikipedia:Citation_needed)) – Andras Deak -- Слава Україні Nov 19 '15 at 18:51
  • I don't understand too much what you give there. I am trying to use `param=spy.expon.fit(distance); a = np.linspace(0,bins[-1],1000, dtype='f'); pdf_fitted=spy.expon.pdf(a,loc=param[0],scale=param[1]);plt.plot(a,pdf_fitted,'r-')` – user2820579 Nov 19 '15 at 19:02
  • That might work, but that's another approach. My linked answer is based on `spy.optimize.curve_fit`, and works with the histogram rather than the raw data (as per your question). For this you'd first define a fitting model `myexp=lambda x,l,A:A*np.exp(-l*x)`, then use it as `popt,pcov=spy.optimize.curve_fit(myexp,bins[1:]+bins[:-1])/2,hist)`. Then `popt` contains `(l,A)`, i.e. the parameter of the exponential distribution and the prefactor for fitting. Does this make more sense? – Andras Deak -- Слава Україні Nov 19 '15 at 19:14
  • Yes, that is true about the approach I give before. This is so far I can get (updating the question). – user2820579 Nov 19 '15 at 19:24
  • 1
    Don't `plt.plot(stats.expon.pdf(np.arange(len(hist)),popt),'-')`, rather `plt.plot((x[1:]+x[:-1])/2,myexp((x[1:]+x[:-1])/2,*popt),'-')` (or with any `x` array you prefer). – Andras Deak -- Слава Україні Nov 19 '15 at 19:27
  • Okey, thank you so much. – user2820579 Nov 19 '15 at 19:31
  • [Here are all the `scipy.stats` distributions PDFs with example code.](http://stackoverflow.com/a/37559471/2087463) – tmthydvnprt Jun 01 '16 at 12:30

1 Answers1

18

What you described is a form of exponential distribution, and you want to estimate the parameters of the exponential distribution, given the probability density observed in your data. Instead of using non-linear regression method (which assumes the residue errors are Gaussian distributed), one correct way is arguably a MLE (maximum likelihood estimation).

scipy provides a large number of continuous distributions in its stats library, and the MLE is implemented with the .fit() method. Of course, exponential distribution is there:

In [1]:

import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
#generate data 
X = ss.expon.rvs(loc=0.5, scale=1.2, size=1000)

#MLE
P = ss.expon.fit(X)
print P
(0.50046056920696858, 1.1442947648425439)
#not exactly 0.5 and 1.2, due to being a finite sample

In [3]:
#plotting
rX = np.linspace(0,10, 100)
rP = ss.expon.pdf(rX, *P)
#Yup, just unpack P with *P, instead of scale=XX and shape=XX, etc.
In [4]:

#need to plot the normalized histogram with `normed=True`
plt.hist(X, normed=True)
plt.plot(rX, rP)
Out[4]:

enter image description here

Your distance will replace X here.

CT Zhu
  • 52,648
  • 17
  • 120
  • 133