I wanted to ask a question about a user's reply to other question, but for some reason the comment box isn't showing up. Sorry if I'm doing something wrong.
In any case, regarding this reply: https://stackoverflow.com/a/11507723/1950164
I have the following question: how can I use this code to fit different data to different functions? I have a similar problem as the one he solved, expect I want to fit the cumulative distribution. So I started trying to generalize the code. I did three modifications:
a) After the line where the histogram is calculated, I added
hist = numpy.cumsum(hist)
This transforms our distribution into a cumulative distribution
b) Instead of the gaussian function in the example, I defined a new function
def myerf(x, *p):
A, mu, sigma = p
return A/2. * (1+math.erf((x-mu)/(math.sqrt(2)*sigma)))
This is what the cumulative distribution of a gaussian should be.
c) Lastly, of course, I changed the curve_fit line to call my function:
coeff, var_matrix = curve_fit(myerf, bin_centres, hist, p0=p0)
This should be a trivial exercise, except it doesn't work. The program now returns the following error message:
bash-3.2$ python fitting.py
Traceback (most recent call last):
File "fitting.py", line 27, in <module>
coeff, var_matrix = curve_fit(myerf, bin_centres, hist, p0=p0)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 506, in curve_fit
res = leastsq(func, p0, args=args, full_output=1, **kw)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 348, in leastsq
m = _check_func('leastsq', 'func', func, x0, args, n)[0]
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 14, in _check_func
res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 418, in _general_function
return function(xdata, *params) - ydata
File "fitting.py", line 22, in myerf
return A/2. * (1+math.erf((x-mu)/(math.sqrt(2)*sigma)))
TypeError: only length-1 arrays can be converted to Python scalars
So what am I doing wrong?
Bonus: give me a reference which explains what that *p is in the argument of the function.
Thanks!
EDIT: I tried running the program with cumulative distribution data, but still calling the gaussian function. That works, you just get a bad fit. So the mistake should be somewhere in myerf function.
EDIT2: If I try substituting the myerf function's return with something simpler, like
return A + mu*x + sigma*x**2
then it works. So there must be something in that return that isn't doing what it's supposed to.
EDIT3: So, I tried using the error function from scipy instead of that from math, and it works now. I have no idea why it wasn't working before, but it's working now. So the code is:
import matplotlib
matplotlib.use('Agg')
import numpy, math
import pylab as pl
from scipy.optimize import curve_fit
from scipy.special import erf
# Define some test data which is close to Gaussian
data = numpy.random.normal(size=10000)
hist, bin_edges = numpy.histogram(data, density=True)
bin_centres = (bin_edges[:-1] + bin_edges[1:])/2
hist = numpy.cumsum(hist)
def myerf(x, *p):
A, mu, sigma = p
return A/2. * ( 1+erf(((x-mu)/(math.sqrt(2)*sigma))) )
# p0 is the initial guess for the fitting coefficients (A, mu and sigma above)
p0 = [1., 0., 1.]
coeff, var_matrix = curve_fit(myerf, bin_centres, hist, p0=p0)
# Get the fitted curve
hist_fit = myerf(bin_centres, *coeff)
pl.plot(bin_centres, hist, label='Test data')
pl.plot(bin_centres, hist_fit, label='Fitted data')
# Finally, lets get the fitting parameters, i.e. the mean and standard deviation:
print 'Fitted mean = ', coeff[1]
print 'Fitted standard deviation = ', coeff[2]
pl.savefig('fitting.png')
pl.show()