1

I wanted to ask a question about a user's reply to other question, but for some reason the comment box isn't showing up. Sorry if I'm doing something wrong.

In any case, regarding this reply: https://stackoverflow.com/a/11507723/1950164

I have the following question: how can I use this code to fit different data to different functions? I have a similar problem as the one he solved, expect I want to fit the cumulative distribution. So I started trying to generalize the code. I did three modifications:

a) After the line where the histogram is calculated, I added

hist = numpy.cumsum(hist)

This transforms our distribution into a cumulative distribution

b) Instead of the gaussian function in the example, I defined a new function

def myerf(x, *p):
    A, mu, sigma = p
    return A/2. * (1+math.erf((x-mu)/(math.sqrt(2)*sigma)))

This is what the cumulative distribution of a gaussian should be.

c) Lastly, of course, I changed the curve_fit line to call my function:

coeff, var_matrix = curve_fit(myerf, bin_centres, hist, p0=p0)

This should be a trivial exercise, except it doesn't work. The program now returns the following error message:

bash-3.2$ python fitting.py
Traceback (most recent call last):
  File "fitting.py", line 27, in <module>
    coeff, var_matrix = curve_fit(myerf, bin_centres, hist, p0=p0)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 506, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kw)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 348, in leastsq
    m = _check_func('leastsq', 'func', func, x0, args, n)[0]
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 14, in _check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 418, in _general_function
    return function(xdata, *params) - ydata
  File "fitting.py", line 22, in myerf
    return A/2. * (1+math.erf((x-mu)/(math.sqrt(2)*sigma)))
TypeError: only length-1 arrays can be converted to Python scalars

So what am I doing wrong?

Bonus: give me a reference which explains what that *p is in the argument of the function.

Thanks!

EDIT: I tried running the program with cumulative distribution data, but still calling the gaussian function. That works, you just get a bad fit. So the mistake should be somewhere in myerf function.

EDIT2: If I try substituting the myerf function's return with something simpler, like

return A + mu*x + sigma*x**2

then it works. So there must be something in that return that isn't doing what it's supposed to.

EDIT3: So, I tried using the error function from scipy instead of that from math, and it works now. I have no idea why it wasn't working before, but it's working now. So the code is:

import matplotlib
matplotlib.use('Agg')
import numpy, math
import pylab as pl
from scipy.optimize import curve_fit
from scipy.special import erf

# Define some test data which is close to Gaussian
data = numpy.random.normal(size=10000)

hist, bin_edges = numpy.histogram(data, density=True)
bin_centres = (bin_edges[:-1] + bin_edges[1:])/2
hist = numpy.cumsum(hist)

def myerf(x, *p):
    A, mu, sigma = p
    return A/2. * (  1+erf(((x-mu)/(math.sqrt(2)*sigma)))  )

# p0 is the initial guess for the fitting coefficients (A, mu and sigma above)
p0 = [1., 0., 1.]

coeff, var_matrix = curve_fit(myerf, bin_centres, hist, p0=p0)

# Get the fitted curve
hist_fit = myerf(bin_centres, *coeff)

pl.plot(bin_centres, hist, label='Test data')
pl.plot(bin_centres, hist_fit, label='Fitted data')

# Finally, lets get the fitting parameters, i.e. the mean and standard deviation:
print 'Fitted mean = ', coeff[1]
print 'Fitted standard deviation = ', coeff[2]

pl.savefig('fitting.png')
pl.show()
Community
  • 1
  • 1

1 Answers1

0

Unlike the math functions, the numpy functions do accept vector input:

>>> import numpy, math
>>> numpy.exp([4,5])
array([  54.59815003,  148.4131591 ])
>>> math.exp([4,5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: a float is required
Wilbert
  • 1,618
  • 15
  • 13