I need to interpolate data coming from an instrument using a Gaussian fit. To this end I thought about using the curve_fit function from scipy. Since I'd like to test this functionality on fake data before trying it on the instrument, I wrote the following code to generate noisy Gaussian data and to fit it:

from scipy.optimize import curve_fit
import numpy
import pylab

# Create a gaussian function

def gaussian(x, a, b, c):
    val = a * numpy.exp(-(x - b)**2 / (2*c**2))
    return val

# Generate fake data.
zMinEntry = 80.0*1E-06
zMaxEntry = 180.0*1E-06
zStepEntry = 0.2*1E-06

x = numpy.arange(zMinEntry, zMaxEntry, zStepEntry, dtype=numpy.float64)

n = len(x)

meanY = zMinEntry + (zMaxEntry - zMinEntry)/2
sigmaY = 10.0E-06

a = 1.0/(sigmaY*numpy.sqrt(2*numpy.pi))  # amplitude of a normalised Gaussian
y = gaussian(x, a, meanY, sigmaY) + a*0.1*numpy.random.normal(0, 1, size=len(x))  # add ~10% noise


# Fit
popt, pcov = curve_fit(gaussian, x, y)

# Print results
print("Scale =  %.3f +/- %.3f" % (popt[0], numpy.sqrt(pcov[0, 0])))
print("Offset = %.3f +/- %.3f" % (popt[1], numpy.sqrt(pcov[1, 1])))
print("Sigma =  %.3f +/- %.3f" % (popt[2], numpy.sqrt(pcov[2, 2])))


pylab.plot(x, y, 'ro')
pylab.plot(x, gaussian(x, popt[0], popt[1], popt[2]))
pylab.grid(True)
pylab.show()

Unfortunately this does not work properly, the output of the code is the following:

Scale =  6174.816 +/- 7114424813.672
Offset = 429.319 +/- 3919751917.830
Sigma =  1602.869 +/- 17923909301.176

And the plotted result is (blue is the fit function, red dots are the noisy input data):

[plot: fit function (blue) vs. noisy input data (red dots)]

I also tried to look at this answer, but couldn't figure out where my problem is. Am I missing something here? Or am I using the curve_fit function in the wrong way? Thanks in advance!

  • You need to provide an initial guess for the parameters (4th argument of curve_fit). – Christian K. Dec 09 '15 at 11:14
  • You might want to use some higher level fit packages like [peak-o-mat](http://lorentz.sf.net) which can be used as [library](http://lorentz.sourceforge.net/scripting_from_scratch.html) but also has a GUI. – Christian K. Dec 09 '15 at 11:22
  • Thanks, but I'd prefer not to install other libraries, this goes into a SW package that should be run on different computers that might not have that specific package. – toti08 Dec 09 '15 at 12:36
  • you could take a look at `scipy.stats.gaussian_kde` (a minimal sketch follows after these comments) – Moritz Dec 09 '15 at 13:12
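
For reference, `scipy.stats.gaussian_kde` estimates a probability density from raw samples rather than fitting a curve to (x, y) pairs, so it only applies if the instrument delivers individual measurements. A minimal sketch (the sample data here is made up for illustration):

from scipy.stats import gaussian_kde
import numpy

# hypothetical raw samples; gaussian_kde works on samples, not on (x, y) pairs
samples = numpy.random.normal(0.13, 0.01, size=500)

kde = gaussian_kde(samples)                               # build the density estimate
grid = numpy.linspace(samples.min(), samples.max(), 200)  # evaluation points
density = kde(grid)                                       # estimated density on the grid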

3 Answers


I agree with Olaf in so far as it is a question of scale: the optimal parameters differ by many orders of magnitude. However, scaling the parameters with which you generated your toy data does not seem to solve the problem for your actual application. curve_fit uses leastsq, which numerically approximates the Jacobian, and that is where numerical problems arise because of the differences in scale (try the full_output keyword of curve_fit to see the optimizer's diagnostics).
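
For example, the diagnostics can be requested like this (a minimal sketch; this assumes a SciPy version whose curve_fit accepts the full_output keyword directly):

# ask curve_fit to return the diagnostics of the underlying leastsq call
popt, pcov, infodict, errmsg, ier = curve_fit(gaussian, x, y, full_output=True)

print(errmsg)            # human-readable optimizer status
print(infodict['nfev'])  # number of function evaluations used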

In my experience it is often best to use fmin, which does not rely on approximated derivatives but uses only function values. You then have to write your own least-squares function to be minimized.

Starting values are still important. In your case you can make sufficiently good guesses by taking the maximum amplitude for a and the corresponding x-value for b and c.

In code, it looks like this:

from scipy.optimize import fmin
import numpy
import pylab

# Create a gaussian function

def gaussian(x, a, b, c):
    val = a * numpy.exp(-(x - b)**2 / (2*c**2))
    return val

# Generate fake data.
zMinEntry = 80.0*1E-06
zMaxEntry = 180.0*1E-06
zStepEntry = 0.2*1E-06

x = numpy.arange(zMinEntry, zMaxEntry, zStepEntry, dtype=numpy.float64)

n = len(x)

meanY = zMinEntry + (zMaxEntry - zMinEntry)/2
sigmaY = 10.0E-06

a = 1.0/(sigmaY*numpy.sqrt(2*numpy.pi))  # amplitude of a normalised Gaussian
y = gaussian(x, a, meanY, sigmaY) + a*0.1*numpy.random.normal(0, 1, size=len(x))  # add ~10% noise

print(a, meanY, sigmaY)  # true values, for comparison

# estimate starting values from the data
a = y.max()              # peak height
b = x[numpy.argmax(y)]   # location of the peak
c = b                    # rough width guess on the same scale as x

# define a least-squares objective to minimize
def minfunc(params):
    return numpy.sum((y - gaussian(x, params[0], params[1], params[2]))**2)

# fit
popt = fmin(minfunc, [a, b, c])

# Print results
print("Scale =  %.3f" % (popt[0]))
print("Offset = %.3f" % (popt[1]))
print("Sigma =  %.3f" % (popt[2]))


pylab.plot(x, y, 'ro')
pylab.plot(x, gaussian(x, popt[0], popt[1], popt[2]), lw=2)
pylab.xlim(x.min(), x.max())
pylab.grid(True)
pylab.show()

[plot: noisy data (red dots) with the fmin fit (solid line)]


Looks like some numerical instabilities are creeping into the optimizer. Try scaling the data. With the following data:

# scale the x-axis quantities by a factor of 1000
zMinEntry = 80.0*1E-06 * 1000
zMaxEntry = 180.0*1E-06 * 1000
zStepEntry = 0.2*1E-06 * 1000
sigmaY = 10.0E-06 * 1000

I get estimates of

Scale =  39.697 +/- 0.526
Offset = 0.130 +/- 0.000
Sigma =  -0.010 +/- 0.000

Compare that to the true values:

Scale = 39.894228
Offset = 0.13
Sigma = 0.01

The minus sign of sigma can of course be ignored, since c enters the model only squared.
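
If the parameters are needed back in the original units, the x-axis quantities can simply be divided by the scale factor after the fit. A minimal sketch, assuming the factor of 1000 used above:

scale = 1000.0                     # same factor that was applied to x above

offset_orig = popt[1] / scale      # centre, back in the original units
sigma_orig = abs(popt[2]) / scale  # width, back in the original units
# popt[0] (the amplitude) needs no conversion here: only the x-axis was rescaled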

This gives the following plot

[plot: noisy data with the fit after rescaling]

  • Thanks, this works. I'm still wondering why it fails on smaller numbers, but I guess I could use your approach as a workaround! – toti08 Dec 09 '15 at 12:37

As I said in a comment, if you provide a reasonable initial guess, the fit succeeds, i.e. call curve_fit like this:

popt, pcov = curve_fit(gaussian, x, y, p0=[50000, 0.00012, 0.00002])

[plot: successful fit with the supplied initial guess]
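
The guess does not have to be hardcoded; it can also be derived from the data itself. A short sketch, assuming the data contains a single rough peak:

# derive a rough initial guess from the data
p0 = [y.max(),                     # amplitude: height of the peak
      x[numpy.argmax(y)],          # centre: location of the peak
      (x.max() - x.min()) / 10.0]  # width: a fraction of the x-range

popt, pcov = curve_fit(gaussian, x, y, p0=p0)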
