3

I have now multiple times stumbled upon that fitting in python with scipy.curve_fit is somehow a lot harder than with other tools like e.g. ROOT (https://root.cern.ch/)

For example, when fitting a gaussian, with scipy I mostly get a straight line: enter image description here

corresponding code:

def fit_gauss(y, x = None):
    n = len(y)  # the number of data
    if x is None:
       x = np.arange(0,n,1)
    mean = y.mean()
    sigma = y.std()

    def gauss(x, a, x0, sigma):
        return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

    popt, pcov = curve_fit(gauss, x, y, p0=[max(y), mean, sigma])

    plt.plot(x, y, 'b+:', label='data')
    plt.plot(x, gauss(x, *popt), 'ro:', label='fit')
    plt.legend()
    plt.title('Gauss fit for spot')
    plt.xlabel('Pixel (px)')
    plt.ylabel('Intensity (a.u.)')
    plt.show()

Using ROOT, I get a perfect fit, without even giving start parameters: enter image description here

Again, the corresponding code:

import ROOT
import numpy as np

y = np.array([2., 2., 11., 0., 5., 7., 18., 12., 19., 20., 36., 11., 21., 8., 13., 14., 8., 3., 21., 0., 24., 0., 12., 0., 8., 11., 18., 0., 9., 21., 17., 21., 28., 36., 51., 36., 47., 69., 78., 73., 52., 81., 96., 71., 92., 70., 84.,72., 88., 82., 106., 101., 88., 74., 94., 80., 83., 70., 78., 85., 85., 56., 59., 56., 73., 33., 49., 50., 40., 22., 37., 26., 6., 11., 7., 26., 0., 3., 0., 0., 0., 0., 0., 3., 9., 0., 31., 0., 11., 0., 8., 0., 9., 18.,9., 14., 0., 0., 6., 0.])
x = np.arange(0,len(y),1)
#yerr= np.array([0.1,0.2,0.1,0.2,0.2])
graph = ROOT.TGraphErrors()
for i in range(len(y)):
    graph.SetPoint(i, x[i], y[i])
    #graph.SetPointError(i, yerr[i], yerr[i])
func = ROOT.TF1("Name", "gaus")
graph.Fit(func)

canvas = ROOT.TCanvas("name", "title", 1024, 768)
graph.GetXaxis().SetTitle("x") # set x-axis title
graph.GetYaxis().SetTitle("y") # set y-axis title
graph.Draw("AP")

Can someone explain to me, why the results differ that much? Is the implementation in scipy that bad / dependend on good start parameters? Is there any way around it? I need to process a lot of fits automatically, but don't have access to ROOT on the target computer, so it has to work with python only.

When taking the results from the ROOT fit and giving them to scipy as the start parameters, the fit works fine with scipy as well...

user3696412
  • 1,321
  • 3
  • 15
  • 33
  • 1
    I get a nice output for the data you provide in your second code sample (see my answer below). – Cleb Mar 16 '17 at 13:40

2 Answers2

2

Without actual data it is not that easy to reproduce your results but with artificially created noisy data it looks fine to me:

enter image description here

That is the code I am using:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# your gauss function
def gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

# create some noisy data
xdata = np.linspace(0, 4, 50)
y = gauss(xdata, 2.5, 1.3, 0.5)
y_noise = 0.4 * np.random.normal(size=xdata.size)
ydata = y + y_noise
# plot the noisy data
plt.plot(xdata, ydata, 'bo', label='data')

# do the curve fit using your idea for the initial guess
popt, pcov = curve_fit(gauss, xdata, ydata, p0=[ydata.max(), ydata.mean(), ydata.std()])

# plot the fit as well
plt.plot(xdata, gauss(xdata, *popt), 'r-', label='fit')

plt.show()

As you, I also use p0=[ydata.max(), ydata.mean(), ydata.std()] as an initial guess and that seems to work fine and robust for different noise strengths.

EDIT

I just realized that you actually provide data; then the result looks as follows:

enter image description here

Code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


def gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

ydata = np.array([2., 2., 11., 0., 5., 7., 18., 12., 19., 20., 36., 11., 21., 8., 13., 14., 8., 3., 21., 0., 24., 0., 12.,
0., 8., 11., 18., 0., 9., 21., 17., 21., 28., 36., 51., 36., 47., 69., 78., 73., 52., 81., 96., 71., 92., 70., 84.,72.,
88., 82., 106., 101., 88., 74., 94., 80., 83., 70., 78., 85., 85., 56., 59., 56., 73., 33., 49., 50., 40., 22., 37., 26.,
6., 11., 7., 26., 0., 3., 0., 0., 0., 0., 0., 3., 9., 0., 31., 0., 11., 0., 8., 0., 9., 18.,9., 14., 0., 0., 6., 0.])

xdata = np.arange(0, len(ydata), 1)

plt.plot(xdata, ydata, 'bo', label='data')

popt, pcov = curve_fit(gauss, xdata, ydata, p0=[ydata.max(), ydata.mean(), ydata.std()])
plt.plot(xdata, gauss(xdata, *popt), 'r-', label='fit')

plt.show()
Cleb
  • 25,102
  • 20
  • 116
  • 151
  • Works as advertised. Actually, I don't really know why my version did not work. I ended up simply creating a new file, copy-pasting everthing except the fitting and then added your answer for the fitting and now it works... Dunno what crazy quirk python had there... – user3696412 Mar 21 '17 at 13:56
  • @user3696412: Glad it is fixed now. :) I did not try to "repair" your code so I don't know either; did not see anything obvious wrong with it. – Cleb Mar 21 '17 at 14:07
1

You may not have really wanted to use ydata.mean() for the initial value of the Gaussian centroid, or ydata.std() for the initial value of the variance -- those are probably better guessed from xdata. I don't know if that is what caused the initial trouble.

You might find the lmfit library useful. This provides a way to turn your model function gauss into a Model class with a fit() method that uses named Parameters determined from your model function. Using it, your fit might look like:

import numpy as np
import matplotlib.pyplot as plt

from lmfit import Model

def gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

ydata = np.array([2., 2., 11., 0., 5., 7., 18., 12., 19., 20., 36., 11., 21., 8., 13., 14., 8., 3., 21., 0., 24., 0., 12., 0., 8., 11., 18., 0., 9., 21., 17., 21., 28., 36., 51., 36., 47., 69., 78., 73., 52., 81., 96., 71., 92., 70., 84.,72., 88., 82., 106., 101., 88., 74., 94., 80., 83., 70., 78., 85., 85., 56., 59., 56., 73., 33., 49., 50., 40., 22., 37., 26., 6., 11., 7., 26., 0., 3., 0., 0., 0., 0., 0., 3., 9., 0., 31., 0., 11., 0., 8., 0., 9., 18.,9., 14., 0., 0., 6., 0.])

xdata = np.arange(0, len(ydata), 1)

# wrap your gauss function into a Model
gmodel = Model(gauss)
result = gmodel.fit(ydata, x=xdata, 
                    a=ydata.max(), x0=xdata.mean(), sigma=xdata.std())

print(result.fit_report())

plt.plot(xdata, ydata, 'bo', label='data')
plt.plot(xdata, result.best_fit, 'r-', label='fit')
plt.show()

There are several additional features. For example, you might want to see the confidence in the best fit, which would be (in master version, soon-to-be-released):

# add estimated band of uncertainty:
dely = result.eval_uncertainty(sigma=3)
plt.fill_between(xdata, result.best_fit-dely, result.best_fit+dely, color="#ABABAB")
plt.show()

to give: enter image description here

M Newville
  • 7,486
  • 2
  • 16
  • 29
  • Nice alternative to my answer (upvoted). I like the uncertainty part (as `lmfit` in general; look forward to the new release!). What exactly does the `sigma=3` part do? – Cleb Mar 16 '17 at 22:09