Error when obtaining Gaussian fit for graph

Question

I am not an experienced programmer, and needed to fit my data values into a Gaussian graph. The code below was from Gaussian fit for Python.

Using Anaconda, this error message was obtained:

runfile('D:/Anaconda3/gaussian.py', wdir='D:/Anaconda3')

C:\Users\lion\Anaconda3\lib\sitepackages\scipy\optimize\minpack.py:779:OptimizeWarning: Covariance of the parameters could not be estimated category=OptimizeWarning).

The fit gaussian graph was also not as expected; a horizontal curve instead of a gaussian fit curve.

Graph Image:

Any help would be appreciated!

Code Used:

import pylab as plb
import matplotlib.pyplot as plt

from scipy.optimize import curve_fit
from scipy import asarray as ar,exp


x = ar(range(399))
y = ar([1, 0, 1, 0, 2, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 2, 1, 0, 2, 1, 1, 0, 1, 0, 1, 0, 0, 2, 0, 2, 
        3, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 2, 0, 3, 3, 0, 1, 2, 1, 1, 2, 3, 1, 
        4, 1, 1, 1, 0, 1, 0, 3, 4, 0, 1, 3, 0, 2, 0, 3, 0, 0, 0, 3, 2, 0, 2, 0, 0, 1, 2, 0, 0, 0, 
        3, 2, 1, 0, 1, 3, 4, 2, 4, 1, 2, 1, 1, 2, 0, 2, 2, 6, 2, 4, 2, 0, 1, 2, 2, 3, 4, 6, 2, 3, 
        2, 4, 1, 4, 9, 6, 6, 4, 5, 6, 3, 7, 8, 9, 7, 8, 8, 4, 10, 10, 12, 9, 18, 18, 16, 14, 13, 13, 
        15, 17, 16, 26, 24, 37, 34, 36, 40, 48, 52, 52, 50, 56, 68, 90, 71, 107, 93, 117, 134, 207, 
        200, 227, 284, 287, 337, 379, 449, 471, 626, 723, 848, 954, 1084, 1296, 1481, 1676, 1898, 2024, 
        2325, 2692, 3110, 3384, 3762, 4215, 4559, 5048, 5655, 6092, 6566, 6936, 7513, 8052, 8414, 9016, 
        9303, 9598, 9775, 10100, 10265, 10651, 10614, 10755, 10439, 10704, 10233, 10086, 9696, 9467, 9156, 
        8525, 8200, 7609, 7156, 6678, 6160, 5638, 5227, 4574, 4265, 3842, 3380, 3029, 2767, 2512, 2018, 1856, 1645, 
        1463, 1253, 1076, 943, 787, 711, 588, 512, 448, 361, 304, 303, 251, 190, 185, 154, 134, 114, 105, 
        86, 88, 83, 79, 50, 60, 49, 28, 33, 37, 28, 31, 22, 
        14, 26, 19, 17, 15, 9, 17, 13, 11, 11, 12, 18, 8, 6, 9, 6, 3, 6, 6, 6, 6, 11, 9, 15, 
        3, 3, 1, 2, 3, 2, 6, 3, 4, 3, 4, 5, 3, 1, 1, 2, 1, 0, 4, 3, 2, 3, 1, 3, 3, 4, 0, 3, 5, 0, 
        3, 1, 2, 0, 2, 2, 1, 0, 5, 1, 3, 0, 0, 3, 0, 1, 3, 0, 1, 0, 2, 4, 0, 0, 0, 0, 0, 1, 0, 2, 
        0, 1, 1, 0, 1, 0, 0, 4, 0, 0, 0, 2, 0, 3, 0, 2, 1, 2, 2, 0, 0, 0, 1, 4, 1, 0, 1, 2, 0, 1, 
        1, 1, 1, 2, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1])

n = len(x)                          #the number of data
mean = sum(x*y)/n                   #note this correction
sigma = sum(y*(x-mean)**2)/n        #note this correction

def gaus(x,a,x0,sigma):
    return a*exp(-(x-x0)**2/(2*sigma**2))

popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])

plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Intensity (Counts)')

plt.show()

I'd have to take a closer look when I'm at a computer, but just looking at your plot, you might try bounding your data more tightly to cut off those long, nearly flat tails which are probably what's throwing off the covariance estimate. — Iguananaut, Oct 14 '17 at 21:48
Unrelated, but some tips on using numpy: instead of importing `asarray` from the scipy module (as really it's party of Numpy) use numpy directly: `import numpy as np`. Instead of `ar(range(N))` you can use just `np.arange(N)` to make a monotonically increasing array. Also use `np.sum()` instead of the Python built-in sum. For a few hundred points it doesn't much matter, but beyond that the difference can be orders of magnitude. — Iguananaut, Oct 14 '17 at 21:53
I see now you were just copying someone else's code. To be clear, all I mean by "bound" is to try passing, say, `x[100:300]` to `curve_fit` and similarly for `y`. — Iguananaut, Oct 14 '17 at 21:59

J_H · Answer 1 · 2017-10-15T02:59:13.950

0

When fitting a Gaussian, correctly computing μ and σ is kind of a big deal. You cited your reference (thank you!), but didn't follow it. Please import math and substitute these two lines into your code:

mean = sum(x * y) / sum(y)
sigma = math.sqrt(sum(y * (x - mean)**2) / sum(y))

(It turns out if mean is any number between -7200 and 10,000 the fit still converges nicely, and similarly for positive sigmas less than 805.)

edited Oct 15 '17 at 02:59

answered Oct 15 '17 at 02:42

J_H

17,926
4
24
44

Use `np.sqrt` not `math.sqrt` (tho in fairness it doesn't really matter here) – Iguananaut Oct 15 '17 at 22:02
Yah, I tried them both, and opted for `math` in the interest of simplicity, since it's just a scalar. – J_H Oct 15 '17 at 22:58

score 0 · Answer 2 · answered Oct 16 '17 at 01:01

You may find lmfit (http://lmfit.github.io/lmfit-py/) useful for this sort of problem. It has builtin models for common peak shapes like Gaussian and simplifies many curve-fitting tasks. You example would look like this (skipping the data):

import matplotlib.pyplot as plt

from lmfit.models import GaussianModel

gmodel = GaussianModel()
params = gmodel.guess(y, x=x)
result = gmodel.fit(y, params, x=x)

print(result.fit_report())

plt.plot(x, y,'b+:',label='data')
plt.plot(x, result.best_fit, 'ro:', label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Intensity (Counts)')
plt.show()

which would print out the fit report of

[[Model]]
    Model(gaussian)
[[Fit Statistics]]
    # function evals   = 23
    # data points      = 399
    # variables        = 3
    chi-square         = 1130482.300
    reduced chi-square = 2854.753
    Akaike info crit   = 3177.728
    Bayesian info crit = 3189.695
[[Variables]]
    sigma:       13.0637138 +/- 0.019425 (0.15%) (init= 14.5)
    center:      210.606022 +/- 0.019425 (0.01%) (init= 210.5)
    amplitude:   3.4580e+05 +/- 445.3134 (0.13%) (init= 467842.5)
    fwhm:        30.7626945 +/- 0.045744 (0.15%)  == '2.3548200*sigma'
    height:      10560.0217 +/- 13.59907 (0.13%)  == '0.3989423*amplitude/max(1.e-15, sigma)'
[[Correlations]] (unreported correlations are <  0.100)
    C(sigma, amplitude)          =  0.577

and produce a plot showing a good fit.

Note that the definition of Gaussian is slightly different, and your a would correspond to the value of height listed above. The value listed as amplitude would be the integrated area.

Appreciate your alternative solution! – Fad Lion Oct 20 '17 at 13:16 — Fad Lion, Oct 20 '17 at 13:16

Error when obtaining Gaussian fit for graph

2 Answers2