0

As my title suggests, I'm trying to fit a Gaussian to some data and I'm just getting a straight line. I've been looking at these other discussion Gaussian fit for Python and Fitting a gaussian to a curve in Python which seem to suggest basically the same thing. I can make the code in those discussions work fine for the data they provide, but it won't do it for my data.

My code looks like this:

import pylab as plb
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy import asarray as ar,exp

y = y - y[0]    # to make it go to zero on both sides
x = range(len(y))

max_y = max(y)

n = len(y)
mean = sum(x*y)/n
sigma = np.sqrt(sum(y*(x-mean)**2)/n)
# Someone on a previous post seemed to think this needed to have the sqrt.
# Tried it without as well, made no difference. 

def gaus(x,a,x0,sigma):
    return a*exp(-(x-x0)**2/(2*sigma**2))

popt,pcov = curve_fit(gaus,x,y,p0=[max_y,mean,sigma])    
# It was suggested in one of the other posts I looked at to make the
# first element of p0 be the maximum value of y.
# I also tried it as 1, but that did not work either

plt.plot(x,y,'b:',label='data')
plt.plot(x,gaus(x,*popt),'r:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()

The data I am trying to fit is as follows:

y = array([  6.95301373e+12,   9.62971320e+12,   1.32501876e+13,
     1.81150568e+13,   2.46111132e+13,   3.32321345e+13,
     4.45978682e+13,   5.94819771e+13,   7.88394616e+13,
     1.03837779e+14,   1.35888594e+14,   1.76677210e+14,
     2.28196006e+14,   2.92781632e+14,   3.73133045e+14,
     4.72340762e+14,   5.93892782e+14,   7.41632194e+14,
     9.19750269e+14,   1.13278296e+15,   1.38551838e+15,
     1.68291212e+15,   2.02996957e+15,   2.43161742e+15,
     2.89259207e+15,   3.41725793e+15,   4.00937676e+15,
     4.67187762e+15,   5.40667931e+15,   6.21440313e+15,
     7.09421973e+15,   8.04366842e+15,   9.05855930e+15,
     1.01328502e+16,   1.12585509e+16,   1.24257598e+16,
     1.36226443e+16,   1.48356404e+16,   1.60496345e+16,
     1.72482199e+16,   1.84140400e+16,   1.95291969e+16,
     2.05757166e+16,   2.15360187e+16,   2.23933053e+16,
     2.31320228e+16,   2.37385276e+16,   2.42009864e+16,
     2.45114362e+16,   2.46427484e+16,   2.45114362e+16,
     2.42009864e+16,   2.37385276e+16,   2.31320228e+16,
     2.23933053e+16,   2.15360187e+16,   2.05757166e+16,
     1.95291969e+16,   1.84140400e+16,   1.72482199e+16,
     1.60496345e+16,   1.48356404e+16,   1.36226443e+16,
     1.24257598e+16,   1.12585509e+16,   1.01328502e+16,
     9.05855930e+15,   8.04366842e+15,   7.09421973e+15,
     6.21440313e+15,   5.40667931e+15,   4.67187762e+15,
     4.00937676e+15,   3.41725793e+15,   2.89259207e+15,
     2.43161742e+15,   2.02996957e+15,   1.68291212e+15,
     1.38551838e+15,   1.13278296e+15,   9.19750269e+14,
     7.41632194e+14,   5.93892782e+14,   4.72340762e+14,
     3.73133045e+14,   2.92781632e+14,   2.28196006e+14,
     1.76677210e+14,   1.35888594e+14,   1.03837779e+14,
     7.88394616e+13,   5.94819771e+13,   4.45978682e+13,
     3.32321345e+13,   2.46111132e+13,   1.81150568e+13,
     1.32501876e+13,   9.62971320e+12,   6.95301373e+12,
     4.98705540e+12])

I would show you what it looks like, but apparently I don't have enough reputation points...

Anyone got any idea why it's not fitting properly?

Thanks for your help :)

Community
  • 1
  • 1
user1153070
  • 139
  • 8

1 Answers1

1

The importance of the initial guess, p0 in curve_fit's default argument list, cannot be stressed enough.

Notice that the docstring mentions that

[p0] If None, then the initial values will all be 1

So if you do not supply it, it will use an initial guess of 1 for all parameters you're trying to optimize for. The choice of p0 affects the speed at which the underlying algorithm changes the guess vector p0 (ref. the documentation of least_squares).

When you look at the data that you have, you'll notice that the maximum and the mean, mu_0, of the Gaussian-like dataset y, are 2.4e16 and 49 respectively. With the peak value so large, the algorithm, would need to make drastic changes to its initial guess to reach that large value.

When you supply a good initial guess to the curve fitting algorithm, convergence is more likely to occur.

Using your data, you can supply a good initial guess for the peak_value, the mean and sigma, by writing them like this:

y = np.array([...])  # starting from the original dataset
x = np.arange(len(y))
peak_value = y.max()
mean = x[y.argmax()] # observation of the data shows that the peak is close to the center of the interval of the x-data
sigma = mean - np.where(y > peak_value * np.exp(-.5))[0][0] # when x is sigma in the gaussian model, the function evaluates to a*exp(-.5)
popt,pcov = curve_fit(gaus, x, y, p0=[peak_value, mean, sigma])
print(popt)  # prints: [  2.44402560e+16   4.90000000e+01   1.20588976e+01]

Note that in your code, for the mean you take sum(x*y)/n , which is strange, because this would modulate the gaussian by a polynome of degree 1 (it multiplies a gaussian with a monotonously increasing line of constant slope) before taking the mean. That will offset the mean value of y (in this case to the right). A similar remark can be made for your calculation of sigma.

Final remark: the histogram of y will not resemble a Gaussian, as y is already a Gaussian. The histogram will merely bin (count) values into different categories (answering the question "how many datapoints in y reach a value between [a, b]?").

Oliver W.
  • 13,169
  • 3
  • 37
  • 50