
I'm trying to fit a Gaussian to my data (which already looks roughly Gaussian). I've taken the advice given here and tried curve_fit and leastsq, but I think I'm missing something more fundamental (in that I have no idea how to use either command). Here's the script I have so far:

import pylab as plb
import matplotlib.pyplot as plt

# Read in data -- first 2 rows are header in this example. 
data = plb.loadtxt('part 2.csv', skiprows=2, delimiter=',')

x = data[:,2]
y = data[:,3]
mean = sum(x*y)
sigma = sum(y*(x - mean)**2)

def gauss_function(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))
popt, pcov = curve_fit(gauss_function, x, y, p0 = [1, mean, sigma])
plt.plot(x, gauss_function(x, *popt), label='fit')

# plot data

plt.plot(x, y,'b')

# Add some axis labels

plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()

What I get from this is a Gaussian-ish shape (my original data) plus a flat horizontal line for the fit.


Also, I'd like to plot my data as points rather than a connected line. Any input is appreciated!

Richard Hsia

7 Answers


Here is the corrected code:

import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

n = len(x)                          # the number of data points
mean = sum(x*y)/n                   # note this correction
sigma = sum(y*(x-mean)**2)/n        # note this correction

def gaus(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))

popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])

plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()

Result:
[image: the data points with the fitted Gaussian curve]

Developer
  • And if you use `plt.plot(x,y,'b+',label='data')` they will be just points. – Developer Oct 06 '13 at 10:14
  • Thank you! What are the two corrections you added for by the way? – KRS-fun Feb 27 '14 at 22:46
  • This answer does not explain why the correction works better than the original definition. – strpeter Jul 20 '16 at 08:54
  • To make this work with a different dataset, I added `yn = max(y)`, `y /= yn`, then during plotting, I changed `y` to `y*yn`: `plt.plot(x, y*yn, ...)` . – EL_DON Apr 25 '17 at 18:24
  • @strpeter Pay attention to the comments in the corrected code; they should be enough explanation. – Developer May 02 '17 at 05:13
  • @Developer: I guess what you want to do is dividing by `sum(y)` or equally e.g. `mean = np.average(x, weights=y)` which is equal to `5.0`. I ask myself why you divide by `n` which returns `12.0` - a value that is far from the real value. In a general case you cannot even be sure that this converges to the right values since it would be correct to divide by `sum(y)` which can be far from `n`. – strpeter Jun 14 '17 at 09:17
  • @EL_DON why does scaling y between 0 and 1 help? It works for me but I dont understand why its necessary – HereItIs Mar 18 '21 at 16:34
  • @HereItIs The fitting routine can internally hit a floating overflow (underflow) if the input data are extremely large (small). Normalizing to the maximum input value usually prevents this. – EL_DON Mar 25 '21 at 13:18
  • @Developer the comments in the code only mention where the fix happens, they do not explain anything at all. – BartoszKP Jan 03 '22 at 11:08

Explanation

You need good starting values so that the curve_fit function converges at "good" values. I cannot really say why your fit did not converge (even though the definition of your mean is strange - check below), but I will give you a strategy that works for non-normalized Gaussian functions like yours.

Example

The estimated parameters should be close to the final values (use the weighted arithmetic mean - divide by the sum of all values):

import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

# weighted arithmetic mean (corrected - check the section below)
mean = sum(x * y) / sum(y)
sigma = np.sqrt(sum(y * (x - mean)**2) / sum(y))

def Gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

popt,pcov = curve_fit(Gauss, x, y, p0=[max(y), mean, sigma])

plt.plot(x, y, 'b+:', label='data')
plt.plot(x, Gauss(x, *popt), 'r-', label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()

I personally prefer using numpy.

Comment on the definition of the mean (including Developer's answer)

Since the reviewers did not like my edit on Developer's code, I will explain in which case I would suggest an improved code. Developer's mean does not correspond to any of the usual definitions of the mean.

Your definition returns:

>>> sum(x * y)
125

Developer's definition returns:

>>> sum(x * y) / len(x)
12.5 #for Python 3.x

The weighted arithmetic mean:

>>> sum(x * y) / sum(y)
5.0
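These three candidate definitions can be checked directly on the toy data (a quick sanity check; note that `np.average` with weights gives the same result as the weighted arithmetic mean):

```python
import numpy as np

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

raw      = np.sum(x * y)              # original definition: 125
per_n    = np.sum(x * y) / len(x)     # Developer's definition: 12.5
weighted = np.sum(x * y) / np.sum(y)  # weighted arithmetic mean: 5.0

print(raw, per_n, weighted)
print(np.average(x, weights=y))       # also 5.0 -- equivalent to the weighted mean
```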

Similarly you can compare the definitions of standard deviation (sigma). Compare with the figure of the resulting fit:

Resulting fit

Comment for Python 2.x users

In Python 2.x you should additionally enable true division to avoid weird results, or convert the numbers explicitly before dividing:

from __future__ import division

or e.g.

sum(x * y) * 1. / sum(y)
strpeter
  • I like the "weighted arithmetic mean". It makes the estimation of mean and std more accurate in some cases (for example, if you have a long left tail and short right tail). However, when I tried to google a theoretical explanation why "weighted arithmetic mean" is better, I found nothing. this is weird... – halfmoonhalf Mar 16 '23 at 05:55
  • Did you see the link to Wikipedia? - You can also look up the definition of the expectation value - the value, which is most likely to appear again from the existing data. – strpeter Apr 14 '23 at 08:03

You get a horizontal straight line because the fit did not converge.

Better convergence is attained if the first element of the initial guess p0 is set to max(y) (5 in the example) instead of 1.
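On the toy data from the other answers, this looks roughly as follows (a minimal sketch; the exact fitted numbers depend on the optimizer):

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

# weighted estimates for the starting values
mean = np.sum(x * y) / np.sum(y)                        # 5.0
sigma = np.sqrt(np.sum(y * (x - mean)**2) / np.sum(y))  # 2.0

# amplitude guess set to max(y) instead of 1
popt, pcov = curve_fit(gauss, x, y, p0=[max(y), mean, sigma])
print(popt)  # fitted (a, x0, sigma); x0 lands near 5 by symmetry
```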

user3406246

After losing hours trying to find my error, I realized the problem is this formula:

sigma = sum(y*(x-mean)**2)/n

This formula is wrong; the correct value is its square root:

sigma = np.sqrt(sum(y*(x-mean)**2)/n)

Hope this helps
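Combining this square root with the sum(y) normalisation mentioned elsewhere on this page gives a tidy value on the toy data (a quick check):

```python
import numpy as np

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

mean = np.sum(x * y) / np.sum(y)                        # 5.0
sigma = np.sqrt(np.sum(y * (x - mean)**2) / np.sum(y))  # 2.0
print(mean, sigma)
```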

James
  • There is still another error: It is not 1/n but 1/sum(y). Check out [my answer below](http://stackoverflow.com/a/38431524/2062965). – strpeter Jul 20 '16 at 08:02

There is another way of performing the fit, which is to use the lmfit package. It is built on top of scipy's fitting machinery (like curve_fit) but is much better at fitting and offers complex fitting as well. Detailed step-by-step instructions are given at the link below: http://cars9.uchicago.edu/software/python/lmfit/model.html#model.best_fit

sigma = sum(y*(x - mean)**2)

should be

sigma = np.sqrt(sum(y*(x - mean)**2))
gastro

Actually, you do not need to provide a first guess. Simply doing

import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

n = len(x)                          # the number of data points
mean = sum(x*y)/n                   # note this correction
sigma = sum(y*(x-mean)**2)/n        # note this correction

def gaus(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))

popt,pcov = curve_fit(gaus,x,y)
#popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])

plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()

works fine. This is simpler because making a guess is not trivial. I had more complex data and could not come up with a proper first guess, but simply removing it worked fine :)

P.S.: scipy emits a warning recommending numpy.exp() over the exp re-exported from scipy.

Wendigo