
I'm trying to fit a Gaussian to my data (which already looks roughly Gaussian). I've taken the advice given here and tried curve_fit and leastsq, but I think I'm missing something more fundamental (in that I have no idea how to use either command). Here's the script I have so far:

import pylab as plb
import matplotlib.pyplot as plt

# Read in data -- first 2 rows are header in this example. 
data = plb.loadtxt('part 2.csv', skiprows=2, delimiter=',')

x = data[:,2]
y = data[:,3]
mean = sum(x*y)
sigma = sum(y*(x - mean)**2)

def gauss_function(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))
popt, pcov = curve_fit(gauss_function, x, y, p0 = [1, mean, sigma])
plt.plot(x, gauss_function(x, *popt), label='fit')

# plot data

plt.plot(x, y,'b')

# Add some axis labels

plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()

What I get from this is a Gaussian-ish shape (my original data) plus a flat horizontal line for the fit.


Also, I'd like to plot my data as points rather than a connected line. Any input is appreciated!

Richard Hsia

7 Answers


Here is the corrected code:

import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

n = len(x)                          # the number of data points
mean = sum(x*y)/n                   # note this correction
sigma = sum(y*(x-mean)**2)/n        # note this correction

def gaus(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))

popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])

plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()

Result:
[image: the data points with the fitted Gaussian curve]

Developer
  • And if you use `plt.plot(x,y,'b+',label='data')` they will be just points. – Developer Oct 06 '13 at 10:14
  • Thank you! What are the two corrections you added for by the way? – KRS-fun Feb 27 '14 at 22:46
  • This answer does not explain why the correction works better than the original definition. – strpeter Jul 20 '16 at 08:54
  • To make this work with a different dataset, I added `yn = max(y)`, `y /= yn`, then during plotting, I changed `y` to `y*yn`: `plt.plot(x, y*yn, ...)` . – EL_DON Apr 25 '17 at 18:24
  • @strpeter Pay attention to the comments in the corrected code; they should be enough explanation. – Developer May 02 '17 at 05:13
  • @Developer: I guess what you want to do is dividing by `sum(y)` or equally e.g. `mean = np.average(x, weights=y)` which is equal to `5.0`. I ask myself why you divide by `n` which returns `12.0` - a value that is far from the real value. In a general case you cannot even be sure that this converges to the right values since it would be correct to divide by `sum(y)` which can be far from `n`. – strpeter Jun 14 '17 at 09:17
  • @EL_DON why does scaling y between 0 and 1 help? It works for me but I dont understand why its necessary – HereItIs Mar 18 '21 at 16:34
  • @HereItIs The fitting routine can internally hit a floating overflow (underflow) if the input data are extremely large (small). Normalizing to the maximum input value usually prevents this. – EL_DON Mar 25 '21 at 13:18
  • @Developer the comments in the code only mention where the fix happens, they do not explain anything at all. – BartoszKP Jan 03 '22 at 11:08

Explanation

You need good starting values so that the curve_fit function converges at "good" values. I cannot really say why your fit did not converge (even though the definition of your mean is strange - check below), but I will give you a strategy that works for non-normalized Gaussian functions like yours.

Example

The estimated parameters should be close to the final values (use the weighted arithmetic mean - divide by the sum of all values):

import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

# weighted arithmetic mean (corrected - check the section below)
mean = sum(x * y) / sum(y)
sigma = np.sqrt(sum(y * (x - mean)**2) / sum(y))

def Gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

popt,pcov = curve_fit(Gauss, x, y, p0=[max(y), mean, sigma])

plt.plot(x, y, 'b+:', label='data')
plt.plot(x, Gauss(x, *popt), 'r-', label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()

I personally prefer using numpy.

Comment on the definition of the mean (including Developer's answer)

Since the reviewers did not like my edit on Developer's code, I will explain in which case I would suggest an improved code. Developer's mean does not correspond to any of the usual definitions of the mean.

Your definition returns:

>>> sum(x * y)
125

Developer's definition returns:

>>> sum(x * y) / len(x)
12.5 #for Python 3.x

The weighted arithmetic mean:

>>> sum(x * y) / sum(y)
5.0
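These three candidate definitions can be checked directly on the toy data (a quick sanity check; note that `np.average` with weights gives the same result as the weighted arithmetic mean):

```python
import numpy as np

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

raw      = np.sum(x * y)              # original definition: 125
per_n    = np.sum(x * y) / len(x)     # Developer's definition: 12.5
weighted = np.sum(x * y) / np.sum(y)  # weighted arithmetic mean: 5.0

print(raw, per_n, weighted)
print(np.average(x, weights=y))       # also 5.0 -- equivalent to the weighted mean
```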

Similarly you can compare the definitions of standard deviation (sigma). Compare with the figure of the resulting fit:

Resulting fit

Comment for Python 2.x users

In Python 2.x you should additionally enable true division to avoid weird results, or convert the numbers explicitly before dividing:

from __future__ import division

or e.g.

sum(x * y) * 1. / sum(y)
strpeter
  • I like the "weighted arithmetic mean". It makes the estimation of mean and std more accurate in some cases (for example, if you have a long left tail and short right tail). However, when I tried to google a theoretical explanation why "weighted arithmetic mean" is better, I found nothing. this is weird... – halfmoonhalf Mar 16 '23 at 05:55
  • Did you see the link to Wikipedia? - You can also look up the definition of the expectation value - the value, which is most likely to appear again from the existing data. – strpeter Apr 14 '23 at 08:03

You get a horizontal straight line because the fit did not converge.

Better convergence is attained if the first element of the initial guess p0 is set to max(y) (5 in the example) instead of 1.
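On the toy data from the other answers, this looks roughly as follows (a minimal sketch; the exact fitted numbers depend on the optimizer):

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

# weighted estimates for the starting values
mean = np.sum(x * y) / np.sum(y)                        # 5.0
sigma = np.sqrt(np.sum(y * (x - mean)**2) / np.sum(y))  # 2.0

# amplitude guess set to max(y) instead of 1
popt, pcov = curve_fit(gauss, x, y, p0=[max(y), mean, sigma])
print(popt)  # fitted (a, x0, sigma); x0 lands near 5 by symmetry
```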

user3406246

After losing hours trying to find my error, I realized the problem is this formula:

sigma = sum(y*(x-mean)**2)/n

This formula is wrong; the correct value is its square root:

sigma = np.sqrt(sum(y*(x-mean)**2)/n)

Hope this helps
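Combining this square root with the sum(y) normalisation mentioned elsewhere on this page gives a tidy value on the toy data (a quick check):

```python
import numpy as np

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

mean = np.sum(x * y) / np.sum(y)                        # 5.0
sigma = np.sqrt(np.sum(y * (x - mean)**2) / np.sum(y))  # 2.0
print(mean, sigma)
```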

James
  • There is still another error: It is not 1/n but 1/sum(y). Check out [my answer below](http://stackoverflow.com/a/38431524/2062965). – strpeter Jul 20 '16 at 08:02

There is another way of performing the fit, which is to use the lmfit package. It is built on top of scipy's fitting machinery (like curve_fit) but is much better at fitting and offers complex fitting as well. Detailed step-by-step instructions are given at the link below: http://cars9.uchicago.edu/software/python/lmfit/model.html#model.best_fit

sigma = sum(y*(x - mean)**2)

should be

sigma = np.sqrt(sum(y*(x - mean)**2))
gastro

Actually, you do not need to provide a first guess. Simply doing

import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np

x = np.arange(10)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])

n = len(x)                          # the number of data points
mean = sum(x*y)/n                   # note this correction
sigma = sum(y*(x-mean)**2)/n        # note this correction

def gaus(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))

popt,pcov = curve_fit(gaus,x,y)
#popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])

plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()

works fine. This is simpler because making a guess is not trivial. I had more complex data and could not come up with a proper first guess, but simply removing it worked fine :)

P.S.: scipy emits a warning recommending numpy.exp() over the exp re-exported from scipy.

Wendigo