Fit a gaussian function

Question

I have a histogram (see below) and I am trying to find the mean and standard deviation along with code which fits a curve to my histogram. I think there is something in SciPy or matplotlib that can help, but every example I've tried doesn't work.

import matplotlib.pyplot as plt
import numpy as np

with open('gau_b_g_s.csv') as f:
    v = np.loadtxt(f, delimiter= ',', dtype="float", skiprows=1, usecols=None)

fig, ax = plt.subplots()

plt.hist(v, bins=500, color='#7F38EC', histtype='step')

plt.title("Gaussian")
plt.axis([-1, 2, 0, 20000])

plt.show()

What do you mean by doesn't work? It doesn't run, or the output isn't correct? — Jodaka, Jul 16 '12 at 15:03
I can't get the codes from the internet to run, to actually make a curve like they are supposed to — , Jul 16 '12 at 15:08
which is most likely happening because I just started programming and I generally have no idea what i'm doing — , Jul 16 '12 at 15:11
So are you getting an error message when you try to run it? Or does the program complete without producing anything? — Jodaka, Jul 16 '12 at 15:14

score 46 · Accepted Answer · edited May 23 '17 at 11:47

46

Take a look at this answer for fitting arbitrary curves to data. Basically you can use scipy.optimize.curve_fit to fit any function you want to your data. The code below shows how you can fit a Gaussian to some random data (credit to this SciPy-User mailing list post).

import numpy
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Define some test data which is close to Gaussian
data = numpy.random.normal(size=10000)

hist, bin_edges = numpy.histogram(data, density=True)
bin_centres = (bin_edges[:-1] + bin_edges[1:])/2

# Define model function to be used to fit to the data above:
def gauss(x, *p):
    A, mu, sigma = p
    return A*numpy.exp(-(x-mu)**2/(2.*sigma**2))

# p0 is the initial guess for the fitting coefficients (A, mu and sigma above)
p0 = [1., 0., 1.]

coeff, var_matrix = curve_fit(gauss, bin_centres, hist, p0=p0)

# Get the fitted curve
hist_fit = gauss(bin_centres, *coeff)

plt.plot(bin_centres, hist, label='Test data')
plt.plot(bin_centres, hist_fit, label='Fitted data')

# Finally, lets get the fitting parameters, i.e. the mean and standard deviation:
print 'Fitted mean = ', coeff[1]
print 'Fitted standard deviation = ', coeff[2]

plt.show()

edited May 23 '17 at 11:47

Community

1
1

answered Jul 16 '12 at 15:42

Chris

44,602
16
137
156

thanks, this got the mean and sd well, but the curve fit doesn't actually produce a curve, it produces lines – Jul 16 '12 at 16:18
1

Do you mean my example just produces lines? Or when you apply the above code to your data you get lines? Also, what is the difference between a line and a curve? – Chris Jul 16 '12 at 16:39
as opposed to bell curve type shape, it just looks like a carrot ^ – Jul 16 '12 at 17:47
Without more information I can't really help you. Do you mean it looks like a carrot with your data? If so, then presumably it is because that is what your data looks like. When asking questions it is best to include a [short, self contained example](http://sscce.org/). – Chris Jul 17 '12 at 08:57
3

I suspect @user1496646 means that, in his case, there aren't that many , so when you plot(bin_centres, hist_fit), it comes out poorly sampled Gaussian ("carrot"). He should just subsample the bin_centers, using new_bin_centers = numpy.linspace(bin_centres[0], bin_centres[-1], 200), new_hist_fit = gauss(new_bin_centres, *coeff), and plot(new_bin_centres, new_hist_fit) – SuperElectric Feb 24 '13 at 23:08

score 16 · Answer 2 · edited Jul 04 '15 at 15:49

16

You can try sklearn gaussian mixture model estimation as below :

import numpy as np
import sklearn.mixture

gmm = sklearn.mixture.GMM()

# sample data
a = np.random.randn(1000)

# result
r = gmm.fit(a[:, np.newaxis]) # GMM requires 2D data as of sklearn version 0.16
print("mean : %f, var : %f" % (r.means_[0, 0], r.covars_[0, 0]))

Reference : http://scikit-learn.org/stable/modules/mixture.html#mixture

Note that in this way, you don't need to estimate your sample distribution with an histogram.

edited Jul 04 '15 at 15:49

1''

26,823
32
143
200

answered Jul 16 '12 at 15:35

Nicolas Barbey

6,639
4
28
34

`sklearn.mixture.GMM` seems to have been replaced with [`sklearn.mixture.GaussianMixture`](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture) at some point. – jjc385 Dec 04 '20 at 11:43

score 3 · Answer 3 · answered Sep 28 '15 at 16:37

3

Kind of an old question, but for anybody looking just to plot a density fit for a series, you could try matplotlib's .plot(kind='kde'). Docs here.

Example with pandas:

mydf.x.plot(kind='kde')

answered Sep 28 '15 at 16:37

misterte

977
1
11
21

wow, TIL matplotlib has kernel density estimation built in. +1 – Joseph Farah Jun 03 '18 at 22:24

Akavall · Answer 4 · 2012-07-16T16:00:51.730

1

I am not sure what your input is, but possibly your y-axis scale is too large (20000), try reducing this number. The following code works for me:

import matplotlib.pyplot as plt
import numpy as np

#created my variable
v = np.random.normal(0,1,1000)


fig, ax = plt.subplots()


plt.hist(v, bins=500, normed=1, color='#7F38EC', histtype='step')

#plot
plt.title("Gaussian")
plt.axis([-1, 2, 0, 1]) #changed 20000 to 1

plt.show()

Edit:

If you want the actual count of values on y-axis, you can set normed=0. And would just get rid of the plt.axis([-1, 2, 0, 1]).

import matplotlib.pyplot as plt
import numpy as np

#function
v = np.random.normal(0,1,500000)


fig, ax = plt.subplots()

# changed normed=1 to normed=0
plt.hist(v, bins=500, normed=0, color='#7F38EC', histtype='step')

#plot
plt.title("Gaussian")
#plt.axis([-1, 2, 0, 20000]) 

plt.show()

edited Jul 16 '12 at 16:00

answered Jul 16 '12 at 15:29

Akavall

82,592
51
207
251

no im working with over half a million points so I want the scale to be that big because I don't want like 50,000 bins – Jul 16 '12 at 15:31
@I believe the value on y-axis does not tell you the number of observations in each bin, it tells you percentage in each bin. Just comment out the whole `plt.axis([-1, 2, 0, 1])` line and run it, you should get a distribution plot. – Akavall Jul 16 '12 at 15:39
its definitely telling me the number in each bin because i can see the histogram itself with the y axis at 20,000 – Jul 16 '12 at 21:22
Downvoter, can you please explain the reason for the downvote? – Akavall Jul 11 '13 at 13:06

Fit a gaussian function

4 Answers4

Linked