Gaussian does not fit correctly to data

Question

I am trying to fit a gaussian curve to my data which is a list of density variations with height, however the plot of the fitted curve generated is always off (peak doesn't align, width is overestimated). Here is my code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

#Gaussian function
def gauss_function(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/float((2*sigma**2)))

x = heights5
y = demeans5 #density values at each height

amp = max(y)
center = x[np.argmax(y)]
width = 20 #eye-balled estimate


#p0 = amp, width, center
popt, pcov = curve_fit(gauss_function, x, y, p0 = [amp, width, center])

#plot
dataplot = plt.scatter(x, y, marker = '.', label = 'Observations')
gausplot = plt.plot(x,gauss_function(x, *popt), color='red', label ='Gaussian fit')

string = 'fwhm = ' + str(2.355*popt[2]) + '\npeak = ' + str(popt[0]) + '\nmean = ' + str(popt[1]) + '\nsigma = ' + str(popt[2])

#plot labels etc.
plt.xlabel("Height[km]")
plt.ylabel("Density")
plt.legend([dataplot, gausplot], labels = ['fit', 'Observations'])
plt.text(130, 2000, string)
plt.show()

This is the plot it generates:

How do I fit the curve more accurately? And also, is there a way to estimate the width with the data?

What do you expect? Your data is not normal-distributed (and even has 2 peaks; is bimodal). That fitting does not look very wrong to me given your assumption about the distribution. Of course you could use a Mixture of Gaussians, which sounds right here, but i'm not sure if that's what you want. — sascha, Jun 07 '17 at 18:21
You seem to have plenty of data points, which implies that they were not actually sampled from a Gaussian population. — Bill Bell, Jun 07 '17 at 18:21
There's a hint that a mixture of three distributions is involved. Have you theoretical grounds for the Gaussian assumption? — Bill Bell, Jun 07 '17 at 18:24
So what do I do if I just want the gaussian to be fitted to the first or highest peak and ignore the second peak/any anomalies? I also want the width parameters to be fitted more accurately to the width. I do have theoretical grounds for the gaussian assumption, and the secondary peak can be due to some noise in my observation for this particular data set. — Gandalf_3.1, Jun 07 '17 at 18:30
Alright, then your best readily available data for estimating that Gaussian would appear to be the data to the left of about 92 km. I would attempt to fit to that. — Bill Bell, Jun 07 '17 at 18:51
Ok, so I did that by limiting the x and y to only those values before the second peak by doing something like : x = heights5[heights5 — Gandalf_3.1, Jun 07 '17 at 19:18
you could try fitting 2 or 3 gaussians to your data, need care with guess values: https://stackoverflow.com/questions/26902283/fit-multiple-gaussians-to-the-data-in-python — f5r5e5d, Jun 07 '17 at 20:28
I just returned here to see if anything was happening. Gandalf, on SO, if you want to reply to someone you need to use the '@' character in a comment. As soon as you type the '@' the software will offer you a menu of recipients. — Bill Bell, Jun 08 '17 at 17:42
'What I would rather do ...': That's a different problem! :) I thought we had already discussed that. — Bill Bell, Jun 08 '17 at 17:46
@BillBell I am instead going to look into fitting multiple gaussians to the dataset. Thank you for your help! I am new to this sort of data analysis with python. — Gandalf_3.1, Jun 08 '17 at 22:46

score 0 · Accepted Answer · answered Jun 07 '17 at 20:10

For a more accurate fit, you could look into scipy.interpolate module. The functions there do a good job with interpolating and fitting.

Other fitting techniques which could do a good job are: a) CSTs b) BSplines c) Polynomial interpolation

Scipy also has an implementation for BSplines. The other two, you may have to implement yourself.

score 0 · Answer 2 · answered Jun 11 '17 at 12:05

0

A very similar question about using Python to fit double peaks is answered here:

How to guess the actual lorentzian function without relaxation behavior with Least square curve fitting

answered Jun 11 '17 at 12:05

James Phillips

4,526
3
13
11

Gaussian does not fit correctly to data

2 Answers2