0

I am trying to fit a gaussian curve to my data which is a list of density variations with height, however the plot of the fitted curve generated is always off (peak doesn't align, width is overestimated). Here is my code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

#Gaussian function
def gauss_function(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/float((2*sigma**2)))

x = heights5
y = demeans5 #density values at each height

amp = max(y)
center = x[np.argmax(y)]
width = 20 #eye-balled estimate


#p0 = amp, width, center
popt, pcov = curve_fit(gauss_function, x, y, p0 = [amp, width, center])

#plot
dataplot = plt.scatter(x, y, marker = '.', label = 'Observations')
gausplot = plt.plot(x,gauss_function(x, *popt), color='red', label ='Gaussian fit')

string = 'fwhm = ' + str(2.355*popt[2]) + '\npeak = ' + str(popt[0]) + '\nmean = ' + str(popt[1]) + '\nsigma = ' + str(popt[2])

#plot labels etc.
plt.xlabel("Height[km]")
plt.ylabel("Density")
plt.legend([dataplot, gausplot], labels = ['fit', 'Observations'])
plt.text(130, 2000, string)
plt.show()

This is the plot it generates:

poor fit

How do I fit the curve more accurately? And also, is there a way to estimate the width with the data?

Bill Bell
  • 21,021
  • 5
  • 43
  • 58
  • 3
    What do you expect? Your data is not normal-distributed (and even has 2 peaks; is bimodal). That fitting does not look very wrong to me given your assumption about the distribution. Of course you could use a Mixture of Gaussians, which sounds right here, but i'm not sure if that's what you want. – sascha Jun 07 '17 at 18:21
  • You seem to have plenty of data points, which implies that they were not actually sampled from a Gaussian population. – Bill Bell Jun 07 '17 at 18:21
  • There's a hint that a mixture of three distributions is involved. Have you theoretical grounds for the Gaussian assumption? – Bill Bell Jun 07 '17 at 18:24
  • So what do I do if I just want the gaussian to be fitted to the first or highest peak and ignore the second peak/any anomalies? I also want the width parameters to be fitted more accurately to the width. I do have theoretical grounds for the gaussian assumption, and the secondary peak can be due to some noise in my observation for this particular data set. – Gandalf_3.1 Jun 07 '17 at 18:30
  • Alright, then your best readily available data for estimating that Gaussian would appear to be the data to the left of about 92 km. I would attempt to fit to that. – Bill Bell Jun 07 '17 at 18:51
  • Ok, so I did that by limiting the x and y to only those values before the second peak by doing something like : x = heights5[heights5 – Gandalf_3.1 Jun 07 '17 at 19:18
  • you could try fitting 2 or 3 gaussians to your data, need care with guess values: https://stackoverflow.com/questions/26902283/fit-multiple-gaussians-to-the-data-in-python – f5r5e5d Jun 07 '17 at 20:28
  • oh, perfect. Thanks! – Gandalf_3.1 Jun 07 '17 at 20:51
  • I just returned here to see if anything was happening. Gandalf, on SO, if you want to reply to someone you need to use the '@' character in a comment. As soon as you type the '@' the software will offer you a menu of recipients. – Bill Bell Jun 08 '17 at 17:42
  • 'What I would rather do ...': That's a different problem! :) I thought we had already discussed that. – Bill Bell Jun 08 '17 at 17:46
  • @BillBell I am instead going to look into fitting multiple gaussians to the dataset. Thank you for your help! I am new to this sort of data analysis with python. – Gandalf_3.1 Jun 08 '17 at 22:46
  • No worries! Good luck with that. – Bill Bell Jun 09 '17 at 15:50

2 Answers2

0

For a more accurate fit, you could look into scipy.interpolate module. The functions there do a good job with interpolating and fitting.

Other fitting techniques which could do a good job are: a) CSTs b) BSplines c) Polynomial interpolation

Scipy also has an implementation for BSplines. The other two, you may have to implement yourself.

newkid
  • 1,368
  • 1
  • 11
  • 27
0

A very similar question about using Python to fit double peaks is answered here:

How to guess the actual lorentzian function without relaxation behavior with Least square curve fitting

James Phillips
  • 4,526
  • 3
  • 13
  • 11