3

I am trying to fit a skewed and shifted Gaussian curve using scipy's curve_fit function, but I find that under certain conditions the fitting is quite poor, often giving me close to or exactly a straight line.

The code below is derived from the curve_fit documentation. The code provided is an arbitrary set of data for test purposes but displays the issue quite well.

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp

#def func(x, a, b, c):
#    return a*np.exp(-b*x) + c

def func(x, sigmag, mu, alpha, c,a):
    #normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c

x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn,) #p0=(9,35,0,9,1))

y_fit= func(x,popt[0],popt[1],popt[2],popt[3],popt[4])

plt.plot(x,yn)
plt.plot(x,y_fit)

The issue seems to pop up when I shift the gaussian too far from zero (using mu). I have tried giving initial values, even those identical to my original function, but it does not solve the problem. For a value of mu=10, curve_fit works perfectly, but if I use mu>=30 it not longer fits the data.

Gabriel
  • 40,504
  • 73
  • 230
  • 404
abradd
  • 275
  • 2
  • 4
  • 8

2 Answers2

3

You can call curve_fit many times with random initial guess, and choose the parameters with minimum error.

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp

def func(x, sigmag, mu, alpha, c,a):
    #normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c

x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))

results = []
for i in xrange(50):
    p = np.random.randn(5)*10
    try:
        popt, pcov = curve_fit(func, x, yn, p)
    except:
        pass
    err = np.sum(np.abs(func(x, *popt) - yn))
    results.append((err, popt))
    if err < 0.1:
        break

err, popt = min(results, key=lambda x:x[0])
y_fit= func(x, *popt)

plt.plot(x,yn)
plt.plot(x,y_fit)
print len(results)
HYRY
  • 94,853
  • 25
  • 187
  • 187
3

Giving starting points for minimization often works wonders. Try giving the minimizer some information on the position of the maximum and the width of the curve:

popt, pcov = curve_fit(func, x, yn, p0=(1./np.std(yn), np.argmax(yn) ,0,0,1))

Changing this single line in your code with sigma=10 and mu=50 produces enter image description here

ev-br
  • 24,968
  • 9
  • 65
  • 78
  • Alright, things are starting to look a little nicer now. I think I overestimated the curve_fit algorithm a little and expected too much of it. I am ball parking the values manually now and then plugging them into curve_fit and getting nice results. Cheers. – abradd Mar 14 '13 at 22:58