I'm trying to fit the distribution of some experimental values with a custom probability density function. Obviously, the integral of the resulting function should always be equal to 1, but the results of simple scipy.optimize.curve_fit(function, dataBincenters, dataCounts) never satisfy this condition. What is the best way to solve this problem?
5 Answers
You can define your own residuals function, including a penalization parameter, like detailed in the code below, where it is known beforehand that the integral along the interval must be 2.
. If you test without the penalization you will see that what your are getting is the conventional curve_fit
:
import matplotlib.pyplot as plt
import scipy
from scipy.optimize import curve_fit, minimize, leastsq
from scipy.integrate import quad
from scipy import pi, sin
x = scipy.linspace(0, pi, 100)
y = scipy.sin(x) + (0. + scipy.rand(len(x))*0.4)
def func1(x, a0, a1, a2, a3):
return a0 + a1*x + a2*x**2 + a3*x**3
# here you include the penalization factor
def residuals(p, x, y):
integral = quad(func1, 0, pi, args=(p[0], p[1], p[2], p[3]))[0]
penalization = abs(2.-integral)*10000
return y - func1(x, p[0], p[1], p[2], p[3]) - penalization
popt1, pcov1 = curve_fit(func1, x, y)
popt2, pcov2 = leastsq(func=residuals, x0=(1., 1., 1., 1.), args=(x, y))
y_fit1 = func1(x, *popt1)
y_fit2 = func1(x, *popt2)
plt.scatter(x, y, marker='.')
plt.plot(x, y_fit1, color='g', label='curve_fit')
plt.plot(x, y_fit2, color='y', label='constrained')
plt.legend()
plt.xlim(-0.1, 3.5)
plt.ylim(0, 1.4)
print('Exact integral:', quad(sin, 0, pi)[0])
print('Approx integral1:', quad(func1, 0, pi, args=(popt1[0], popt1[1], popt1[2], popt1[3]))[0])
print('Approx integral2:', quad(func1, 0, pi, args=(popt2[0], popt2[1], popt2[2], popt2[3]))[0])
plt.show()
#Exact integral: 2.0
#Approx integral1: 2.60068579748
#Approx integral2: 2.00001911981
Other related questions:

- 56,802
- 26
- 179
- 234
-
@Axon Which warning? It would be nice if you could paste your code somewhere on the web... – Saullo G. P. Castro May 20 '13 at 09:31
-
I tried this method, but in this case I get this warning: http://dpaste.org/NfLwy/, and the resulting fitted curve doesn't even nearly resemble the distribution. [scipy.optimize.curve_fit](http://storage8.static.itmages.ru/i/13/0520/h_1369043990_1772962_4ff4c2e582.png) [with penalization](http://storage1.static.itmages.ru/i/13/0520/h_1369044034_5072385_fce21d00cc.png) – Axon May 20 '13 at 10:01
-
@Axon This is an integration error. I'am checking here, but you can try another penalization factor `10000` and see what happens. You can also change the initial guess=(1.,1.,1.,1.) to another attempt – Saullo G. P. Castro May 20 '13 at 10:10
-
@Axon [This answer](http://stackoverflow.com/a/16651524/832621) or [this answer](http://stackoverflow.com/a/16651955/832621) may give you some insight about how to fit a distribution function – Saullo G. P. Castro May 20 '13 at 14:43
-
Is there any reason why not to use `popt1, pcov1 = curve_fit( func1withpenalization, x, y )`, where the definition of `func1withpenalization` includes the penalization, e.g,: `def func1withpenalization(p,x,y): integral = quad( func1, 0, pi, args=(p[0],p[1],p[2],p[3]))[0]; penalization = abs(2.-integral)*10000; return y - func1(x, p[0],p[1],p[2],p[3]) - penalization` – altroware Jan 12 '15 at 10:53
-
1@altroware no special reason, but since `curve_fit` is a Python wrapper around `leastsq` I preferred to use the latter... but It would nice to have a new answer with `curve_fit` ;) – Saullo G. P. Castro Jan 12 '15 at 11:13
Here is an almost-identical snippet which makes only use of curve_fit
.
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
import scipy.integrate as integr
x = np.linspace(0, np.pi, 100)
y = np.sin(x) + (0. + np.random.rand(len(x))*0.4)
def Func(x, a0, a1, a2, a3):
return a0 + a1*x + a2*x**2 + a3*x**3
# modified function definition with Penalization
def FuncPen(x, a0, a1, a2, a3):
integral = integr.quad( Func, 0, np.pi, args=(a0,a1,a2,a3))[0]
penalization = abs(2.-integral)*10000
return a0 + a1*x + a2*x**2 + a3*x**3 + penalization
popt1, pcov1 = opt.curve_fit( Func, x, y )
popt2, pcov2 = opt.curve_fit( FuncPen, x, y )
y_fit1 = Func(x, *popt1)
y_fit2 = Func(x, *popt2)
plt.scatter(x,y, marker='.')
plt.plot(x,y_fit2, color='y', label='constrained')
plt.plot(x,y_fit1, color='g', label='curve_fit')
plt.legend(); plt.xlim(-0.1,3.5); plt.ylim(0,1.4)
print 'Exact integral:',integr.quad(np.sin ,0,np.pi)[0]
print 'Approx integral1:',integr.quad(Func,0,np.pi,args=(popt1[0],popt1[1],
popt1[2],popt1[3]))[0]
print 'Approx integral2:',integr.quad(Func,0,np.pi,args=(popt2[0],popt2[1],
popt2[2],popt2[3]))[0]
plt.show()
#Exact integral: 2.0
#Approx integral1: 2.66485028754
#Approx integral2: 2.00002116217
Following the example above here is more general way to add any constraints:
from scipy.optimize import minimize
from scipy.integrate import quad
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, np.pi, 100)
y = np.sin(x) + (0. + np.random.rand(len(x))*0.4)
def func_to_fit(x, params):
return params[0] + params[1] * x + params[2] * x ** 2 + params[3] * x ** 3
def constr_fun(params):
intgrl, _ = quad(func_to_fit, 0, np.pi, args=(params,))
return intgrl - 2
def func_to_minimise(params, x, y):
y_pred = func_to_fit(x, params)
return np.sum((y_pred - y) ** 2)
# Do the parameter fitting
#without constraints
res1 = minimize(func_to_minimise, x0=np.random.rand(4), args=(x, y))
params1 = res1.x
# with constraints
cons = {'type': 'eq', 'fun': constr_fun}
res2 = minimize(func_to_minimise, x0=np.random.rand(4), args=(x, y), constraints=cons)
params2 = res2.x
y_fit1 = func_to_fit(x, params1)
y_fit2 = func_to_fit(x, params2)
plt.scatter(x,y, marker='.')
plt.plot(x, y_fit2, color='y', label='constrained')
plt.plot(x, y_fit1, color='g', label='curve_fit')
plt.legend(); plt.xlim(-0.1,3.5); plt.ylim(0,1.4)
plt.show()
print(f"Constrant violation: {constr_fun(params1)}")
Constraint violation: -2.9179325622408214e-10

- 51
- 4
If you are able normalise your probability fitting function in advance then you can use this information to constrain your fit. A very simple example of this would be fitting a Gaussian to data. If one were to fit the following three-parameter (A, mu, sigma) Gaussian then it would be unnormalised in general:
however, if one instead enforces the normalisation condition on A:
then the Gaussian is only two parameter and is automatically normalised.

- 383
- 4
- 19
You could ensure that your fitted probability distribution is normalised via a numerical integration. For example, assuming that you have data x
and y
and that you have defined an unnormalised_function(x, a, b)
with parameters a
and b
for your probability distribution, which is defined on the interval x1
to x2
(which could be infinite):
from scipy.optimize import curve_fit
from scipy.integrate import quad
# Define a numerically normalised function
def normalised_function(x, a, b):
normalisation, _ = quad(lambda x: unnormalised_function(x, a, b), x1, x2)
return unnormalised_function(x, a, b)/normalisation
# Do the parameter fitting
fitted_parameters, _ = curve_fit(normalised_function, x, y)

- 383
- 4
- 19