12

When using curve_fit from scipy.optimize to fit a some data in python, one first defines the fitting function (e.g. a 2nd order polynomial) as follows:

  1. def f(x, a, b): return a*x**2+b*x
  2. And then proceeds with the fitting popt, pcov = curve_fit(f,x,y)

But the question is now, how does one go about defining the function in point 1. if the function contains an integral (or a discrete sum), e.g.:

enter image description here

The experimental data is still given for x and f(x), so point 2. would be similar I imagine once I can define f(x) in python. By the way I forgot to say that it is assumed that g(t) has a well known form here, and contains the fitting parameters, i.e. parameters like a and b given in the polynomial example. Any help is much appreciated. The question is really supposed to be a generic one, and the functions used in the post are just random examples.

Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
  • The obvious answer is: you need a way to evaluate that integral, either by finding a closed-form solution or by using numerical quadrature. There is no generic solution to this. – cfh May 19 '15 at 17:18
  • @cfh oh I see, it's true, but if it doesn't have any closed-form solution, what the numerical quadrature exactly entail? doesn't it assume that all parameters should be know then? –  May 19 '15 at 17:21
  • Yes, but at the time that `f` is called, you know all the parameters since they are passed as arguments. – cfh May 19 '15 at 17:25
  • Isn't it exactly the same in the simple polynomial example you showed? There are two parameters, `a` and `b`, which you are trying to fit, yet you use them in the formula `a*x**2+b*x`. – cfh May 19 '15 at 17:28
  • @cfh Sure but in that example I didn't have to do a numerical integration, so I didn't have to know a and b before the fitting. But following your suggestion of "first evaluating the integral" I would have to know a and b before the fitting (to do the integral numerically) which I don't... –  May 19 '15 at 17:31
  • No. When the `curve_fit` function calls your `f`, it will always provide specific values for `a` and `b`. You can use those to evaluate a polynomial, compute an integral, do whatever you want. – cfh May 19 '15 at 17:32
  • @cfh ohhhh now I see how you meant it, right right! Would you mind (or if happen to have the time) to show a simple example for this (i.e. a numerical integration involved)? –  May 19 '15 at 17:35
  • It really depends a lot on the function you are trying to integrate. Some functions for doing this are contained in [scipy.integrate](http://docs.scipy.org/doc/scipy/reference/tutorial/integrate.html). – cfh May 19 '15 at 17:39

2 Answers2

7

Here's an example of fitting a curve defined in terms of an integral. The curve is the integral of sin(t*w)/t+p over t from 0 to Pi. Our x data points correspond to w, and we're adjusting the p parameter to to get the data to fit.

import math, numpy, scipy.optimize, scipy.integrate

def integrand(t, args):
    w, p = args
    return math.sin(t * w)/t + p

def curve(w, p):
    res = scipy.integrate.quad(integrand, 0.0, math.pi, [w, p])
    return res[0]

vcurve = numpy.vectorize(curve, excluded=set([1]))

truexdata = numpy.asarray([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
trueydata = vcurve(truexdata, 1.0)

xdata = truexdata + 0.1 * numpy.random.randn(8)
ydata = trueydata + 0.1 * numpy.random.randn(8)

popt, pcov = scipy.optimize.curve_fit(vcurve,
                                      xdata, ydata,
                                      p0=[2.0])
print popt

That'll print something out fairly close to 1.0, which is what we used as p when we created the trueydata.

Note that we use numpy.vectorize on the curve function to produce a vectorized version compatible with scipy.optimize.curve_fit.

Jay Kominek
  • 8,674
  • 1
  • 34
  • 51
  • 2
    this is interesting, useful and I think generic enough, but can we do a little better knowing that this is integral? In your solution, `curve` could be anything and does not benefit from the fact that it is integral. Maybe question should be changed to `fitting data with non-vectorized functions`? – dashesy May 23 '15 at 20:46
3

Sometimes you can be lucky and you're able to evaluate the integral analytically. In the following example the product of h(t)=exp(-(t-x)**2/2) and a second degree polynomial g(t) is integrated from 0 to infinity. Sympy is used to evaluate the Integral and generate a function usable for curve_fit():

import sympy as sy
sy.init_printing()  # LaTeX-like pretty printing of IPython


t, x = sy.symbols("t, x", real=True)

h = sy.exp(-(t-x)**2/2)

a0, a1, a2 = sy.symbols('a:3', real=True)  # unknown coefficients
g = a0 + a1*t + a2*t**2

gh = (g*h).simplify()  # the intgrand
G = sy.integrate(gh, (t, 0, sy.oo)).simplify()  # integrate from 0 to infinty

# Generate numeric function to be usable by curve_fit()
G_opt = sy.lambdify((x, t, a0, a1, a2), G)

print(G_opt(1, 2, 3, 4, 5))  # example usage

Note that in general the problem is often ill-posed since the integral does not neccesarily converge in a large enough neighborhood of the solution (which is assumed by curve_fit()).

Dietrich
  • 5,241
  • 3
  • 24
  • 36