
I want to fit a function with vector output using SciPy's `curve_fit` (or something more appropriate, if available). For example, consider the following function:

import numpy as np
def fmodel(x, a, b):
    return np.vstack([a*np.sin(b*x), a*x**2 - b*x, a*np.exp(b/x)])

Each component is a different function but they share the parameters I wish to fit. Ideally, I would do something like this:

x = np.linspace(1, 20, 50)
a = 0.1
b = 0.5
y = fmodel(x, a, b)
y_noisy = y + 0.2 * np.random.normal(size=y.shape)

from scipy.optimize import curve_fit
popt, pcov = curve_fit(f=fmodel, xdata=x, ydata=y_noisy, p0=[0.3, 0.1])

But `curve_fit` does not work with functions that have vector output; the error `Result from function call is not a proper array of floats.` is thrown. What I did instead was flatten the output like this:

def fmodel_flat(x, a, b):
    # x is tiled three times; evaluate on one copy, then flatten to 1-D
    return fmodel(x[0:len(x)//3], a, b).flatten()

popt, pcov = curve_fit(f=fmodel_flat, xdata=np.tile(x, 3),
                       ydata=y_noisy.flatten(), p0=[0.3, 0.1])

and this works. If, instead of a vector-valued function, I am fitting several functions with different inputs that share model parameters, I can concatenate both inputs and outputs in the same way, as sketched below.
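
For concreteness, here is a minimal sketch of that concatenation approach; `gmodel` and the second grid `x2` are made-up stand-ins for a second function sharing `(a, b)` with `fmodel`:

import numpy as np
from scipy.optimize import curve_fit

# Hypothetical second model sharing the parameters (a, b) with fmodel
def gmodel(x, a, b):
    return a * np.cos(b * x)

x1 = np.linspace(1, 20, 50)
x2 = np.linspace(0.5, 10, 30)   # a different grid for the second function
y1 = fmodel(x1, 0.1, 0.5) + 0.2 * np.random.normal(size=(3, len(x1)))
y2 = gmodel(x2, 0.1, 0.5) + 0.2 * np.random.normal(size=len(x2))

def combined_flat(_, a, b):
    # Evaluate each model on its own grid (closed over) and concatenate
    # the flattened outputs; curve_fit only compares this against ydata
    return np.concatenate([fmodel(x1, a, b).ravel(), gmodel(x2, a, b)])

ydata = np.concatenate([y1.ravel(), y2])
# xdata is only passed through to combined_flat, which ignores it
popt, pcov = curve_fit(combined_flat, xdata=np.zeros_like(ydata), ydata=ydata,
                       p0=[0.3, 0.1])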

Is there a more appropriate way to fit a vector-valued function with SciPy, or perhaps some additional module? A main consideration for me is efficiency: the actual functions to fit are much more complex and fitting can take some time, so if this use of `curve_fit` is mangled and is leading to excessive runtimes I would like to know what I should do instead.

buzjwa
  • You might be interested in [lmfit](https://lmfit.github.io/lmfit-py/). They also suggest the `flatten` method for multidimensional data. – chthonicdaemon Nov 27 '16 at 16:56
  • `scipy.optimize.least_squares` makes you specify a residuals function. You can put the flattening (or something more nuanced) there. – Nathan Lloyd Jul 22 '20 at 17:20
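
A minimal sketch of that `least_squares` approach (the `residuals` helper and its signature are illustrative, reusing `fmodel`, `x` and `y_noisy` from the question):

from scipy.optimize import least_squares

def residuals(params, x, y_noisy):
    a, b = params
    # (3, N) residual matrix flattened to the 1-D vector least_squares expects
    return (fmodel(x, a, b) - y_noisy).ravel()

res = least_squares(residuals, x0=[0.3, 0.1], args=(x, y_noisy))
a_fit, b_fit = res.x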

2 Answers


If I can be so blunt as to recommend my own package `symfit`, I think it does precisely what you need. An example of fitting with shared parameters can be found in the docs.

Your specific problem stated above would become:

import numpy as np
from symfit import variables, parameters, Model, Fit, sin, exp

x, y_1, y_2, y_3 = variables('x, y_1, y_2, y_3')
a, b = parameters('a, b')
a.value = 0.3
b.value = 0.1

model = Model({
    y_1: a * sin(b * x), 
    y_2: a * x**2 - b * x, 
    y_3: a * exp(b / x),
})

xdata = np.linspace(1, 20, 50)
ydata = model(x=xdata, a=0.1, b=0.5)
y_noisy = ydata + 0.2 * np.random.normal(size=(len(model), len(xdata)))

fit = Fit(model, x=xdata, y_1=y_noisy[0], y_2=y_noisy[1], y_3=y_noisy[2])
fit_result = fit.execute()

Check out the docs for more!

tBuLi

I think what you're doing is perfectly fine from an efficiency standpoint. I'll try to look at the implementation and come up with something more quantitative, but for the time being here is my reasoning.

What you're doing during curve fitting is optimizing the parameters (a,b) such that

res = sum_i |f(x_i; a,b)-y_i|^2

is minimal. By this I mean that you have data points (x_i,y_i) of arbitrary dimensionality, two parameters (a,b) and a fitting model that approximates the data at query points x_i.

The curve-fitting algorithm starts from an initial (a,b) pair, puts it into a black box that computes the above square error, and tries to come up with a new (a',b') pair that produces a smaller error. My point is that the error above is really a black box for the fitting algorithm: the configuration space of the fit is defined merely by the (a,b) parameters. If you were to implement a simple curve-fitting function yourself, you might, say, run gradient descent with the square error as the cost function.
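
To illustrate that black-box view, here is a minimal sketch that drives the whole fit through a scalar cost with `scipy.optimize.minimize`, reusing `fmodel`, `x` and `y_noisy` from the question (for illustration only, not a recommendation over `curve_fit`):

import numpy as np
from scipy.optimize import minimize

# The entire square error is just a scalar function of (a, b)
def cost(params):
    a, b = params
    return np.sum((fmodel(x, a, b) - y_noisy) ** 2)

res = minimize(cost, x0=[0.3, 0.1], method='Nelder-Mead')
a_fit, b_fit = res.x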

Now, it should be irrelevant to the fitting procedure how the black box computes the error. It's easy to see that the dimensionality of x_i is really irrelevant for scalar functions, since it doesn't matter if you have 1000 1d query points to fit for, or a 10x10x10 grid in 3d space. What matters is that you have 1000 points x_i for which you need to compute f(x_i) ~ y_i from the model.

The only subtlety that should further be noted is that in the case of a vector-valued function, the calculation of the error is not trivial. In my opinion, it's fine to define the error at each point x_i using the 2-norm of the vector-valued residual. But hey: in this case, the square error at point x_i is

|f(x_i; a,b)-y_i|^2 == sum_k (f(x_i; a,b)[k]-y_i[k])^2

which implies that the square error for each component is accumulated. This just means that what you're doing right now is just right: by replicating your x_i points and taking into account each component of the function individually, your square error will contain exactly the squared 2-norm of the error at each point.
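
A quick numerical check of that identity (reusing `fmodel`, `x` and `y_noisy` from the question):

import numpy as np

r = fmodel(x, 0.3, 0.1) - y_noisy                 # (3, N) residual matrix
flat_sq = np.sum(r.flatten() ** 2)                # what the flattened fit minimizes
norm_sq = np.sum(np.linalg.norm(r, axis=0) ** 2)  # sum of squared 2-norms over x_i
assert np.isclose(flat_sq, norm_sq)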

So my point is that what you're doing is mathematically correct, and I don't expect the behaviour of the fitting procedure to depend on how multivariate/vector-valued functions are handled.