20

I have 4 non-linear equations with three unknowns X, Y, and Z that I want to solve for. The equations are of the form:

F(m) = X^2 + a(m)Y^2 + b(m)XYcosZ + c(m)XYsinZ

...where a, b and c are constants which are dependent on each value of F in the four equations.

What is the best way to go about solving this?

user1171835
  • Just FYI: It's more common to use x, y, and z for the independent variables (i.e. the knowns, in this case), and a, b, c for the model parameters that you're trying to solve for. When I first read your equation, I was about to say "but that's linear" (it is in terms of a, b, and c). I know it's silly to quibble about terminology, but as it's currently phrased, many people are likely to mis-read your question. (Good, clear question, though. +1) – Joe Kington Oct 23 '13 at 13:34
  • Also, it's possible to linearize this. I'm typing up an answer, but I don't have time to finish it right now. If no one else answers in the meantime, I'll finish up my answer and post it in an hour or two (hopefully someone else will beat me to it). Good luck! – Joe Kington Oct 23 '13 at 13:51
  • The laziest way (but easiest to implement, I think) is to just precompute n (let's say 10) values for each parameter (so 1000 combinations in total), then see which combination scores closest to zero, and then zoom in around that area. That should work pretty easily for most types of equations, to give you an impression of where to look, but there are fancier ways that will work faster and/or more accurately. – usethedeathstar Oct 23 '13 at 14:15
  • @usethedeathstar - `scipy.optimize.brute` does exactly what you're describing: http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brute.html. Keep in mind that you need to search a 3D parameter space in this case. It's simple, but _very_ inefficient. That having been said, if it works, it works. If there are a lot of local minima and the ranges of the parameters are well known, it can be a good approach. – Joe Kington Oct 23 '13 at 14:50
  • @JoeKington True, but 3D is still quite easy, and another benefit of brute force is that you get an idea of the errorbars on your solution. (That being said, as soon as you go over 3D, brute force becomes hopeless) – usethedeathstar Oct 24 '13 at 07:43
  • Can you use [sympy solver](http://docs.sympy.org/dev/tutorial/solvers.html) to do this? – stephenmm Oct 03 '16 at 18:57
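
To make the grid-search idea from the comments above concrete, here's a minimal sketch using scipy.optimize.brute, in the question's original notation. The search ranges are assumptions invented for illustration, and f, a, b, and c are assumed to be arrays holding the four observed values and their constants:

import numpy as np
import scipy.optimize as opt

def misfit(params, f, a, b, c):
    # Sum-of-squares misfit between observed and predicted F
    X, Y, Z = params
    pred = X**2 + a * Y**2 + b * X * Y * np.cos(Z) + c * X * Y * np.sin(Z)
    return np.sum((f - pred)**2)

# Assumed search ranges for X, Y, and Z -- adjust to your problem
ranges = [(0.0, 5.0), (0.0, 5.0), (0.0, 2 * np.pi)]
# X, Y, Z = opt.brute(misfit, ranges, args=(f, a, b, c), Ns=10)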

2 Answers

52

There are two ways to do this.

  1. Use a non-linear solver
  2. Linearize the problem and solve it in the least-squares sense

Setup

So, as I understand your question, you know F, a, b, and c at 4 different points, and you want to invert for the model parameters X, Y, and Z. We have 3 unknowns and 4 observed data points, so the problem is overdetermined. Therefore, we'll be solving in the least-squares sense.

It's more common to use the opposite terminology in this case, so let's flip your equation around. Instead of:

F_i = X^2 + a_i Y^2 + b_i X Y cosZ + c_i X Y sinZ

Let's write:

F_i = a^2 + X_i b^2 + Y_i a b cos(c) + Z_i a b sin(c)

Where we know F, X, Y, and Z at 4 different points (e.g. F_0, F_1, ... F_i).

We're just changing the names of the variables, not the equation itself. (This is more for my ease of thinking than anything else.)

Linear Solution

It's actually possible to linearize this equation. You can easily solve for a^2, b^2, a b cos(c), and a b sin(c). To make this a bit easier, let's relabel things yet again:

d = a^2
e = b^2
f = a b cos(c)
g = a b sin(c)

Now the equation is a lot simpler: F_i = d + e X_i + f Y_i + g Z_i. It's easy to do a least-squares linear inversion for d, e, f, and g. We can then get a, b, and c from:

a = sqrt(d)
b = sqrt(e)
c = arctan(g/f)

Okay, let's write this up in matrix form. We're going to translate our 4 observations (the code we'll write will take any number of observations, but let's keep things concrete for the moment) of:

F_i = d + e X_i + f Y_i + g Z_i

Into:

|F_0|   |1, X_0, Y_0, Z_0|   |d|
|F_1| = |1, X_1, Y_1, Z_1| * |e|
|F_2|   |1, X_2, Y_2, Z_2|   |f|
|F_3|   |1, X_3, Y_3, Z_3|   |g|

Or: F = G * m (I'm a geophysicist, so we use G for "Green's Functions" and m for "Model Parameters". Usually we'd use d for "data" instead of F, as well.)

In python, this would translate to:

import numpy as np

def invert(f, x, y, z):
    # One row of [1, x_i, y_i, z_i] per observation
    G = np.vstack([np.ones_like(x), x, y, z]).T
    m, _, _, _ = np.linalg.lstsq(G, f, rcond=None)

    d, e, f, g = m
    a = np.sqrt(d)
    b = np.sqrt(e)
    c = np.arctan2(g, f) # Note that `c` will be in radians, not degrees
    return a, b, c

Non-linear Solution

You could also solve this using scipy.optimize, as @Joe suggested. The most accessible function in scipy.optimize is scipy.optimize.curve_fit, which uses a Levenberg-Marquardt method by default.

Levenberg-Marquardt is a "hill climbing" algorithm (well, it goes downhill, in this case, but the term is used anyway). In a sense, you make an initial guess of the model parameters (all ones, by default in scipy.optimize) and follow the slope of observed - predicted in your parameter space downhill to the bottom.

Caveat: Picking the right non-linear inversion method, initial guess, and tuning the parameters of the method is very much a "dark art". You only learn it by doing it, and there are a lot of situations where things won't work properly. Levenberg-Marquardt is a good general method if your parameter space is fairly smooth (this one should be). There are a lot of others (including genetic algorithms, neural nets, etc., in addition to more common methods like simulated annealing) that are better in other situations. I'm not going to delve into that part here.

There is one common gotcha that some optimization toolkits try to correct for that scipy.optimize doesn't try to handle. If your model parameters have different magnitudes (e.g. a=1, b=1000, c=1e-8), you'll need to rescale things so that they're similar in magnitude. Otherwise scipy.optimize's "hill climbing" algorithms (like LM) won't be able to accurately estimate the local gradient, and will give wildly inaccurate results. For now, I'm assuming that a, b, and c have relatively similar magnitudes. Also, be aware that essentially all non-linear methods require you to make an initial guess, and are sensitive to that guess. I'm leaving it out below (just pass it in as the p0 kwarg to curve_fit) because the default a, b, c = 1, 1, 1 is a fairly accurate guess for a, b, c = 3, 2, 1.
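
To sketch what that rescaling might look like in practice (the scale factors below are invented for illustration, and func is the model function defined just below), you can wrap the model so that curve_fit only ever sees parameters of order 1:

import numpy as np
import scipy.optimize as opt

# Invented example: suppose we expected a ~ 1, b ~ 1000, c ~ 1e-8
scales = np.array([1.0, 1000.0, 1e-8])

def scaled_func(observation_points, a_s, b_s, c_s):
    # Map the O(1) parameters the solver sees back to physical values
    a, b, c = scales * np.array([a_s, b_s, c_s])
    x, y, z = observation_points
    return func(x, y, z, a, b, c)

# model_s, cov = opt.curve_fit(scaled_func, xdata, f, p0=[1, 1, 1])
# a, b, c = scales * model_s   # undo the scaling afterwards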

With the caveats out of the way, curve_fit expects to be passed a function, a set of points where the observations were made (as a single ndim x npoints array), and the observed values.

So, if we write the function like this:

def func(x, y, z, a, b, c):
    f = (a**2
         + x * b**2
         + y * a * b * np.cos(c)
         + z * a * b * np.sin(c))
    return f

We'll need to wrap it to accept slightly different arguments before passing it to curve_fit.

In a nutshell:

def nonlinear_invert(f, x, y, z):
    def wrapped_func(observation_points, a, b, c):
        x, y, z = observation_points
        return func(x, y, z, a, b, c)

    xdata = np.vstack([x, y, z])
    model, cov = opt.curve_fit(wrapped_func, xdata, f)
    return model
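
As a side note, the cov that curve_fit returns is worth keeping around: its diagonal gives rough 1-sigma uncertainties on the fitted parameters (assuming roughly Gaussian, uncorrelated noise). Something like this, just before the return:

    # Rough 1-sigma uncertainties on a, b, and c from the fit
    perr = np.sqrt(np.diag(cov))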

Stand-alone Example of the two methods:

To give you a full implementation, here's an example that

  1. generates randomly distributed points to evaluate the function on,
  2. evaluates the function on those points (using set model parameters),
  3. adds noise to the results,
  4. and then inverts for the model parameters using both the linear and non-linear methods described above.

import numpy as np
import scipy.optimize as opt

def main():
    nobservations = 4
    a, b, c = 3.0, 2.0, 1.0
    f, x, y, z = generate_data(nobservations, a, b, c)

    print('Linear results (should be {}, {}, {}):'.format(a, b, c))
    print(linear_invert(f, x, y, z))

    print('Non-linear results (should be {}, {}, {}):'.format(a, b, c))
    print(nonlinear_invert(f, x, y, z))

def generate_data(nobservations, a, b, c, noise_level=0.01):
    x, y, z = np.random.random((3, nobservations))
    # Gaussian noise with standard deviation `noise_level`
    noise = np.random.normal(0, noise_level, nobservations)
    f = func(x, y, z, a, b, c) + noise
    return f, x, y, z

def func(x, y, z, a, b, c):
    f = (a**2
         + x * b**2
         + y * a * b * np.cos(c)
         + z * a * b * np.sin(c))
    return f

def linear_invert(f, x, y, z):
    G = np.vstack([np.ones_like(x), x, y, z]).T
    m, _, _, _ = np.linalg.lstsq(G, f, rcond=None)

    d, e, f, g = m
    a = np.sqrt(d)
    b = np.sqrt(e)
    c = np.arctan2(g, f) # Note that `c` will be in radians, not degrees
    return a, b, c

def nonlinear_invert(f, x, y, z):
    # "curve_fit" expects the function to take a slightly different form...
    def wrapped_func(observation_points, a, b, c):
        x, y, z = observation_points
        return func(x, y, z, a, b, c)

    xdata = np.vstack([x, y, z])
    model, cov = opt.curve_fit(wrapped_func, xdata, f)
    return model

if __name__ == '__main__':
    main()
Joe Kington
  • This is brilliant! I was looking at using scipy.optimize before and couldn't get my head around it. Would be great out of interest if you wouldn't mind having a go at it. Thanks again – user1171835 Oct 23 '13 at 14:59
  • 18
    Lovely! This type of answers remind me of the Stepanov quote from [here](http://www.emarcus.org/papers/PAM.pdf): "Once upon a time programmers loved mathematics and knew it well. (...) Nowadays, we have programmers – even senior, principal and chief programmers – who are proud not to know high school mathematics. It is becoming fashionable to boast of being practical, with mathematics being viewed as academic mumbo-jumbo. We believe that the separation of programming from mathematics is suicidal for programming. Mathematically illiterate people do not innovate." – Jaime Oct 23 '13 at 16:00
  • @Jaime - Thanks, I'm flattered! Excellent quote! – Joe Kington Oct 23 '13 at 18:37
2

You probably want to be using scipy's nonlinear solvers; they're really easy: http://docs.scipy.org/doc/scipy/reference/optimize.nonlin.html
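
For example (a minimal sketch, not from the answer itself): because this problem has 4 equations and 3 unknowns, a plain root finder from that page won't apply directly, but scipy.optimize.least_squares accepts a residual function and handles the overdetermined case:

import numpy as np
import scipy.optimize as opt

def residuals(params, f, a, b, c):
    # Difference between observed and predicted F, in the question's notation
    X, Y, Z = params
    pred = X**2 + a * Y**2 + b * X * Y * np.cos(Z) + c * X * Y * np.sin(Z)
    return f - pred

# Given arrays f, a, b, c of length 4:
# result = opt.least_squares(residuals, x0=[1.0, 1.0, 1.0], args=(f, a, b, c))
# X, Y, Z = result.x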

Joe