Limitations of the Levenberg-Marquardt algorithm

Question

I am using the Levenberg-Marquardt algorithm (LMA) to minimize a non-linear function of 6 parameters. I have about 50 data points for each minimization, but the results are not sufficiently accurate. Can the fact that my parameters differ from each other by a few orders of magnitude be that significant? If yes, where should I look for a solution? If not, what limitations of LMA have you encountered in your work (it may help me find other problems with my application)? Many thanks for your help.

Edit: The problem I am trying to solve is to determine the best transformation T:

typedef struct 
{
    double x_translation, y_translation, z_translation; 
    double x_rotation, y_rotation, z_rotation;
} transform_3D;

to fit a set of 3D points to a bunch of 3D lines. In detail, I have a set of coordinates of 3D points and the equations of the corresponding 3D lines, which should pass through those points (in the ideal situation). The LMA minimizes the sum of distances from the transformed 3D points to the corresponding 3D lines. The transform function is as follows:

cv::Point3d Geometry::transformation_3D(cv::Point3d point, transform_3D transformation)
{
    cv::Point3d p_odd,p_even;

    //rotation x
    p_odd.x=point.x;
    p_odd.y=point.y*cos(transformation.x_rotation)-point.z*sin(transformation.x_rotation); 
    p_odd.z=point.y*sin(transformation.x_rotation)+point.z*cos(transformation.x_rotation);

    //rotation y
    p_even.x=p_odd.z*sin(transformation.y_rotation)+p_odd.x*cos(transformation.y_rotation);
    p_even.y=p_odd.y;
    p_even.z=p_odd.z*cos(transformation.y_rotation)-p_odd.x*sin(transformation.y_rotation);

    //rotation z
    p_odd.x=p_even.x*cos(transformation.z_rotation)-p_even.y*sin(transformation.z_rotation);
    p_odd.y=p_even.x*sin(transformation.z_rotation)+p_even.y*cos(transformation.z_rotation);
    p_odd.z=p_even.z;

    //translation
    p_even.x=p_odd.x+transformation.x_translation;
    p_even.y=p_odd.y+transformation.y_translation;
    p_even.z=p_odd.z+transformation.z_translation;

    return p_even;
}

Hope this explanation will help a bit...
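The objective described above can be sketched as follows. This is only an illustration, assuming each line is given by an origin and a direction vector as in the data below; `Vec3`, `Line3`, `pointLineDistance` and `residuals` are hypothetical names, and the points are assumed to have been transformed already. Note that MINPACK-style LM routines minimize the sum of *squared* residuals, so the residual vector holds one distance per point/line pair.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal 3D vector type so the sketch does not depend on OpenCV.
struct Vec3 { double x, y, z; };

// A line through `origin` with a (not necessarily unit) direction `dir`.
struct Line3 { Vec3 origin, dir; };

// Orthogonal distance from p to the line: |(p - origin) x dir| / |dir|.
double pointLineDistance(const Vec3& p, const Line3& line) {
    const Vec3 d{p.x - line.origin.x, p.y - line.origin.y, p.z - line.origin.z};
    const Vec3& u = line.dir;
    const Vec3 c{d.y * u.z - d.z * u.y,
                 d.z * u.x - d.x * u.z,
                 d.x * u.y - d.y * u.x};
    const double norm = std::sqrt(u.x * u.x + u.y * u.y + u.z * u.z);
    assert(norm > 0.0);  // a zero direction does not define a line
    return std::sqrt(c.x * c.x + c.y * c.y + c.z * c.z) / norm;
}

// One residual per point/line pair; an LM routine minimizes the sum of their squares.
std::vector<double> residuals(const std::vector<Vec3>& transformedPoints,
                              const std::vector<Line3>& lines) {
    assert(transformedPoints.size() == lines.size());
    std::vector<double> r(transformedPoints.size());
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = pointLineDistance(transformedPoints[i], lines[i]);
    }
    return r;
}
```

In a real fit, each point would first be passed through `transformation_3D` with the current parameter estimate before the residuals are evaluated.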

Edit2:

Some example data is pasted below. The 3D lines are described by a center point and a directional vector. The center point of every line is (0,0,0) and the 'uz' coordinate of each vector is equal to 1. Set of 'ux' coordinates of directional vectors:

-1.0986, -1.0986, -1.0986,
-1.0986, -1.0990, -1.0986,
-1.0986, -1.0986, -0.9995,
-0.9996, -0.9996, -0.9995,
-0.9995, -0.9995, -0.9996,
-0.9003, -0.9003, -0.9004,
-0.9003, -0.9003, -0.9003,
-0.9003, -0.9003, -0.8011,
-0.7020, -0.7019, -0.6028,
-0.5035, -0.5037, -0.4045,
-0.3052, -0.3053, -0.2062,
-0.1069, -0.1069, -0.1075,
-0.1070, -0.1070, -0.1069,
-0.1069, -0.1070, -0.0079,
-0.0079, -0.0079, -0.0078,
-0.0078, -0.0079, -0.0079,
 0.0914,  0.0914,  0.0913,
 0.0913,  0.0914,  0.0915,
 0.0914,  0.0914

Set of 'uy' coordinates of directional vectors:

-0.2032,  -0.0047,    0.1936,
0.3919,    0.5901,    0.7885,
0.9869,    1.1852,    -0.1040,
0.0944,    0.2927,    0.4911,
0.6894,    0.8877,    1.0860,
-0.2032,  -0.0047,    0.1936,
0.3919,    0.5902,    0.7885,
0.9869,    1.1852,    1.0860,
0.9869,    1.1852,    1.0861,
0.9865,    1.1853,    1.0860,
0.9870,    1.1852,    1.0861,
-0.2032,  -0.0047,    0.1937,
0.3919,    0.5902,    0.7885,
0.9869,    1.1852,    -0.1039,
0.0944,    0.2927,    0.4911,
0.6894,    0.8877,    1.0860,
-0.2032,  -0.0047,    0.1935,
0.3919,    0.5902,    0.7885,
0.9869,    1.1852

and the set of 3D points, as {x, y, z} triples:

 {{0, 0, 0}, {0, 16, 0},   {0, 32, 0}, 
 {0, 48, 0}, {0, 64, 0},   {0, 80, 0},
 {0, 96, 0}, {0, 112,0},   {8, 8, 0},
 {8, 24, 0}, {8, 40, 0},   {8, 56, 0}, 
 {8, 72, 0}, {8, 88, 0},   {8, 104, 0}, 
 {16, 0, 0}, {16, 16,0},   {16, 32, 0}, 
{16, 48, 0}, {16, 64, 0},  {16, 80, 0}, 
{16, 96, 0}, {16, 112, 0}, {24, 104, 0}, 
{32, 96, 0}, {32, 112, 0}, {40, 104, 0},
{48, 96, 0}, {48, 112, 0}, {56, 104, 0},
{64, 96, 0}, {64, 112, 0}, {72, 104, 0}, 
{80, 0, 0},  {80, 16, 0},  {80, 32, 0},
{80,48, 0},  {80, 64, 0},  {80, 80, 0}, 
{80, 96, 0}, {80, 112, 0}, {88,  8, 0}, 
{88, 24, 0}, {88, 40, 0},  {88, 56, 0},
{88, 72, 0}, {88, 88, 0},  {88, 104, 0},
{96, 0, 0},  {96, 16, 0},  {96, 32, 0}, 
{96, 48,0},  {96, 64, 0},  {96, 80, 0}, 
{96, 96, 0}, {96, 112, 0}}

This is fairly "easy" modelled data with very small rotations.

Hi Marcin! Could you post the model and some data? I've a running optimizer that uses LM and I could try it on your data. If it's large you may post on pastebin or ideone ... — Dr. belisarius, Dec 14 '10 at 08:49
@Marcin I see you used the [Mathematica] tag. Is that OK? I thought you were using c++/qt/OpenGL — Dr. belisarius, Dec 14 '10 at 08:56
Yes, sorry for that tag, I thought it referred to mathematics as a science. I am programming in C++. About your first question, I would rather not post the model as it is a bit too complicated; sorry if that gets in the way of answering :/ — Marcin, Dec 14 '10 at 09:02
@Marcin No problem about the model. I'll refrain from answering since the method, the model and the data are "the set" I need to understand your problem. But I don't doubt you'll get good answers anyway! Good luck with this one! — Dr. belisarius, Dec 14 '10 at 09:15
Tsk, tsk... trigonometric functions; as I said, pretty thorny. You really need good starting estimates if you want LM to work properly. Oh, and I apparently forgot to ask a very important question: have you verified your routine for calculating the required partial derivatives? — , Dec 14 '10 at 14:29
I am using the LMA implemented in the cminpack LGPL library - I suppose it's pretty well optimized. — Marcin, Dec 14 '10 at 14:35
The thing with [MINPACK](http://www.netlib.org/minpack/)... it supports the use of either finite-difference derivatives (the default), or an auxiliary routine for computing the required partial derivatives symbolically. Seeing that you're dealing with rather oscillatory functions, you would probably profit from constructing your own partial derivative routine. And yes, you *still* need to find good starting points; MINPACK definitely won't do that for you. — , Dec 14 '10 at 15:58
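The question about verifying the partial derivatives can be checked mechanically. Below is a minimal sketch, not taken from the thread, that compares one analytic derivative of the question's x-rotation (the rotated y coordinate with respect to `x_rotation`) against a central finite difference. The function names are hypothetical; the same pattern applies to each entry of the Jacobian.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Central finite difference of a scalar function f at a, with step h.
double centralDifference(const std::function<double(double)>& f, double a, double h = 1e-6) {
    assert(h > 0.0);
    return (f(a + h) - f(a - h)) / (2.0 * h);
}

// y coordinate after rotating (y, z) about the x axis by angle a,
// using the same sign convention as transformation_3D in the question.
double rotatedY(double y, double z, double a) {
    return y * std::cos(a) - z * std::sin(a);
}

// Analytic derivative of rotatedY with respect to the angle a.
double rotatedYDerivative(double y, double z, double a) {
    return -y * std::sin(a) - z * std::cos(a);
}

// True when the analytic and numerical derivatives agree to a relative tolerance.
bool derivativeMatches(double y, double z, double a, double tol = 1e-6) {
    const double numeric = centralDifference([=](double t) { return rotatedY(y, z, t); }, a);
    const double analytic = rotatedYDerivative(y, z, a);
    return std::fabs(numeric - analytic) <= tol * (1.0 + std::fabs(numeric));
}
```

A mismatch here usually points to a sign or index error in the hand-written derivative routine rather than to a problem with the optimizer.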

score 5 · Accepted Answer · answered Dec 14 '10 at 08:49


Well, the proper way to use Levenberg-Marquardt is to start from a good initial estimate (a "seed") for your parameters. Recall that LM is a damped variant of Gauss-Newton, itself a relative of Newton-Raphson; as with all such iterative algorithms, the quality of your starting point will make or break your iteration: it either converges to what you want, converges to something completely different (not that unlikely, especially if you have a lot of parameters), or shoots off into the wild blue yonder (diverges).

In any event, it would be more helpful if you could mention the model function you're fitting, and possibly a scatter plot of your data; it might go a long way towards finding a workable solution for this.
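To make the role of the iteration concrete, here is a minimal sketch of a Levenberg-Marquardt-style loop for a hypothetical one-parameter model `y = exp(-k * x)`. The model, the function names, and the factor-of-ten damping schedule are illustrative assumptions, not the poster's problem and not cminpack's implementation. A step is accepted only if it lowers the sum of squares; otherwise the damping is increased and the step shrinks.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of squared residuals for the illustrative model y = exp(-k * x).
double sumOfSquares(const std::vector<double>& xs, const std::vector<double>& ys, double k) {
    assert(xs.size() == ys.size());
    double s = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const double r = std::exp(-k * xs[i]) - ys[i];
        s += r * r;
    }
    return s;
}

// Damped Gauss-Newton (LM-style) iteration on the single parameter k, starting at k0.
double fitDecayRate(const std::vector<double>& xs, const std::vector<double>& ys, double k0) {
    double k = k0;
    double lambda = 1e-3;  // damping parameter
    double cost = sumOfSquares(xs, ys, k);
    for (int iter = 0; iter < 200; ++iter) {
        double jtj = 0.0, jtr = 0.0;
        for (std::size_t i = 0; i < xs.size(); ++i) {
            const double e = std::exp(-k * xs[i]);
            const double r = e - ys[i];   // residual
            const double j = -xs[i] * e;  // d(residual)/dk
            jtj += j * j;
            jtr += j * r;
        }
        const double step = -jtr / (jtj + lambda);  // solves (J^T J + lambda) * step = -J^T r
        const double trialCost = sumOfSquares(xs, ys, k + step);
        if (trialCost < cost) {
            k += step;        // accept the step and trust the model more
            cost = trialCost;
            lambda *= 0.1;
        } else {
            lambda *= 10.0;   // reject the step and damp harder
        }
        if (std::fabs(step) < 1e-12) break;
    }
    return k;
}
```

With noise-free data generated for `k = 0.5` and a start at `k0 = 1.0`, the first undamped steps overshoot and are rejected until the damping has grown enough, which is the behaviour that makes the starting point matter.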


I am analyzing numerically modeled data, with a good initial guess for the LMA. I obtain significantly better results (a smaller sum of squares) when one or more variables are "fixed". I know that this is not very surprising, but still - I really need to optimize the accuracy for 6 variables. – Marcin Dec 14 '10 at 09:15
Of course you will; if you fix variables, there are fewer degrees of freedom for the iteration to wander in. What you could do is to use those good values as a starting point, without fixing any of the parameters. If the values of the parameters aren't perturbed that much, then your starting point(s) were good to begin with. – Dec 14 '10 at 09:28
Yes, I know it's obvious that fixing parameters improves results; however, there is a big difference between using known values as the initial guess and fixing them. My question is how to tune the LMA to work better with the kind of parameters I mentioned in the main post. – Marcin Dec 14 '10 at 09:51
It really depends on the nature of your model function; if for instance you have a lot of sines or cosines in your model, that presents a problem. If you're trying to fit decaying exponentials, that's a different set of worries. My point is, if you're going to be vague about the nature of your model, the best you'll get is vague advice. – Dec 14 '10 at 10:00
OK then, I did not realize that the nature of the model makes a big difference. I will try to expand my first entry. – Marcin Dec 14 '10 at 12:21
You were right :) Take a look at the improved question if you have some time; it's a bit more detailed now. – Marcin Dec 14 '10 at 14:29
@Marcin Sure, but post in Pastebin an ASCII data set big enough to run a program. Usually the problems come from the data ... – Dr. belisarius Dec 14 '10 at 15:39
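On the parameter-scale question raised in the comments above: one common remedy is to let the optimizer work on rescaled variables so that all six have comparable magnitude. The sketch below is only an illustration; `ParameterScaling` and the typical magnitudes in the usage note are assumptions, not values from the thread. MINPACK's full `lmdif`/`lmder` drivers also accept a user-supplied scaling through their `diag` array when `mode = 2`; check the cminpack documentation for the exact signature.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumParams = 6;  // 3 translations followed by 3 rotations
using Params = std::array<double, kNumParams>;

// Maps between physical parameters and dimensionless ones of order 1.
struct ParameterScaling {
    Params scale;  // typical magnitude of each physical parameter

    // physical -> scaled: what the optimizer sees
    Params toScaled(const Params& physical) const {
        Params s{};
        for (std::size_t i = 0; i < kNumParams; ++i) {
            assert(scale[i] != 0.0);
            s[i] = physical[i] / scale[i];
        }
        return s;
    }

    // scaled -> physical: applied inside the residual function
    Params toPhysical(const Params& scaled) const {
        Params p{};
        for (std::size_t i = 0; i < kNumParams; ++i) {
            p[i] = scaled[i] * scale[i];
        }
        return p;
    }
};
```

For example, with assumed magnitudes of 50 length units for the translations and 1e-3 rad for the rotations, `ParameterScaling s{{50.0, 50.0, 50.0, 1e-3, 1e-3, 1e-3}};` lets the residual function call `s.toPhysical(x)` on the optimizer's vector before applying the transform.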

score 1 · Answer 2 · answered Dec 14 '10 at 13:31

I would suggest you try using a different approach to indirectly find your rotation parameters, namely to use a 4x4 affine transformation matrix to incorporate the translation and rotation parameters.

This gets rid of the nonlinearity of the sine and cosine functions (which you can figure out after the fact).

The tough part would be to constrain the transformation matrix from shearing or scaling, which you don't want.
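Recovering the angles "after the fact" can be sketched as below, assuming the matrix is a proper rotation composed as in the question's `transformation_3D` (x rotation first, then y, then z, i.e. R = Rz·Ry·Rx) and that the y angle stays away from ±90°. `Mat3`, `EulerAngles`, `rotationFromAngles` and `anglesFromRotation` are hypothetical names used only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

struct EulerAngles { double x, y, z; };

// R = Rz(c) * Ry(b) * Rx(a): rotate about x first, then y, then z.
Mat3 rotationFromAngles(double a, double b, double c) {
    const double ca = std::cos(a), sa = std::sin(a);
    const double cb = std::cos(b), sb = std::sin(b);
    const double cc = std::cos(c), sc = std::sin(c);
    return {{ {cc * cb, cc * sb * sa - sc * ca, cc * sb * ca + sc * sa},
              {sc * cb, sc * sb * sa + cc * ca, sc * sb * ca - cc * sa},
              {-sb,     cb * sa,                cb * ca} }};
}

// Inverse of rotationFromAngles, valid while |y angle| < 90 degrees.
EulerAngles anglesFromRotation(const Mat3& r) {
    assert(std::fabs(r[2][0]) < 1.0);  // otherwise gimbal lock: x and z angles are not unique
    return {std::atan2(r[2][1], r[2][2]),   // x rotation
            -std::asin(r[2][0]),            // y rotation
            std::atan2(r[1][0], r[0][0])};  // z rotation
}
```

A matrix estimated by an unconstrained linear fit is generally not an exact rotation, so it would first have to be projected onto the nearest rotation before these formulas are meaningful.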

Dr. belisarius · Answer 3 · 2010-12-15T23:41:42.253

Here you have your problem modeled and running with Mathematica.

I used the "Levenberg-Marquardt" method.

This is why I asked for your data. With MY data, YOUR problems are always going to be easier:)

xnew[x_, y_, z_] := 
  RotationMatrix[rx, {1, 0, 0}].RotationMatrix[
     ry, {0, 1, 0}].RotationMatrix[rz, {0, 0, 1}].{x, y, z} + {tx, ty, tz};

(* Generate Sample Data*)
(* Angles 1/2,1/3,1/5 *)
(* translation -> {1,2,3} *)
(* Minimum mean Noise 5% *)

data = Table[{{x, y, z},
  RotationMatrix[1/2, {1, 0, 0}].
  RotationMatrix[1/3, {0, 1, 0}].
  RotationMatrix[1/5, {0, 0, 1}].{x, y, z} +{1, 2, 3} +RandomReal[{-.05, .05}, 3]},
  {x, 0, 1, .1}, {y, 0, 1, .1}, {z, 0, 1, .1}];

data = Flatten[data, 2];

(* Now find the parameters*)
FindMinimum[
 Sum[SquaredEuclideanDistance[xnew[i[[1]] /. List -> Sequence], 
   i[[2]]], {i, data}]
 , {rx, ry, rz, tx, ty, tz}, Method -> "LevenbergMarquardt"]

Out:

{3.2423, {rx -> 0.500566, ry -> 0.334012, rz -> 0.199902, 
          tx -> 0.99985,  ty -> 1.99939,  tz -> 3.00021}}

(Within 1/1000 of the real values)

Edit

I worked a little with your data.
The problem is that your system is very badly conditioned. You need much more data to effectively calculate such small rotations.

These are the results I got:

Rotations in degrees:

rx = 179.99999999999999999999984968493536659553226696793
ry = 180.00000000000000000000006934755799995159952661222
rz = 180.0006286861217378980724139120849587855611645627

Translations

tx = 48.503663696727576867196234527227830090575281353092
ty = 63.974139455057300403798198525151849767949596684232
tz = -0.99999999999999999999997957276716543927459921348549

I should calculate the errors, but I've no time right now.

BTW, rz = Pi + 0.000011 (in radians)

HTH!
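A note on reading the fitted angles: all three are close to 180°, and half-turns about x, y and z compose to the identity, so the fitted rotation is in fact close to no rotation at all. The sketch below checks this numerically using the question's x-then-y-then-z convention (the Mathematica model composes the rotations in the opposite order, but half-turns about the coordinate axes are diagonal matrices and commute). `Vec3` and `rotateXYZ` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Applies the rotations about x, then y, then z, as in the question's transformation_3D.
Vec3 rotateXYZ(const Vec3& p, double rx, double ry, double rz) {
    // rotation about x
    const Vec3 a{p.x,
                 p.y * std::cos(rx) - p.z * std::sin(rx),
                 p.y * std::sin(rx) + p.z * std::cos(rx)};
    // rotation about y
    const Vec3 b{a.z * std::sin(ry) + a.x * std::cos(ry),
                 a.y,
                 a.z * std::cos(ry) - a.x * std::sin(ry)};
    // rotation about z
    return Vec3{b.x * std::cos(rz) - b.y * std::sin(rz),
                b.x * std::sin(rz) + b.y * std::cos(rz),
                b.z};
}

// True when rotating p by (rx, ry, rz) leaves it unchanged to within tol.
bool isNearlyFixed(const Vec3& p, double rx, double ry, double rz, double tol) {
    assert(tol > 0.0);
    const Vec3 q = rotateXYZ(p, rx, ry, rz);
    return std::fabs(q.x - p.x) < tol && std::fabs(q.y - p.y) < tol && std::fabs(q.z - p.z) < tol;
}
```

Calling `isNearlyFixed` with all three angles set to π returns true for any of the sample points, which is why Euler-angle results should be compared as rotations rather than angle by angle.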

You cannot get rid of the non-linearity completely. What you suggest will optimize a different objective function, namely the algebraic error in the parameters, while the ideal objective function is the sum of squared residuals. The linear solution is typically used as an initial guess for a non-linear one. — Vlad, Mar 13 '14 at 23:52

score 0 · Answer 4 · answered Dec 25 '15 at 17:46

Well, I used ceres-solver to solve this, but I did make one modification to your data. Instead of "uz=1.0", I used "uz=0.0", which makes this entirely a 2D data-fitting problem.

I got the following results: trans: (-88.6384, -16.3879, 0), rot: (0, 0, -6.97813e-05).

After getting these results, I manually calculated the sum of orthogonal distances from the transformed points to the corresponding lines and got 0.0280452.

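// Forward declaration so that xform is visible inside CostFunctor::operator().
template <typename T>
void xform(const T* x, const T* inPoint, T* outPoint3);

struct CostFunctor {
    CostFunctor(const double p[3],  double ux, double uy){
        p_[0] = p[0];p_[1] = p[1];p_[2] = p[2];
        n_[0] = ux; n_[1] = uy;
        n_[2] = 0.0;
        //normalize() is not shown in this answer; it is assumed to scale n_ to unit length
        normalize(n_);
    }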

    template <typename T>
    bool operator()(const T* const x, T* residual) const {
        T pDash[3];
        T pIn[3];
        T temp[3];
        pIn[0] = T(p_[0]);
        pIn[1] = T(p_[1]);
        pIn[2] = T(p_[2]);
        //transform the input point p_ to pDash
        xform(x, &pIn[0], &pDash[0]);
        //find dot(pDash, n), where n is the direction of line
        T pDashDotN = T(pDash[0]) * T(n_[0]) + T(pDash[1]) * T(n_[1]) + T(pDash[2]) * T(n_[2]);
        //projection of pDash along line
        temp[0] = pDashDotN * n_[0];temp[1] = pDashDotN * n_[1];temp[2] = pDashDotN * n_[2];
        //orthogonal vector from projection to point
        temp[0] = pDash[0] - temp[0];temp[1] = pDash[1] - temp[1];temp[2] = pDash[2] - temp[2];
        //squared orthogonal distance (note: Ceres squares each residual again when forming the cost)
        residual[0] = temp[0] * temp[0] + temp[1] * temp[1] + temp[2] * temp[2];
        return true;
    }
    //untransformed point
    double p_[3];

    //direction of line (unit length after normalize)
    double n_[3];
};


template<typename T>
void  xform(const T *x, const T * inPoint, T *outPoint3) {
    T xTheta = x[3];
    T pOdd[3], pEven[3];
    pOdd[0] = inPoint[0];
    pOdd[1] = inPoint[1] * cos(xTheta) + inPoint[2] * sin(xTheta);
    pOdd[2] = -inPoint[1] * sin(xTheta) + inPoint[2] * cos(xTheta);

    T yTheta = x[4];
    pEven[0] = pOdd[0] * cos(yTheta) + pOdd[2] * sin(yTheta);
    pEven[1] = pOdd[1];
    pEven[2] = -pOdd[0] * sin(yTheta) + pOdd[2] * cos(yTheta);


    T zTheta = x[5];

    pOdd[0] = pEven[0] * cos(zTheta) - pEven[1] * sin(zTheta);
    pOdd[1] = pEven[0] * sin(zTheta) + pEven[1] * cos(zTheta);
    pOdd[2] = pEven[2];

    T xTrans = x[0], yTrans = x[1], zTrans = x[2];
    pOdd[0] += xTrans;
    pOdd[1] += yTrans;
    pOdd[2] += zTrans;

    outPoint3[0] = pOdd[0];
    outPoint3[1] = pOdd[1];
    outPoint3[2] = pOdd[2];
}
