I have been working through the Hartley and Zisserman multiple view geometry text and have implemented the gold standard algorithm for computing the Fundamental matrix. This requires solving a non-linear minimization problem using Levenberg-Marquardt.

I implemented this with scipy.optimize.least_squares, but the performance is orders of magnitude slower than functionally equivalent MATLAB code that uses lsqnonlin. In neither case am I supplying the Jacobian or a sparsity mask for the Jacobian.

The slow compute times hold across the range of available SciPy solvers. Does an alternative exist with performance (numerical accuracy and speed) comparable to MATLAB, or will moving to a wrapped, compiled solver be necessary?

Edit, in response to the comment requesting code. I am trying to limit the total amount of code inserted.

Matlab:

P2GS = lsqnonlin(@(h)ReprojErrGS(corres1,PF1,corres2,h),PF2); 

function REGS = ReprojErrGS(corres1,PF1,corres2,PF2)
   %Find estimated 3D point by Triangulation method
   XwEst = TriangulationGS(corres1,PF1,corres2,PF2);
   %Reprojection Back to the image
   x1hat = PF1*XwEst;
   x1hat = x1hat ./ repmat(x1hat(3,:),3,1);
   x2hat = PF2*XwEst;
   x2hat = x2hat ./ repmat(x2hat(3,:),3,1);
   %Find root mean squared distance error
   dist = (corres1 - x1hat).*(corres1 - x1hat) + (corres2 - x2hat).*(corres2 - x2hat);
   REGS = sqrt(sum(sum(dist)) / size(corres1,2));
end

Triangulation is the standard method, iterating over all points, setting up Ax=0 and solving using SVD.
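
For reference, here is a minimal sketch of that linear (DLT) triangulation step in NumPy; the function name and the 3xN homogeneous point layout are assumptions for illustration, not the exact code used in the question:

import numpy as np

def triangulate_dlt(x1, x2, P1, P2):
    # x1, x2: 3xN homogeneous image points; P1, P2: 3x4 camera matrices.
    n = x1.shape[1]
    Xw = np.zeros((4, n))
    for i in range(n):
        # Each correspondence gives a homogeneous system A X = 0.
        A = np.vstack([
            x1[0, i] * P1[2] - P1[0],
            x1[1, i] * P1[2] - P1[1],
            x2[0, i] * P2[2] - P2[0],
            x2[1, i] * P2[2] - P2[1],
        ])
        # The solution is the right singular vector for the smallest singular value.
        _, _, Vt = np.linalg.svd(A)
        Xw[:, i] = Vt[-1] / Vt[-1, -1]  # dehomogenise
    return Xw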

Python:

import numpy as np
from scipy import optimize

# Cost function passed to least_squares; p1 arrives as a flat 12-vector
def projection_error(p1, p, pt, pt1):
    # Triangulate the correspondences (standard DLT, see below)
    xw_est = triangulate(pt, pt1, p, p1)
    # SciPy passes the optimization variables as a flat vector, so reshape back to 3x4
    if p1.shape != (3, 4):
        p1 = p1.reshape(3, 4)
    # Reproject back into both images and dehomogenise
    xhat = p.dot(xw_est).T
    xhat /= xhat[:, -1][:, np.newaxis]
    x2hat = p1.dot(xw_est).T
    x2hat /= x2hat[:, -1][:, np.newaxis]
    # Per-point squared distance error in both images
    dist = (pt - xhat)**2 + (pt1 - x2hat)**2
    reproj_error = np.sqrt(np.sum(dist, axis=1) / len(pt))
    return reproj_error

# Using 'trf' for performance; swap method to 'lm' for Levenberg-Marquardt.
# Inputs are pandas DataFrames, hence the .values
result = optimize.least_squares(projection_error, p1.ravel(),
                                args=(p, pt.values, pt1.values), method='trf')

This should be fully vectorized. Triangulation is as above. I can add that code, but would likely link a gist to keep the question size manageable.

    You should add both sets of code for comparison. Could it be possible that you just write more efficient MATLAB code than you write Python code? – Dan May 13 '16 at 13:43
    Related: [Why is MATLAB so fast in matrix multiplication?](http://stackoverflow.com/questions/6058139/why-is-matlab-so-fast-in-matrix-multiplication) – Adriaan May 13 '16 at 13:46
  • The MATLAB speed post is of interest, but the SciPy code should also be using BLAS/LAPACK. The above should be vectorized for performance. – Jzl5325 May 13 '16 at 16:45
  • Is the number of function evaluations the same or different? Are the default convergence tolerances the same for lsqnonlin and least_squares --- you don't seem to specify them here? Is the performance bottleneck even the solver itself, or is it the implementation of the objective function? You should check these to be sure where the issue is. – pv. May 15 '16 at 09:04
  • I think the default tolerances at least differ: least_squares uses sqrt(eps) ~ 1e-8 whereas lsqnonlin has defaults at 1e-6 --- consequently, you would expect least_squares to do more work to achieve the stricter tolerance. So the comparison above is certainly not a fair one (see the sketch after these comments). – pv. May 15 '16 at 09:15
  • @pv, great questions and insight into the issue. I will debug and take a look. – Jzl5325 May 16 '16 at 15:35
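
Following up on the tolerance and bottleneck comments above, here is a sketch of what a fairer comparison might look like, using the variables from the question's code; the 1e-6 values mirror lsqnonlin's documented TolFun/TolX defaults:

from scipy import optimize
import time

# lsqnonlin defaults its tolerances to 1e-6, while least_squares defaults
# to ~1e-8, so loosen them to match before comparing run times.
result = optimize.least_squares(projection_error, p1.ravel(),
                                args=(p, pt.values, pt1.values),
                                method='trf', ftol=1e-6, xtol=1e-6, gtol=1e-6)
print(result.nfev, result.cost)  # compare function-evaluation counts with MATLAB

# Also check whether the objective function, rather than the solver, is the bottleneck.
t0 = time.time()
for _ in range(100):
    projection_error(p1.ravel(), p, pt.values, pt1.values)
print('seconds per call:', (time.time() - t0) / 100)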

2 Answers

least_squares is very new. As of Fall 2015, there were no alternatives in SciPy land. Otherwise, there's e.g. Ceres.

There surely are many opportunities to speed up least_squares --- pull requests are gladly accepted :-). The first thing to check is that SciPy is linked to a decent LAPACK implementation though.
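
One quick way to see which BLAS/LAPACK libraries NumPy and SciPy are actually linked against (a sanity check only, not a full diagnosis):

import numpy as np
import scipy

np.show_config()     # lists the BLAS/LAPACK NumPy was built against
scipy.show_config()  # same for SciPy; ideally OpenBLAS, MKL or ATLAS rather than the reference BLAS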

  • First, I would make sure the comparison is fair --- default tolerances etc probably differ. – pv. May 15 '16 at 09:16

A reason for the huge difference in speed could be that MATLAB's lsqnonlin is able to detect the sparse structure of the Jacobian matrix and can therefore compute it much faster. SciPy's least_squares, on the other hand, does not handle sparse Jacobian matrices with the LM method and estimates every element of the Jacobian as in the dense case, including the entries that are known to be zero.

In this particular optimization problem (the Gold Standard algorithm for determining the fundamental matrix), sparse computation of the Jacobian is crucial for good performance, as described in Hartley and Zisserman. Without it, the optimization can take from several minutes to more than ten, depending on the number of point correspondences used for the computation of F.

Here is a link to a Python wrapper of a sparse LM implementation, though it is a bit more involved to get running:

http://www.bmp.ds.mpg.de/pysparselm.html

I hope scipy's least_squares will soon get an upgrade to handle sparse Jacobians with LM as well. :) (The Trust Region Reflective method of scipy's least_squares can already handle sparse Jacobians, check this tutorial. However, since 'lm' is actually the MINPACK implementation, you have to look elsewhere for an implementation of sparse LM.)
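
To make the sparsity point concrete with scipy itself: if the problem were set up in the full gold-standard parameterisation (the 12 entries of the second camera followed by the 3D points; this is an assumption about the setup, since the question's code only optimises the camera and re-triangulates the points), a sparsity pattern could be passed to the 'trf' method via jac_sparsity:

from scipy.sparse import lil_matrix

def jacobian_sparsity(n_points):
    # Parameters: 12 entries of P2, then 3 unknowns per 3D point.
    # Residuals: (x, y) in image 1 and (x, y) in image 2, per point.
    n_params = 12 + 3 * n_points
    n_resid = 4 * n_points
    S = lil_matrix((n_resid, n_params), dtype=int)
    for i in range(n_points):
        r, c = 4 * i, 12 + 3 * i
        S[r:r + 2, c:c + 3] = 1      # image-1 residuals depend on point i only
        S[r + 2:r + 4, 0:12] = 1     # image-2 residuals depend on the camera...
        S[r + 2:r + 4, c:c + 3] = 1  # ...and on point i
    return S

# Illustrative call; `residuals` and `x0` are placeholders for such a setup:
# result = scipy.optimize.least_squares(residuals, x0, method='trf',
#                                       jac_sparsity=jacobian_sparsity(n_points),
#                                       x_scale='jac')

Supplying the pattern lets the finite-difference Jacobian be estimated in grouped columns and makes least_squares use a sparse trust-region solver, which is where most of the speed-up comes from.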
