7

I am trying to replicate an analysis in Python that we normally do with Matlab. A certain step involves solving a least squares problem with 2 rectangular matrices using the backslash \ (mldivide) operator (to solve for x in Ax = b).

I've noticed that under some circumstances, Matlab produces very different results than np.linalg.lstsq. If my matrix A is rank deficient, then A\b gives me a warning, but also gives me an answer with some of the columns set to all zeros.

A = reshape(1:35,5,7);
b = [0.5, 0.4, 0.3, 0.2, 0.1]'

A\b
Warning: Rank deficient, rank = 2, tol =  1.147983e-13. 

ans =

    -0.1200
     0
     0
     0
     0
     0
     0.0200

After much digging, I learned that the backslash operator does very different things depending on the input A. I found an excellent description here: How to implement Matlab's mldivide (a.k.a. the backslash operator "\"). I also found a similar question to my own, but with square matrices: Numpy vs mldivide,"\" matlab operator . My situation involves rectangular matrices so it's a bit different.

When A is rectangular, \ uses QR factorization, and I can reproduce the same solution like this:

[Q,R]  = qr(A);
R \ (Q'*b)
ans =

    -0.1200
     0
     0
     0
     0
     0
     0.0200

I understand that this second time \ is called (for R\(Q'*b)) it uses back-substitution to do this. Could someone provide either Matlab or python code to do this step? I have found some examples of back-substitution, but they seem to all be for square matrices.

I realize my arbitrary example is not very realistic, and the rank deficiency is a problem, but my priority is to replicate the matlab code as closely as possible, to confirm that the other steps in the analysis are being properly replicated.

A secondary question is, is there a reason the trust the answer given by mldivide over lstsq? I understand that lstsq gives the minimum 2-norm solution, which seems appropriate, so how is this qualitatively different from what Matlab provides?

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
thrillhouse
  • 121
  • 1
  • 6
  • 1
    Related, [Numpy vs mldivide,“\” matlab operator](https://stackoverflow.com/q/33559946). The matrix R in QR decomposition is trapezoid-shaped (would be trianglular if it were square). A linear system with such a matrix is solved by back-substitution, beginning with an equation that has just one unknown on the left. Being rank-deficient forces some arbitrary choices: one sets some "unneeded" variables to zero, obtaining a triangular matrix. E.g., when solving x+y=1, Matlab will return either [1;0] or [0;1], –  Jul 10 '16 at 07:57
  • I believe you have an XY problem, you really want to solve X but you're asking about Y which you cooked up while trying to solve that. Note that you can just call `lin.lstsq(A,b)` for the same result. But what's important, is a conceptual problem here. What I would've answered has already been said by @Bookend in the linked duplicate, and the comments on the corresponding question. – Andras Deak -- Слава Україні Jul 10 '16 at 20:57
  • This makes sense. I have edited the question to focus on the issue of computing `R\(Q'*b)` using back-substitution, which was not answered in the other post. Thanks! – thrillhouse Jul 11 '16 at 18:52

0 Answers0