I am trying to replicate in Python an analysis that we normally do in Matlab. One step involves solving a least-squares problem with a rectangular matrix, using the backslash \ (mldivide) operator to solve for x in Ax = b.
I've noticed that under some circumstances, Matlab produces very different results than np.linalg.lstsq. If my matrix A is rank deficient, then A\b gives me a warning, but also gives me an answer with some entries of the solution set to zero.
A = reshape(1:35,5,7);
b = [0.5, 0.4, 0.3, 0.2, 0.1]';
A\b
Warning: Rank deficient, rank = 2, tol = 1.147983e-13.
ans =
-0.1200
0
0
0
0
0
0.0200
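For comparison, here is what I get from NumPy for the same problem (a sketch; note the order='F' so that the reshape is column-major like Matlab's):

```python
import numpy as np

# Same A and b as the Matlab example; order='F' matches Matlab's
# column-major reshape(1:35,5,7).
A = np.arange(1.0, 36.0).reshape(5, 7, order='F')
b = np.array([0.5, 0.4, 0.3, 0.2, 0.1])

# lstsq returns the minimum 2-norm solution rather than a "basic"
# solution with zeroed-out entries.
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(rank)  # lstsq also detects that the numerical rank is 2
print(x)     # a different x than Matlab's, though both satisfy Ax = b here
```

Both answers solve the system (this particular b happens to lie in the column space of A), but the vectors themselves differ.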
After much digging, I learned that the backslash operator does very different things depending on the input A. I found an excellent description here: How to implement Matlab's mldivide (a.k.a. the backslash operator "\"). I also found a question similar to my own, but with square matrices: Numpy vs mldivide, "\" matlab operator. My situation involves rectangular matrices, so it's a bit different.
When A is rectangular, \ uses QR factorization, and I can reproduce the same solution like this:
[Q,R] = qr(A);
R \ (Q'*b)
ans =
-0.1200
0
0
0
0
0
0.0200
I understand that the second time \ is called (for R\(Q'*b)), it uses back-substitution. Could someone provide either Matlab or Python code for this step? I have found some examples of back-substitution, but they all seem to be for square matrices.
I realize my arbitrary example is not very realistic and the rank deficiency is a problem, but my priority is to replicate the Matlab code as closely as possible, to confirm that the other steps in the analysis are being replicated correctly.
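For concreteness, this is my current attempt at reproducing the Matlab answer end-to-end in Python using SciPy's column-pivoted QR (a sketch; I'm assuming scipy.linalg.qr with pivoting=True behaves like Matlab's pivoted qr here, and the rank tolerance is my guess at something comparable to Matlab's):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

A = np.arange(1.0, 36.0).reshape(5, 7, order='F')  # reshape(1:35,5,7)
b = np.array([0.5, 0.4, 0.3, 0.2, 0.1])

# Column-pivoted QR: A[:, p] = Q @ R
Q, R, p = qr(A, mode='economic', pivoting=True)

# Estimate the numerical rank from the diagonal of R (this tolerance
# is an assumption, not necessarily Matlab's exact formula).
tol = max(A.shape) * np.finfo(float).eps * abs(R[0, 0])
r = int(np.sum(np.abs(np.diag(R)) > tol))

# "Basic" solution: back-substitute on the leading r-by-r block of R,
# and leave zeros in the pivoted-out columns, like Matlab's A\b.
x = np.zeros(A.shape[1])
x[p[:r]] = solve_triangular(R[:r, :r], (Q.T @ b)[:r])
# Should reproduce [-0.12, 0, 0, 0, 0, 0, 0.02] if the pivot order matches.
```

The zeros land in the columns that the pivoting deems redundant, which is why Matlab's answer is sparse while lstsq's minimum-norm answer is not.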
A secondary question: is there a reason to trust the answer given by mldivide over lstsq? I understand that lstsq gives the minimum 2-norm solution, which seems appropriate, so how is that qualitatively different from what Matlab provides?