For an NxP matrix x and an Nx1 vector y with N > P, the two expressions
x \ y -- (1)
and
(x' * x) \ (x' * y) -- (2)
both compute the solution b to the matrix equation
x * b = y
in the least squares sense, i.e. so that the quantity
norm(y - x * b)
is minimized. Expression (2) does this via the normal equations, the classic algorithm for solving an ordinary least squares regression, in which the left-hand argument to the \ operator is a square PxP matrix. It is equivalent to writing
inv(x' * x) * (x' * y) -- (3)
but it uses an algorithm which is numerically more stable. It turns out that (3) is moderately faster than (2), even though (2) doesn't have to produce the inverse matrix as a byproduct; I can accept that, given the additional numerical stability of (2).
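For completeness, the standard derivation that connects (2) and (3) to the minimization problem (assuming x has full column rank), in the notation above:

\min_b \|y - x b\|_2^2, \qquad \nabla_b \|y - x b\|_2^2 = -2\,x^\top (y - x b) = 0 \;\Longrightarrow\; x^\top x\, b = x^\top y \;\Longrightarrow\; b = (x^\top x)^{-1} x^\top y .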
However, some simple timings (with N = 100,000 and P = 30) show that expression (2) is more than 5 times faster than expression (1), even though (1) has greater flexibility to choose the algorithm used! For example, any call to (1) could just dispatch on the size of x, and in the case N > P it could reduce to (2); that dispatch would add a tiny amount of overhead, but certainly wouldn't make it take 5 times longer.
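To make the dispatching idea concrete, here is a minimal sketch of the kind of wrapper I mean (lsq_solve is a made-up name, not a built-in function):

function b = lsq_solve(x, y)
% Hypothetical dispatcher: for tall systems (N > P), reduce to the
% normal-equations path of expression (2); otherwise just defer to \.
[n, p] = size(x);
if n > p
    b = (x' * x) \ (x' * y);   % expression (2): one small PxP solve
else
    b = x \ y;                 % square/underdetermined: let \ choose
end
end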
What is happening in expression (1) that is causing it to take so much longer?
Edit: here are my timings:
x = randn(1e5, 30);   % N = 100,000, P = 30
y = randn(1e5, 1);
tic, for i = 1:100; x\y; end; t1 = toc;                 % time expression (1)
tic, for i = 1:100; (x'*x)\(x'*y); end; t2 = toc;       % time expression (2)
assert(abs(norm(x\y) - norm((x'*x)\(x'*y))) < 1e-10);   % both give the same b
fprintf('Speedup: %.2f\n', t1/t2)
Speedup: 5.23
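For anyone who wants to reproduce this with timeit instead of a tic/toc loop (timeit handles warm-up and repetition automatically; x and y as above):

f1 = @() x \ y;                  % expression (1)
f2 = @() (x'*x) \ (x'*y);        % expression (2)
fprintf('Speedup: %.2f\n', timeit(f1) / timeit(f2));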