
I'm trying to develop general heuristics for translating equations to code. This particular question addresses how to implement an equation containing a summation in Matlab.

An example of using sum() vs. matrix multiplication:

I implemented this equation, and thought I needed to use a sum() function:

Equation 1 (the linear regression cost function):

J(θ) = 1/(2m) · Σ_{i=1}^{m} (θᵀx⁽ⁱ⁾ − y⁽ⁱ⁾)²

J = 1/(2*m) * sum( (X*theta - y).^2 );

Then I implemented this similar equation, without needing to use a sum() function!

Equation 2 (the gradient descent update for each parameter θⱼ):

θⱼ := θⱼ − (α/m) · Σ_{i=1}^{m} (θᵀx⁽ⁱ⁾ − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾

theta = theta - (alpha/m) * ((X*theta - y)'*X)';

Where:

X: 100x2 (training input plus a 'ones' vector)
y: 100x1 (training output) 
theta:  2x1 (parameters)
m: 100 (length of y)
alpha: 0.01 (learning rate) 
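
For comparison, here is the sum()-based version of equation 2 that I expected to need, using the same variables as above (just a sketch on my part; the loop over each parameter j makes the summation explicit):

% Sketch: explicit-sum version of the gradient descent update.
% Loop over each parameter j and sum over the m training examples.
grad = zeros(size(theta));
for j = 1:numel(theta)
    grad(j) = (1/m) * sum( (X*theta - y) .* X(:,j) );
end
theta = theta - alpha * grad;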

What are the principles for when Matlab's matrix multiplication "handles" the summation?

Thanks!

stevemuse
  • Not sure I perfectly understand your question; knowing the dimensions of the variables in your coded equations would help. That aside, some matrix multiplication rules do include summations of their intermediate terms. This has nothing to do with Matlab specifically, it's just the mathematical rules. Any software offering matrix multiplication functions should incorporate these rules (as Matlab does). – Hoki Jul 03 '16 at 06:37
  • Your edit is unclear. I don't know what it is you're asking anymore. – rayryeng Jul 03 '16 at 17:30
  • Thanks for the original response and the note! I'm trying to develop general heuristics for _translating equations to code_. This particular question addressed how to implement an equation with a _summation_ function into _Matlab_. Based on your code below, it looks like I didn't need to use sum() for either function. Your code (copied here) would have worked. `d = X*theta - y;` `J = 1/(2*m)*(d.'*d);` – stevemuse Jul 03 '16 at 20:07
  • Ah I see. A general guideline is to try to translate sums into some form of matrix-vector multiplication or a dot product. It's hard to give a general rule because it depends on the equation at hand. It really just comes down to experience, so I can't suggest firm guidelines for you there... but try to look for patterns. That's all I can offer. – rayryeng Jul 03 '16 at 20:56

1 Answer


Always prefer matrix multiplication, or more generally linear algebra operations, when dealing with matrices or vectors. Specifically, if you can compute whatever you need using linear algebra (a combination of addition, subtraction, multiplication, etc. on matrices), then do it. MATLAB was created to perform linear algebra operations as fast as possible, and using sum here would certainly be slower. For example, look at this post: fast matrix multiplication in Matlab

This post also provides insight: Matlab matrix multiplication speed. MATLAB also performs matrix multiplication multi-threaded and is heavily optimized for multiple cores.
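
As a quick illustration of the general principle (a minimal sketch to make the idea concrete, with made-up vectors a and b): a sum of element-wise products is exactly a dot product, which MATLAB expresses as a 1-by-n times n-by-1 matrix multiplication.

% Minimal sketch: a sum of element-wise products is a dot product.
a = rand(5, 1);
b = rand(5, 1);
s1 = sum(a .* b);   % explicit summation
s2 = a.' * b;       % same quantity as a (1x5)*(5x1) matrix multiplication
% s1 and s2 agree up to floating-point round-off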


If you want a test, let's tackle the easier case (equation 1), where you can use either sum or matrix multiplication to calculate the quantity. You can also compute J using matrix multiplication:

d = X*theta - y;
J = 1/(2*m)*(d.'*d);

The above uses the definition of the dot product to compute the sum of squared differences: X*theta - y is an m x 1 vector, and multiplying its transpose by itself sums the squares of its elements. What you are computing here is the cost function for linear regression, which is to be minimized via gradient descent.

To compare the timing of the two approaches, let's create a rather large parameter vector theta of size 100 x 1 and a data matrix of size 10000000 x 100, i.e. 10 million data points by 100 parameters. I have a lot of RAM on my machine, so you may not be able to run this test. I'll initialize everything to random numbers and set a seed to ensure reproducibility, then use timeit to see how long each version takes. This is a test function that I wrote:

function test_grad

% Set the seed for reproducibility and create random test data
rng(123);
theta = rand(100,1);   % 100 x 1 parameter vector
X = rand(1e7, 100);    % 10 million data points by 100 parameters
y = rand(1e7, 1);      % 10 million outputs
m = size(X, 1);        % number of data points

    % Cost computed with an explicit sum
    function test1
    out = 1/(2*m) * sum( (X*theta - y).^2 );
    end

    % Cost computed with a dot product (matrix multiplication)
    function test2
    d = X*theta - y;
    out = 1/(2*m)*(d.'*d);
    end

% Time each version and report the results
t1 = timeit(@test1);
t2 = timeit(@test2);
fprintf('The timing for sum: %f seconds\n', t1);
fprintf('The timing for matrix multiplication: %f seconds\n', t2);
end

When you run this function in MATLAB, timeit runs each version several times and reports the median execution time, giving a fair comparison between using sum and using matrix multiplication.

This is what I get when I run this function on a MacBook Pro with a 2.3 GHz Intel Core i7 CPU and 16 GB of RAM:

>> test_grad
The timing for sum: 0.594337 seconds
The timing for matrix multiplication: 0.393643 seconds

As you can see, matrix multiplication (at least on my machine) is about 0.2 seconds faster per run as measured by timeit.


tl;dr: If you can use matrix multiplication, do it. It's the fastest you'll ever be able to get your code running.
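
To connect this back to equation 2 from the question (a small sketch under the question's dimensions, added for illustration rather than benchmarking): each entry of the product ((X*theta - y)'*X)' is a dot product of the residual vector with one column of X, which is exactly the summation over your m training examples.

% Sketch: the matrix product performs the per-parameter summation implicitly.
rng(0);
m = 100;
X = [ones(m, 1), rand(m, 1)];   % 100 x 2, first column of ones as in the question
y = rand(m, 1);                 % 100 x 1 training output
theta = rand(2, 1);             % 2 x 1 parameters

d = X*theta - y;                % m x 1 residual vector
g1 = (d.' * X).';               % form used in the question
g2 = X.' * d;                   % equivalent: entry j is the dot product of X(:,j) with d
g3 = [sum(d .* X(:,1)); sum(d .* X(:,2))];   % the same summations written out explicitly
% g1, g2 and g3 agree up to floating-point round-off

Multiplying by alpha/m and subtracting from theta then gives exactly the update in the question.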

rayryeng