
We can do matrix multiplication in several ways. A and B are two matrices, each of size 1000 × 1000.

  1. Use the MATLAB built-in matrix operator: A*B.
  2. Write two nested loops explicitly.
  3. Write three nested loops explicitly.
  4. Dynamically link C code that runs the three nested loops.

The result is

  1. The * operator is very fast: about 1 second.
  2. The explicit loops are very slow, especially the three nested loops: 500 seconds.
  3. The C loops are much faster than the MATLAB loops: 10 seconds.

Can anyone explain the following: why is * so fast, and why are the C loops more efficient than writing the loops in MATLAB?

I actually used another software package, not MATLAB, to do the computation, but I think similar reasoning should apply.

Daniel Heilper
Yan Song
  • A [related question](http://stackoverflow.com/questions/6058139/why-is-matlab-so-fast-in-matrix-multiplication). – horchler May 25 '14 at 19:46
  • This has been discussed here: http://stackoverflow.com/questions/6058139/why-is-matlab-so-fast-in-matrix-multiplication – Rafael Monteiro May 25 '14 at 22:20

3 Answers


Because matrix multiplication is highly optimised in Matlab, using LAPACK libraries under the hood.

You will find it hard to beat the performance of those libraries. Certainly, simple nested loops don't take cache effects into account, and will thus exhibit terrible performance.

Oliver Charlesworth
  • I see your point here. Could you maybe explain why using C loops is faster than writing the loops in matlab? I checked the C source code; it's the same as I would write in matlab, three nested loops. – Yan Song May 25 '14 at 19:27

MATLAB is an interpreted language; C is compiled. Therefore C loops are much faster than MATLAB loops. That explains the difference between results 2 and 3.

Similarly, MATLAB's A*B also runs compiled code under the hood. Then why is it still an order of magnitude faster than the C code? After all, the nested-loop code is really simple, and the compiler can optimize it to be nearly as fast as hand-written assembly. The answer is that A*B is most likely not just nested loops. The simple nested-loop algorithm runs in O(n^3) time, but if you recursively break up the matrix you can relatively easily get the O(n^2.8) Strassen algorithm (http://en.wikipedia.org/wiki/Strassen_algorithm), which performs well in practice; you can even get down to the O(n^2.37) Coppersmith-Winograd algorithm (http://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm), which is not quite practical.

My bet is that MATLAB uses some form of the Strassen algorithm, which not only has a better asymptotic speed, but, because the submatrices it eventually multiplies are small, has better cache utilization as well.

LaszloLadanyi
  • Matlab uses LAPACK; I'd speculate that LAPACK uses the straightforward N^3 algorithm, but with extremely well-tuned blocking, vectorisation, and possibly multi-threading. – Oliver Charlesworth May 25 '14 at 19:40
  • In other words, Strassen's algorithm is not used ... likely due to issues of numeric stability. – horchler May 25 '14 at 19:52

There is a webpage that compares matrix multiplication speeds in C using naive code and BLAS code:

Naïve code

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++) {
        C[i + j*n] = 0;                        /* column-major layout */
        for (k = 0; k < n; k++)
            C[i + j*n] += A[i + k*n] * B[k + n*j];
    }

BLAS code

dgemm_(&charN, &charN, &n, &n, &n, &one_d, A, &n, B, &n, &zero_d, C, &n);

(The page shows plots of the measured speeds for the two versions; images not reproduced here.)

Using those libraries may be a good option to improve your speed.

edgarmtze