4

I am writing MATLAB scripts since some time and, still, I do not understand how it works "under the hood". Consider the following script, that do some computation using (big) vectors in three different ways:

  1. MATLAB vector operations;
  2. Simple for cycle that do the same computation component-wise;
  3. An optimized cycle that is supposed to be faster than 2. since avoid some allocation and some assignment.

Here is the code:

N = 10000000;

A = linspace(0,100,N);
B = linspace(-100,100,N);
C = linspace(0,200,N);
D = linspace(100,200,N);

% 1. MATLAB Operations
tic
C_ = C./A;
D_ = D./B;

G_ = (A+B)/2;
H_ = (C_+D_)/2;
I_ = (C_.^2+D_.^2)/2;

X = G_ .* H_;
Y = G_ .* H_.^2 + I_;
toc
tic
X;
Y;
toc

% 2. Simple cycle
tic
C_ = zeros(1,N);
D_ = zeros(1,N);
G_ = zeros(1,N);
H_ = zeros(1,N);
I_ = zeros(1,N);
X = zeros(1,N);
Y = zeros(1,N);
for i = 1:N,
  C_(i) = C(i)/A(i);
  D_(i) = D(i)/B(i);

  G_(i) = (A(i)+B(i))/2;
  H_(i) = (C_(i)+D_(i))/2;
  I_(i) = (C_(i)^2+D_(i)^2)/2;

 X(i) = G_(i) * H_(i);
 Y(i) = G_(i) * H_(i)^2 + I_(i);
end
toc
tic
X;
Y;
toc

% 3. Opzimized cycle
tic
X = zeros(1,N);
Y = zeros(1,N);
for i = 1:N,
  X(i) = (A(i)+B(i))/2 * (( C(i)/A(i) + D(i)/B(i) ) /2);
  Y(i) = (A(i)+B(i))/2 * (( C(i)/A(i) + D(i)/B(i) ) /2)^2 +  ( (C(i)/A(i))^2 + (D(i)/B(i))^2 ) / 2;
end
toc
tic
X;
Y;
toc

I know that one shall always try to vectorize computations, being MATLAB build over matrices/vectors (thus, nowadays, it is not always the best choice), so I am expecting that something like:

C = A .* B;

is faster than:

for i in 1:N,
  C(i) = A(i) * B(i);
end

What I am not expecting is that it is actually faster even in the above script, despite the second and the third methods I am using go through only one cycle, whereas the first method performs many vector operations (which, theoretically, are a "for" cycle every time). This force me to conclude that MATLAB has some magic that permit (for example) to:

C = A .* B;
D = C .* C;

to be run faster than a single "for" cycle with some operation inside it.

So:

  1. what is the magic that avoid the 1st part to be executed so fast?
  2. when you write "D= A .* B" does MATLAB actually do a component wise computation with a "for" cycle, or simply keeps track that D contains some multiplication of "bla" and "bla"?

EDIT

  1. suppose I want to implement the same computation using C++ (using maybe some library). Will be the first method of MATLAB be faster even than the third one implemented in C++? (I'll answer to this question myself, just give me some time.)

EDIT 2

As requested, here there are the experiment runtimes:

Part 1: 0.237143

Part 2: 4.440132 of which 0.195154 for allocation

Part 3: 2.280640 of which 0.057500 for allocation

and without JIT:

Part 1: 0.337259

Part 2: 149.602017 of which 0.033886 for allocation

Part 3: 82.167713 of which 0.010852 for allocation

FilippoL
  • 160
  • 7
  • 1
    Could you please complement your experiments with the measured computation time. – zellus May 30 '13 at 21:37
  • 1
    +1, I'm looking forward to the answers! =) – Stewie Griffin May 30 '13 at 21:50
  • The magic is the JIT accelerator ([just in time compilation](http://en.wikipedia.org/wiki/Just-in-time_compilation)). Try your code after running `feature('accel','off')` -I warn you, it will take a while. Don't forget to turn it back on with `feature('accel','on')`. – horchler May 30 '13 at 23:39
  • You'll also note that you'll get different timings in the `for` loop cases depending on if you run it from an M-file (script or function) or directly from the command window. This is likely again due to JIT compilation as well as the more limited scope of functions and variables in the M-file case. – horchler May 30 '13 at 23:47

2 Answers2

3

The first one is the fastest because vectorized code can be easily interpreted to a small number of optimized C++ library calls. Matlab could also optimize it at more high level, for example, replace G*H+I with an optimized mul_add(G,H,I) instead of add(mul(G,H),I) in its core.

The second one can't be converted to C++ calls easily. It has to be interpreted or compiled. The most modern approach for scripting languages is JIT-compilation. The Matlab JIT compiler is not very good but it doesn't mean it has to be so. I don't know why MathWorks don't improve it. Thus #2 performs so slow that #1 is faster even it makes more "mathematical" operations.

Julia language was invented to be a compromise between Matlab expression and C++ speed. The same non-vectorized code (julia vs matlab) works very fast because JIT-compilation is very good.

Dmitry Galchinsky
  • 2,181
  • 2
  • 14
  • 15
  • That makes perfectly sense. This implies that actually is not the vectorized code that is *really fast*, but is the execution of the for loops that is *really slow*? I.e.: does this implies that a porting to a language without this issue (compiled languages, for instance C++), and writing the code as the second part, would result in a faster execution w.r.t to the first part executed in Matlab? – FilippoL May 31 '13 at 09:30
  • 1
    C++ is the champion actually. It runs very fast not only because it is compiled into a native code. It is statically typed, unmanaged (less references, more values which are CPU cache-friendly), template copy-paste also generates fast code and allows inling optimizations. Fortran has even better compilers but you can't use a good looking library like `Armadillo` in Fortran. – Dmitry Galchinsky May 31 '13 at 18:47
  • It's about C++. Unfortunately unvectorized Matlab code is even slower than Javascript V8 expecially when you make a lot of loops, function calls, closures. But vectorized can be faster than C++-loops because high-level optimizations can be performed in theory. But it looks like marketing to me. – Dmitry Galchinsky May 31 '13 at 18:51
  • I am looking for something to learn that could replace Matlab, I was looking into Blitz++, but it doesn't seems to be maintained ATM. Looking at the library you mentioned, Armadillo, seems interesting: do you suggest it as replacement for Matlab? I have looked also at Fortran & LAPACK, but I'll need to learn a new programming language, that is, I'll prefer a C++ library. – FilippoL May 31 '13 at 20:47
  • I have also looked at Julia, is really cool and does exactly what I'm expecting from a modern programming language, surely I'll do some experiment with it (for now I ported some of my Matlab code to Julia and it is really executed fast (once I've converted my vectorized into cycles), faster than Matlab). And the syntax is very friendly with Matlab. The "problem" is that there is too much choice out there: Python/Numpy, LAPACK, C++ libraries, etc... and is difficult to choose. – FilippoL May 31 '13 at 20:49
  • The goal of Armadillo is to be very Matlab-like (it uses BLAS and LAPACK bindings as an engine). I used it to do line-by-line translation from Matlab. Blitz++ looks more non-matlab-style. If you must use C++ and prefer to prototype in Matlab, Armadillo is the best (though in most applications you'll also need OpenCV because it's just a linear algebra library). – Dmitry Galchinsky May 31 '13 at 22:01
  • 1
    I wouldn't say that there's much choice. Matlab has large legacy, its toolboxes and file archive is unique. A lot of scientists upload their code there. Julia lacks very basic functions, Numpy is just a clone, C++ and Fortran are old languages without useful tools like interactive mode (there's Cling but interactive mode is not convinient with heavy C++ syntax). Mathworks prefer to make ribbon interface instead of improving JIT to perform as fast as Julia. – Dmitry Galchinsky May 31 '13 at 22:03
  • 1
    I hoped that GNU Octave JIT compiler which has been developing since last year will work fast but it didn't impressed me (I tryed to run your examples in yesterday JIT-enabled build, the second one was calculated in 900(!) seconds, it's much slower than Matlab so JIT failed to run). It was a GSoC project of Max Brister. – Dmitry Galchinsky May 31 '13 at 22:06
0

Regarding performance optimization I follow @memyself suggestion using the profiler for both approaches as mentioned in 'for' loop vs vectorization in MATLAB.

For educational purposes it does make sense to experiment with numerical algorithms, for anything else I would go with well proven libraries.

Community
  • 1
  • 1
zellus
  • 9,617
  • 5
  • 39
  • 56
  • In fact my question arose when trying to port my Matlab code to Blitz++. I was asking myself: "what if, when I finished my port to C++, I discover that Matlab is still faster than the port?". So I devectorized my code discovering to do some runtime test. Now I'll try to port my script to a faster language. – FilippoL May 31 '13 at 09:28