I am writing MATLAB scripts since some time and, still, I do not understand how it works "under the hood". Consider the following script, that do some computation using (big) vectors in three different ways:
- MATLAB vector operations;
- Simple for cycle that do the same computation component-wise;
- An optimized cycle that is supposed to be faster than 2. since avoid some allocation and some assignment.
Here is the code:
N = 10000000;
A = linspace(0,100,N);
B = linspace(-100,100,N);
C = linspace(0,200,N);
D = linspace(100,200,N);
% 1. MATLAB Operations
tic
C_ = C./A;
D_ = D./B;
G_ = (A+B)/2;
H_ = (C_+D_)/2;
I_ = (C_.^2+D_.^2)/2;
X = G_ .* H_;
Y = G_ .* H_.^2 + I_;
toc
tic
X;
Y;
toc
% 2. Simple cycle
tic
C_ = zeros(1,N);
D_ = zeros(1,N);
G_ = zeros(1,N);
H_ = zeros(1,N);
I_ = zeros(1,N);
X = zeros(1,N);
Y = zeros(1,N);
for i = 1:N,
C_(i) = C(i)/A(i);
D_(i) = D(i)/B(i);
G_(i) = (A(i)+B(i))/2;
H_(i) = (C_(i)+D_(i))/2;
I_(i) = (C_(i)^2+D_(i)^2)/2;
X(i) = G_(i) * H_(i);
Y(i) = G_(i) * H_(i)^2 + I_(i);
end
toc
tic
X;
Y;
toc
% 3. Opzimized cycle
tic
X = zeros(1,N);
Y = zeros(1,N);
for i = 1:N,
X(i) = (A(i)+B(i))/2 * (( C(i)/A(i) + D(i)/B(i) ) /2);
Y(i) = (A(i)+B(i))/2 * (( C(i)/A(i) + D(i)/B(i) ) /2)^2 + ( (C(i)/A(i))^2 + (D(i)/B(i))^2 ) / 2;
end
toc
tic
X;
Y;
toc
I know that one shall always try to vectorize computations, being MATLAB build over matrices/vectors (thus, nowadays, it is not always the best choice), so I am expecting that something like:
C = A .* B;
is faster than:
for i in 1:N,
C(i) = A(i) * B(i);
end
What I am not expecting is that it is actually faster even in the above script, despite the second and the third methods I am using go through only one cycle, whereas the first method performs many vector operations (which, theoretically, are a "for" cycle every time). This force me to conclude that MATLAB has some magic that permit (for example) to:
C = A .* B;
D = C .* C;
to be run faster than a single "for" cycle with some operation inside it.
So:
- what is the magic that avoid the 1st part to be executed so fast?
- when you write "D= A .* B" does MATLAB actually do a component wise computation with a "for" cycle, or simply keeps track that D contains some multiplication of "bla" and "bla"?
EDIT
- suppose I want to implement the same computation using C++ (using maybe some library). Will be the first method of MATLAB be faster even than the third one implemented in C++? (I'll answer to this question myself, just give me some time.)
EDIT 2
As requested, here there are the experiment runtimes:
Part 1: 0.237143
Part 2: 4.440132 of which 0.195154 for allocation
Part 3: 2.280640 of which 0.057500 for allocation
and without JIT:
Part 1: 0.337259
Part 2: 149.602017 of which 0.033886 for allocation
Part 3: 82.167713 of which 0.010852 for allocation