12

I understand how using vectorization in a language like MATLAB speeds up the code by removing the overhead of maintaining a loop variable, but how does the vectorization actually take place in the assembly / machine code? I mean there still has to be a loop somewhere, right?

animuson
Jon Cohen

2 Answers

9

MATLAB 'vectorization' is a completely different concept from CPU vector instructions such as SSE. This is a common misunderstanding between two groups of people: MATLAB programmers and C/asm programmers. MATLAB 'vectorization', as the word is commonly used, only means expressing loops in terms of (vectors of) matrix indices, and sometimes writing things in terms of basic matrix/vector operations (BLAS), instead of writing the loop yourself. 'Vectorized' MATLAB code is not necessarily executed as vector CPU instructions. Consider the following code:

A = rand(1000);
B = (A(1:2:end,:)+A(2:2:end,:))/2;

This code computes the mean of each pair of adjacent matrix rows. It is a 'vectorized' MATLAB expression. However, since MATLAB stores matrices column-wise (columns are contiguous in memory), this operation does not map trivially onto SSE vectors: because the operation works row-wise, the data that has to be loaded into each vector register is not stored contiguously in memory.

This code, on the other hand,

A = rand(1000);
B = (A(:,1:2:end)+A(:,2:2:end))/2;

can take advantage of SSE instructions and streaming instructions, since we operate on two adjacent columns at a time.
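One way to see whether the memory layout matters on a given machine is to time the two expressions above. This is only a rough sketch: it assumes a MATLAB release that provides `timeit`, the names `rowPairs`, `colPairs`, `tRow` and `tCol` are made up for illustration, and the result depends on the MATLAB version and the hardware, so no particular outcome is implied.

```matlab
% Sketch: time the row-wise and column-wise averages from the text.
A = rand(1000);

% Averages pairs of adjacent rows: strided access inside each column
rowPairs = @() (A(1:2:end,:) + A(2:2:end,:)) / 2;

% Averages pairs of adjacent columns: whole contiguous columns
colPairs = @() (A(:,1:2:end) + A(:,2:2:end)) / 2;

tRow = timeit(rowPairs);   % typical run time in seconds
tCol = timeit(colPairs);

fprintf('row pairs: %.3g s, column pairs: %.3g s\n', tRow, tCol);
```

Both handles capture the same matrix `A`, so any difference comes from the access pattern and not from the data.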

So MATLAB 'vectorization' is not equivalent to using CPU vector instructions. It is just a word used to signify the absence of a loop written in MATLAB. To add to the confusion, people sometimes even use the word to mean that a loop has been implemented using a built-in function such as arrayfun or bsxfun. This is even more misleading, since those functions can be significantly slower than native MATLAB loops. As robince said, not all loops are slow in MATLAB nowadays, though you do need to know when they work well and when they don't.
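To check the arrayfun point on your own setup, something like the following could be used. It is a minimal sketch: the function `f` and the variable names are hypothetical, `timeit` is assumed to be available, and the timings will vary between MATLAB versions, so the code only prints them without claiming which variant wins.

```matlab
% Sketch: three ways to compute x.^2 + 1, with rough timings.
x = rand(1, 1e5);
f = @(v) v^2 + 1;            % hypothetical scalar function, for illustration

yVec = x.^2 + 1;             % 'vectorized' expression
yFun = arrayfun(f, x);       % arrayfun: one handle call per element

tic;
yLoop = zeros(size(x));      % preallocate the output
for k = 1:numel(x)
    yLoop(k) = x(k)^2 + 1;   % plain MATLAB loop
end
tLoop = toc;

% All three should agree up to rounding
maxDiff = max(abs([yFun - yVec, yLoop - yVec]));

tVec = timeit(@() x.^2 + 1);
tFun = timeit(@() arrayfun(f, x));
fprintf('vectorized: %.3g s, arrayfun: %.3g s, loop: %.3g s\n', tVec, tFun, tLoop);
```

The loop is timed with a single `tic`/`toc` pair, which is cruder than `timeit`, so treat its number as a ballpark figure only.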

In any case, you always need a loop; it is just implemented in MATLAB built-in functions / BLAS instead of in the user's MATLAB code.
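To make the hidden loop concrete, here is the column-wise example written out by hand. This is a sketch of what the vectorized expression computes, not of how MATLAB implements it internally; the names `Bvec`, `Bloop`, `m` and `n` are chosen for illustration.

```matlab
% Sketch: the loop that the vectorized column average stands for.
A = rand(1000);
Bvec = (A(:,1:2:end) + A(:,2:2:end)) / 2;    % vectorized form

[m, n] = size(A);
Bloop = zeros(m, n/2);                       % preallocate the result
for j = 1:n/2                                % one iteration per column pair
    for i = 1:m                              % walks down contiguous columns
        Bloop(i,j) = (A(i,2*j-1) + A(i,2*j)) / 2;
    end
end

% The two results should match (at most rounding-level differences)
maxDiff = max(abs(Bvec(:) - Bloop(:)));
fprintf('largest difference: %g\n', maxDiff);
```

The inner loop runs over the row index so that consecutive iterations touch adjacent memory, matching MATLAB's column-major storage.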

Community
angainor
  • AFAIK, though, MATLAB does use multiple processors to carry out the work. So, it may be parallelized, but not vectorized (according to the "vector instructions" mindset). Also, I'd assume that any decent numeric JIT (like they have in new MATLAB) would use actual vector instructions... – nneonneo Sep 27 '12 at 08:00
  • @nneonneo It would, so that's another reason why native MATLAB loops can also be vectorized. Parallel code, on the other hand, is not vectorization. And loops in MATLAB can sometimes also be implemented efficiently in parallel (tricky, but possible) using parfor or spmd. – angainor Sep 27 '12 at 08:03
  • @nneonneo And to be honest, parallel work done by matlab under the hood can actually decrease the performance in many cases, so you have to be very careful :) Moreover, for some functions it is impossible to force matlab to run sequentially, e.g. sort. Well, maybe it is possible, I just never figured it out. maxNumCompThreads does not work on them, neither does OMP_NUM_THREADS. – angainor Sep 27 '12 at 08:05
  • +1: The answer you linked is quite insightful. I don't do a ton of MATLAB programming (which is why I failed to recognize that the term *vectorized* meant 'not using a loop'), but I definitely learnt a lot from your answer and the linked answer. Thanks! – nneonneo Sep 27 '12 at 08:11
8

Yes, there is still a loop, but it runs directly in compiled code. Loops in Fortran (on which MATLAB was originally based), C, or C++ are not inherently slow. That they are slow in MATLAB is a property of the dynamic runtime (they are also slower in other dynamic languages like Python).

Since MATLAB introduced a Just-In-Time compiler, loop performance has increased dramatically, so the old guidelines to avoid loops are less important in recent versions than they once were.

robince