How to write vectorized functions in MATLAB

Question

I am just learning MATLAB and I find it hard to understand the performance factors of loops vs vectorized functions.

In my previous question, Nested for loops extremely slow in MATLAB (preallocated), I realized that using a vectorized function instead of 4 nested loops made a 7x difference in running time.

In that example, instead of looping through all dimensions of a 4-dimensional array and calculating the median for each vector, it was much cleaner and faster to just call median(stack, n), where n is the working dimension of the median function.
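To make the comparison concrete, here is a small sketch (random data and arbitrary sizes, chosen only for illustration) showing that one median(stack, n) call computes the same values as looping over every position in the remaining three dimensions. An odd length along dimension 4 is used so each median is an exact data element.

```matlab
% Small 4-D stack; sizes are arbitrary illustration choices
stack = rand(3, 4, 2, 5);
n = 4;                                 % working dimension

% Built-in: a single call reduces along dimension n
mBuiltin = median(stack, n);           % size 3x4x2

% Loop version: visit every position in the other three dimensions
mLoop = zeros(size(stack, 1), size(stack, 2), size(stack, 3));
for i = 1:size(stack, 1)
    for j = 1:size(stack, 2)
        for k = 1:size(stack, 3)
            % stack(i,j,k,:) is a 1x1x1x5 vector; median works along dim 4
            mLoop(i, j, k) = median(stack(i, j, k, :));
        end
    end
end

isequal(mBuiltin, mLoop)
```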

But median is just a very easy example and I was just lucky that it had this dimension parameter implemented.

My question is: how do you write a function yourself that works as efficiently as one that has this dimension parameter implemented?

For example you have a function my_median_1D which only works on a 1-D vector and returns a number.

How do you write a function my_median_nD which acts like MATLAB's median, by taking an n-dimensional array and a "working dimension" parameter?

Update

I found the code for calculating the median in higher dimensions:

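% Context from earlier in the function (not part of this excerpt; inferred
% from how the variables are used below): x = sort(x,dim) has already run,
% s = size(x), nCompare = s(dim), half = floor(nCompare/2), and meanof is
% a helper that averages two values.
% In all other cases, use linear indexing to determine exact location
% of medians.  Use linear indices to extract medians, then reshape at
% end to appropriate size.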
cumSize = cumprod(s);
total = cumSize(end);            % Equivalent to NUMEL(x)
numMedians = total / nCompare;

numConseq = cumSize(dim - 1);    % Number of consecutive indices
increment = cumSize(dim);        % Gap between runs of indices
ixMedians = 1;

y = repmat(x(1),numMedians,1);   % Preallocate appropriate type

% Nested FOR loop tracks down medians by their indices.
for seqIndex = 1:increment:total
  for consIndex = half*numConseq:(half+1)*numConseq-1
    absIndex = seqIndex + consIndex;
    y(ixMedians) = x(absIndex);
    ixMedians = ixMedians + 1;
  end
end

% Average in second value if n is even
if 2*half == nCompare
  ixMedians = 1;
  for seqIndex = 1:increment:total
    for consIndex = (half-1)*numConseq:half*numConseq-1
      absIndex = seqIndex + consIndex;
      y(ixMedians) = meanof(x(absIndex),y(ixMedians));
      ixMedians = ixMedians + 1;
    end
  end
end

% Check last indices for NaN
ixMedians = 1;
for seqIndex = 1:increment:total
  for consIndex = (nCompare-1)*numConseq:nCompare*numConseq-1
    absIndex = seqIndex + consIndex;
    if isnan(x(absIndex))
      y(ixMedians) = NaN;
    end
    ixMedians = ixMedians + 1;
  end
end
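The index arithmetic in this excerpt can be checked on a small array. The sketch below (the array size, dim and the slice number k are arbitrary illustration choices) fills an array with its own linear indices and shows that the seqIndex/consIndex loops enumerate exactly the elements of slice k along dim; in the median code, k corresponds to half+1. As in the excerpt, it needs dim > 1, because cumSize(dim - 1) does not exist for dim = 1.

```matlab
x = reshape(1:24, [2 3 4]);      % every element equals its own linear index
dim = 2;                         % working dimension (must be > 1 here)
k = 3;                           % which position along dim to extract

s = size(x);
cumSize = cumprod(s);
numConseq = cumSize(dim - 1);    % 2: length of each run of consecutive indices
increment = cumSize(dim);        % 6: gap between the starts of two runs
total = cumSize(end);            % 24: equivalent to numel(x)

idx = zeros(1, 0);
for seqIndex = 1:increment:total
    for consIndex = (k-1)*numConseq : k*numConseq - 1
        idx(end+1) = seqIndex + consIndex; %#ok<AGROW>
    end
end

slice = x(:, k, :);              % the same elements, selected directly
isequal(x(idx), slice(:)')
```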

Could you explain why this code is so efficient compared to the simple nested loops? It has nested loops just like my loop-based version.

I don't understand how it can be 7x faster, or why it is so complicated.

Update 2

I realized that median was not a good example, as it is a complicated function itself that requires sorting the array or other neat tricks. I redid the tests with mean instead and the results are even more striking: 19 seconds vs. 0.12 seconds. That means the built-in version is about 160 times faster than the nested loops.

It is really hard for me to understand how an industry-leading language can have such an extreme performance difference depending on programming style, but I see the points made in the answers below.

Type "open median" at the Matlab command prompt & see how the Mathworks do it! They cheat, however - sort(X, dim) is a built-in. — Max, Oct 18 '11 at 22:13

score 6 · Accepted Answer · edited May 23 '17 at 12:19

Update 2 (to address your updated question)

MATLAB is optimized to work well with arrays. Once you get used to it, it is actually really nice to just type one line and have MATLAB do the full 4-D looping itself without having to worry about it. MATLAB is often used for prototyping and one-off calculations, so it makes sense to save time for the person writing the code, at the cost of some of the flexibility of C, C++ or C#.

This is why MATLAB performs some loops internally very well, often by implementing them as compiled functions.

The code snippet you give doesn't really contain the relevant line of code which does the main work, namely

% Sort along given dimension
x = sort(x,dim);

In other words, the code you show only needs to pick out the median values by their correct indices in the now-sorted multidimensional array x (which doesn't take much time). The actual work of accessing all array elements was done by sort, which is a built-in (i.e. compiled and highly optimized) function.
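A minimal sketch of the same idea for an odd number of elements (an even count would additionally need the average of the two middle values): the built-in sort does the heavy lifting, and extracting the median afterwards is plain indexing. The cell array of colons used to index along an arbitrary dim is a common MATLAB idiom, not something taken from median.m; the array size is arbitrary.

```matlab
x = rand(2, 3, 5);
dim = 3;
n = size(x, dim);                  % 5 values per slice (odd, so no averaging)

xs = sort(x, dim);                 % the expensive step: a built-in, compiled sort
half = floor(n / 2);

idx = repmat({':'}, 1, ndims(x));  % {':', ':', ':'}
idx{dim} = half + 1;               % middle position along dim
m = xs(idx{:});                    % the cheap step: plain indexing

isequal(m, median(x, dim))
```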

Original answer (about how to build your own fast functions working on arrays)

There are actually quite a few built-ins that take a dimension parameter: min(stack, [], n), max(stack, [], n), mean(stack, n), std(stack, [], n), median(stack, n), sum(stack, n)... Together with the fact that other built-in functions like exp() and sin() automatically work on each element of your whole array (i.e. sin(stack) automatically does four nested loops for you if stack is 4-D), you can build up a lot of the functions you might need just by relying on the existing built-ins.
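For instance, several statistics along dimension n can be assembled from the built-ins listed above without writing any loop. A sketch with an arbitrary random 4-D stack:

```matlab
stack = rand(4, 5, 3, 6);
n = 4;                                        % working dimension

% Range along dimension n: two built-ins that both accept a dimension argument
r = max(stack, [], n) - min(stack, [], n);    % size 4x5x3

% Coefficient of variation along dimension n
cv = std(stack, [], n) ./ mean(stack, n);

% Elementwise built-ins such as exp need no dimension argument at all
e = sum(exp(stack), n);
```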

If this is not enough for a particular case, you should have a look at repmat, bsxfun, arrayfun and accumarray, which are very powerful functions for doing things "the MATLAB way". Just search SO for questions (or rather answers) using one of these; I learned a lot about MATLAB's strong points that way.
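A small sketch of two of these functions with made-up data: bsxfun applies a binary operation while expanding singleton dimensions, and accumarray sums the values that share a group index. (Since R2016b, implicit expansion also lets you write A - mean(A, 1) directly.)

```matlab
A = magic(4);

% bsxfun: subtract each column's mean from that column, no loop needed
centered = bsxfun(@minus, A, mean(A, 1));

% accumarray: sum values by group label (labels 1, 2, 3)
subs = [1; 2; 1; 3; 2];
vals = [10; 20; 30; 40; 50];
totals = accumarray(subs, vals);   % group 1: 10+30, group 2: 20+50, group 3: 40
```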

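As an example, say you wanted to implement the p-norm of stack along dimension n. You could write

function result = pnorm(stack, p, n)
% p-norm of stack along dimension n; abs() and the elementwise .^ are
% needed so that negative entries and non-square results work
result = sum(abs(stack).^p, n).^(1/p);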

... where you effectively reuse the "which-dimension-capability" of sum.
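A quick check of this idea, written as an anonymous function so it runs without a separate function file. Note the abs() and the elementwise .^: with the matrix power ^, the last step errors whenever the summed result is not a scalar or a square matrix (here it is a 1x2 row).

```matlab
% Anonymous-function version of the p-norm along dimension n
pnorm = @(stack, p, n) sum(abs(stack).^p, n).^(1/p);

S = [3 -4; 4 3];
colNorms = pnorm(S, 2, 1);   % Euclidean norm of each column
rowNorms = pnorm(S, 2, 2);   % Euclidean norm of each row
l1Cols   = pnorm(S, 1, 1);   % sum of absolute values per column
```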

Update

As Max points out in the comments, also have a look at the colon operator (:), which is a very powerful tool for selecting elements from an array (or even changing its shape, which is more generally done with reshape).
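Combining permute and reshape gives one general answer to the original question: move the working dimension to the front, flatten everything else into columns, apply the 1-D function to each column, and put the result back into shape. This is a sketch; f1d is a hypothetical stand-in for something like my_median_1D. It still calls f1d once per slice, so it adds a dimension argument and contiguous column access, but not the speed of a truly vectorized built-in.

```matlab
% Hypothetical stand-in for a function that only accepts a 1-D vector
f1d = @(v) max(v) - min(v);

x = rand(3, 4, 5);
dim = 2;                                         % working dimension

sz = size(x);
order = [dim, setdiff(1:ndims(x), dim)];         % working dimension first
cols = reshape(permute(x, order), sz(dim), []);  % one column per 1-D slice

out = zeros(1, size(cols, 2));
for c = 1:size(cols, 2)
    out(c) = f1d(cols(:, c));                    % contiguous column access
end

outSz = sz(order);
outSz(1) = 1;                                    % the working dimension collapses to 1
y = ipermute(reshape(out, outSz), order);        % back to the original dimension order
```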

In general, have a look at the section Array Operations in the help - it contains repmat et al. mentioned above, but also cumsum and some more obscure helper functions which you should use as building blocks.

Also look at matrix reshaping, and the many uses of the : operator. — Max, Oct 18 '11 at 22:14
I did another test with 'mean' instead of 'median' to use a function without sorting, and the results are even more striking: the built-in function is actually about 160x faster, 0.12 s vs. 19 s! Thanks for the answer and the updates! — hyperknot, Oct 19 '11 at 02:44

score 5 · Answer 2 · edited Feb 16 '20 at 15:01

Could you explain to me that why is this code so effective compared to the simple nested loops? It has nested loops just like the other function.

The problem with nested loops is not the nested loops themselves. It's the operations you perform inside.

Each function call (especially to a non-built-in function) generates a little bit of overhead, more so if the function performs e.g. error checking that takes the same amount of time regardless of input size. Thus, even if a function has only 1 ms of overhead, calling it 1000 times wastes a full second. If you can call it once to perform a vectorized calculation, you pay the overhead only once.
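A rough way to observe this overhead yourself (f is an arbitrary example function; the timings depend entirely on your machine and MATLAB version, so none are quoted here):

```matlab
x = rand(1, 1e5);
f = @(v) 3*v + 1;              % any small function; here an anonymous one

% One call per element: the call overhead is paid 1e5 times
tic
y1 = zeros(size(x));
for k = 1:numel(x)
    y1(k) = f(x(k));
end
tLoop = toc;

% One vectorized call: the overhead is paid once
tic
y2 = f(x);
tVec = toc;

fprintf('per-element calls: %.4f s, one vectorized call: %.4f s\n', tLoop, tVec);
```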

Furthermore, MATLAB's JIT compiler can help vectorize simple for-loops, for example ones that only perform basic arithmetic operations. Thus, the loops with simple calculations in your post are sped up a lot, while the loops calling median are not.

score 5 · Answer 3 · edited May 23 '17 at 11:48

Vectorization

In addition to what's already been said, you should also understand that vectorization can involve parallelization, i.e. performing operations on many data elements concurrently instead of sequentially (think SIMD instructions), and in some cases even taking advantage of multiple threads and processors.

MEX-files

Although the "interpreted vs. compiled" point has already been argued, no one has mentioned that you can extend MATLAB by writing MEX-files: compiled programs written in C (or C++ or Fortran) that can be called directly from MATLAB like normal functions. This allows you to implement performance-critical parts in a lower-level language.

Column-major order

Finally, when trying to optimize some code, always remember that MATLAB stores matrices in column-major order: elements that are adjacent along the first dimension are adjacent in memory. Accessing elements in that order can be significantly faster than accessing them in other, arbitrary orders.
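A small illustration of column-major order: the linear index runs down the columns first, so a loop nest that walks through consecutive memory puts the first (row) index in the innermost loop. Array sizes are arbitrary.

```matlab
A = reshape(1:6, 2, 3);      % [1 3 5; 2 4 6]: filled down the columns
linearOrder = A(:)';         % 1 2 3 4 5 6: elements in memory order
li = sub2ind(size(A), 2, 3); % linear index of A(2,3)

% Memory-friendly loop order: columns outer, rows inner
B = zeros(200, 300);
for j = 1:size(B, 2)
    for i = 1:size(B, 1)     % consecutive i values are adjacent in memory
        B(i, j) = i + j;
    end
end
```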

For example, in your previously linked question, you were computing the median of a set of stacked images along some dimension. The order of those dimensions greatly affects performance. Illustration:

%# sequence of 10 images
fPath = fullfile(matlabroot,'toolbox','images','imdemos');
files = dir( fullfile(fPath,'AT3_1m4_*.tif') );
files = strcat(fPath,{filesep},{files.name}');

I = imread( files{1} );

%# stacked images along the 1st dimension: [numImages H W RGB]
stack1 = zeros([numel(files) size(I) 3], class(I));
for i=1:numel(files)
    I = imread( files{i} );
    stack1(i,:,:,:) = repmat(I, [1 1 3]);   %# grayscale to RGB
end

%# stacked images along the 4th dimension: [H W RGB numImages]
stack4 = permute(stack1, [2 3 4 1]);

%# compute median image from each of these two stacks
tic, m1 = squeeze( median(stack1,1) ); toc
tic, m4 = median(stack4,4); toc
isequal(m1,m4)

The timing difference was huge:

Elapsed time is 0.257551 seconds.     %# stack1
Elapsed time is 17.405075 seconds.    %# stack4

score 2 · Answer 4 · answered Oct 18 '11 at 22:27

In this case

M = median(A,dim) returns the median values for elements along the dimension of A specified by scalar dim

But with a general function you can try splitting your array with mat2cell (which can work with n-D arrays and not just matrices) and applying your my_median_1D function through cellfun. Below I will use median as an example to show that you get equivalent results, but instead you can pass it any function defined in an m-file, or an anonymous function defined with the @(args) notation.

>> testarr = [[1 2 3]' [4 5 6]']

testarr =

     1     4
     2     5
     3     6

>> median(testarr,2)

ans =

    2.5000
    3.5000
    4.5000

>> shape = size(testarr)

shape =

     3     2

>> cellfun(@median,mat2cell(testarr,ones(1,shape(1)),shape(2)))

ans =

    2.5000
    3.5000
    4.5000

(Note that the output of the `mat2cell` invocation is a cell array of row vectors.) — hatmatrix, Oct 18 '11 at 22:32
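A variant of the same approach that may be easier to generalize to n-D arrays: num2cell(A, dim) puts each 1-D slice along dim into its own cell, and the resulting cell array already has the shape that median(A, dim) returns. f1d below is a stand-in for a 1-D-only function such as my_median_1D; like mat2cell + cellfun, this still calls the function once per slice, so it buys convenience rather than speed.

```matlab
f1d = @median;                       % stand-in for a 1-D-only function

% The 2-D example from above, working along dimension 2
testarr = [[1 2 3]' [4 5 6]'];
rowMedians = cellfun(f1d, num2cell(testarr, 2));   % 3x1 result

% An n-D example: one cell per 1x1x5 slice along dimension 3
A = rand(3, 4, 5);
dim = 3;
C = num2cell(A, dim);                % 3x4 cell array
y = cellfun(@(v) f1d(v(:)), C);      % v(:) turns each slice into a column
```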
