For a benchmark comparison, I consider the simple function:
function dealiasing2d(where_dealiased, data)
    [n1, n0, nk] = size(data);
    for i0 = 1:n0
        for i1 = 1:n1
            if where_dealiased(i1, i0)
                data(i1, i0, :) = 0.;
            end
        end
    end
end
It can be useful in pseudo-spectral simulations (where data is a 3d array of complex numbers), but basically it applies a mask to a set of images, setting to zero the elements for which where_dealiased is true.
I compare the performance of different languages (and implementations, compilers, ...) on this simple case. For Matlab, I time the function with timeit. Since I don't want to benchmark my ignorance of Matlab, I would like to really optimize this function in this language. What would be the fastest way to do this in Matlab?
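For reference, the kind of timing setup I have in mind is roughly the following (the array sizes are arbitrary placeholders, just to give an idea):

% Example timing setup; the sizes are arbitrary placeholders.
n1 = 128; n0 = 128; nk = 32;
data = complex(rand(n1, n0, nk), rand(n1, n0, nk));
where_dealiased = rand(n1, n0) > 0.7;   % logical mask
t = timeit(@() dealiasing2d(where_dealiased, data));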
The simple solution I use now is:
function dealiasing2d(where_dealiased, data)
    [n1, n0, nk] = size(data);
    N = n0*n1;
    % linear indices (within one 2d slice) of the elements to set to zero
    ind_zeros = find(reshape(where_dealiased, 1, []));
    for ik = 1:nk
        data(ind_zeros + N*(ik-1)) = 0;
    end
end
I suspect this is not the right way to do it since the equivalent Numpy solution is approximately 10 times faster.
import numpy as np
def dealiasing(where, data):
nk = data.shape[0]
N = reduce(lambda x, y: x*y, data.shape[1:])
inds, = np.nonzero(where.flat)
for ik in xrange(nk):
data.flat[inds + N*ik] = 0.
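For completeness, a fully vectorized Matlab variant that I could also include in the benchmark (a sketch; I have not checked whether it is actually faster than the loop over ik):

function dealiasing2d_vectorized(where_dealiased, data)
    % Assumes where_dealiased is a logical n1-by-n0 mask.
    % Expand the mask along the third dimension and use logical indexing.
    nk = size(data, 3);
    data(repmat(where_dealiased, [1, 1, nk])) = 0;
end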
Finally, if someone tells me something like "When you want a particular function to be very fast in Matlab, you should compile it like this: [...]", I would include such a solution in the benchmark.
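For instance, I imagine something with MATLAB Coder could do the job, maybe along these lines (an untested sketch; the input sizes/types passed to -args are only examples):

% Untested sketch: generate a MEX version of the function with MATLAB Coder.
codegen dealiasing2d -args {false(128, 128), complex(zeros(128, 128, 32))}
% then time the generated dealiasing2d_mex instead of dealiasing2d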
Edit:
After 2 answers, I've benchmarked the proposed solutions and it seems that there is no noticeable performance improvement. This is strange since the simple Python-Numpy solution is really much faster (one order of magnitude), so I am still looking for a better solution with Matlab...