4

I recently ask a question on the optimization of a mask function in Matlab. I got two answers that help me a lot but it appears that all Matlab solutions seem, according to my timing, much slower than one of the Numpy solutions. The code for the different functions can be found in my previous question but to give an idea of what I am doing, I give the Numpy "loop" solution, which is of course not the fastest one but which may be the simplest to read:

def dealiasing2d(where, data):
    nk, n0, n1 = data.shape
    for i0 in xrange(n0):
        for i1 in xrange(n1):
            if where[i0, i1]:
                data[:, i0, i1] = 0.

I obtain (with Matlab R2014b and a "basic" Numpy 1.9.1 linked with Blas and Lapack) (n0 = n1 = N):

N = 500 ; nk = 500:
Method          | time (s) | normalized      
----------------|----------|------------
Numpy           |    0.05  |     1.0
Numpy loop      |    0.05  |     1.0
Matlab find     |    0.74  |    14.8
Matlab bsxfun2  |    0.76  |    15.2
Matlab bsxfun   |    0.78  |    15.6
Matlab loop     |    0.78  |    15.6
Matlab repmat   |    0.89  |    17.8

N = 500 ; nk = 100:
Method          | time (s) | normalized      
----------------|----------|------------
Numpy           |    0.01  |     1.0
Numpy loop      |    0.03  |     3.0
Matlab find     |    0.15  |    13.6
Matlab bsxfun2  |    0.15  |    13.6
Matlab bsxfun   |    0.16  |    14.5
Matlab loop     |    0.16  |    14.5
Matlab repmat   |    0.18  |    16.4

N = 2000 ; nk = 10:
Method          | time (s) | normalized      
----------------|----------|------------
Numpy           |    0.02  |     1.0
Matlab find     |    0.23  |    13.8
Matlab bsxfun2  |    0.23  |    13.8
Matlab bsxfun   |    0.26  |    15.6
Matlab repmat   |    0.28  |    16.8
Matlab loop     |    0.34  |    20.4
Numpy loop      |    0.42  |    25.1

It seems to me that these results are very strange. For me, Numpy and Matlab are quite similar in terms of scientific computing so the performance should be similar, while here there is more than a factor 10! So my first guess is that there is something wrong in the way I compare the two languages. Another possibility could be a problem with my Matlab setup, but I don't understand why. Or a real deep difference between Matlab and Numpy?

Can anyone time these functions to verify these results? Do you have an idea why Matlab seems so much slower than Python in this simple case?

To time the Matlab functions, I use a file with :

N = 500;
n0 = N;
n1 = N;
nk = 500;
disp(['N = ', num2str(N), ' ; nk = ', num2str(nk)])

where = false([n1, n0]);
where(1:100, 1:100) = 1;
data = (5.+1i)*ones([n1, n0, nk]);

disp('time dealiasing2d_loops:')
time = timeit(@() dealiasing2d_loops(where, data));
disp(['     ', num2str(time), ' s'])

disp('time dealiasing2d_find:')
time = timeit(@() dealiasing2d_find(where, data));
disp(['     ', num2str(time), ' s'])

disp('time dealiasing2d_bsxfun:')
time = timeit(@() dealiasing2d_bsxfun(where, data));
disp(['     ', num2str(time), ' s'])

disp('time dealiasing2d_bsxfun2:')
time = timeit(@() dealiasing2d_bsxfun2(where, data));
disp(['     ', num2str(time), ' s'])

disp('time dealiasing2d_repmat:')
time = timeit(@() dealiasing2d_repmat(where, data));
disp(['     ', num2str(time), ' s'])

I measure the performance of the Python functions with

from __future__ import print_function

import numpy as np
from timeit import timeit, repeat

import_lines = {
    'numpy_bad': 'from dealiasing_numpy_bad import dealiasing2d as dealiasing',
    'numpy': 'from dealiasing_numpy import dealiasing'}
tools = import_lines.keys()

time_approx_one_bench = 5.

setup = """
import numpy as np

N = 500
n0, n1 = N, N
nk = 500
where = np.zeros((n0, n1), dtype=np.uint8)
where[0:100, 0:100] = 1
data = (5.+1j)*np.ones((nk, n0, n1), dtype=np.complex128)
"""
exec(setup)

print('n0 = n1 = {}, nk = {}'.format(N, nk))
print(13*' ' + 'min          mean')
durations = np.empty([len(tools)])

for it, tool in enumerate(tools):
    setup_tool = import_lines[tool] + setup
    duration = timeit(setup_tool + 'dealiasing(where, data)', number=1)
    nb_repeat = int(round((time_approx_one_bench - duration)/duration))
    nb_repeat = max(1, nb_repeat)
    ds = np.array(repeat('dealiasing(where, data)',
                         setup=setup_tool, number=1, repeat=nb_repeat))
    duration = ds.min()
    print('{:11s} {:8.2e} s ; {:8.2e} s'.format(
        tool.capitalize() + ':', duration, ds.mean()))
    durations[it] = duration

fastest = tools[durations.argmin()].capitalize()

durations = durations / durations.min()
print('Durations normalized by the fastest method (' + fastest + '):')
for it, tool in enumerate(tools):
    print('{:11s} {:8.3f}'.format(tool.capitalize() + ':', durations[it]))
Community
  • 1
  • 1
paugier
  • 1,838
  • 2
  • 20
  • 39
  • 1
    Not sure if this helps but... one thing that might skew your tests is the fact that calling anonymous functions is VERY slow in MATLAB. Anonymous functions are practically the slowest, partially because they need to recreate a part of the workspace they're called from. Second (but this is a personal preference) -- i don't like timeit(), because it calculates the median of the execution time, not the minimum. Which means that the overheads are REALLY significant when timing a function. –  Feb 27 '15 at 16:28
  • There's a language comparison here: http://julialang.org/benchmarks/ --- on benchmarks mostly without array vectorization. Python interpreter speed seems to be reasonably consistent in this benchmark. – pv. Feb 27 '15 at 16:36

1 Answers1

3

I think this is largely to do with making copies of the data variable. If you arrange things so that MATLAB's copy-on-write behaviour works in your favour, you can get pretty decent times. I wrote a simple version using linear indexing

function data = dealiasing2d2(where_dealiased, data)
[n1, n2, nk] = size(data);
where_li = find(where_dealiased);
for idx = 1:nk
  offset = n1 * n2 * (idx-1);
  data(where_li + offset) = 0;
end

which I ran like this (note that it is important that timed is a function not a script to allow data to be re-used)

function timed
N = 2000;
nk = 10;

where = false([N, N]);
where(1:100, 1:100) = 1;
data = (5.+1j)*ones([N, N, nk]);
tic, data = dealiasing2d2(where,data); toc

This runs in 0.00437 seconds on my GLNXA64 machine running R2014b.

Edric
  • 23,676
  • 2
  • 38
  • 40