
In Python and Matlab, I wrote code that generates a matrix and populates it with a function of its indices. The Python code takes about 20 times longer to run than the Matlab code. I wrote two Python functions that produce the same result; bWay() is based on this answer.

Here's the full Python code:

import numpy as np
from timeit import timeit

height = 1080
width = 1920
heightCm = 30
distanceCm = 70

centerY = height / 2 - 0.5
centerX = width / 2 - 0.5

constPart = height * heightCm / distanceCm

def aWay():
    # Fill the matrix element by element (xrange is Python 2; use range on Python 3).
    M = np.empty([height, width], dtype=np.float64)
    for y in xrange(height):
        for x in xrange(width):
            M[y, x] = np.arctan(pow((pow((centerX - x), 2) + pow((centerY - y), 2)), 0.5) / constPart)
    return M

def bWay():
    # Wrap the scalar formula in a ufunc and evaluate it for every (y, x) index pair.
    M = np.frompyfunc(
        lambda y, x: np.arctan(pow((pow((centerX - x), 2) + pow((centerY - y), 2)), 0.5) / constPart), 2, 1
    ).outer(
        np.arange(height),
        np.arange(width),
    ).astype(np.float64)
    return M

And here's the full Matlab code:

height = 1080;
width = 1920;
heightCm = 30;
distanceCm = 70;

centerY = height / 2 + 0.5;
centerX = width / 2 + 0.5;

constPart = height * heightCm / distanceCm;
M = zeros(height, width);
for y = 1 : height
    for x = 1 : width
        M(y, x) = atan(((centerX - x)^2 + (centerY - y)^2)^0.5 / constPart);
    end
end

Python execution time measured with timeit.timeit:

aWay() - 6.34s
bWay() - 6.68s

Matlab execution time measured with tic toc:

0.373s

To narrow it down, I measured the time taken by arctan, squaring, and looping separately.

Python:

>>> timeit('arctan(3)','from numpy import arctan', number = 1000000)
1.3365135641797679
>>> timeit('pow(3, 2)', number = 1000000)
0.11460829719908361
>>> timeit('power(3, 2)','from numpy import power', number = 1000000)
1.5427879383046275
>>> timeit('for x in xrange(10000000): pass', number = 1)
0.18364813832704385

Matlab:

tic
for i = 1 : 1000000
    atan(3);
end
toc
Elapsed time is 0.179802 seconds.
tic
for i = 1 : 1000000
    3^2;
end
toc
Elapsed time is 0.044160 seconds.
tic
for x = 1:10000000
end
toc
Elapsed time is 0.034853 seconds.

In all three cases, the Python code took several times longer than the Matlab code.

Is there anything I can do to improve the performance of this Python code?

MSeifert
  • Shouldn't you vectorize both MATLAB and Python/NumPy codes for performance? – Divakar May 31 '17 at 20:52
  • Comparing the performance for suboptimal (or bad) solutions isn't really interesting and/or useful. Please try to optimize the performance of each solution first and then compare the performance :) – MSeifert May 31 '17 at 20:54
  • Thanks, I'll look into it and see how the times compare then – vbajcinovci May 31 '17 at 20:58
  • MATLAB does various forms of just-in-time compiling. That allows you to express problems with loops, and not pay an interpretation penalty. In older MATLAB versions your iterative MATLAB code would have been slow, and very un-MATLAB like. – hpaulj May 31 '17 at 21:36

2 Answers


I'm focussing only on the Python part and how you could optimize it (never used MATLAB, sorry).

If I understand your code correctly you could use:

def fastway():
    # Rows index y and columns index x, so the result has shape (height, width) like M[y, x].
    y, x = np.ogrid[:height, :width]
    return np.arctan(np.hypot(centerX-x, centerY-y) / constPart)

That's vectorized and should be amazingly fast.

%timeit fastway()
# 289 ms ± 9.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit aWay()
# 28.2 s ± 243 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit bWay()
# 29.3 s ± 790 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
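
Because the axis order is easy to get wrong, it may be worth checking a vectorized version against the plain loop on a small grid before timing it. The following is only a sketch: the names loop_way and vectorized_way and the 6 x 8 grid are illustrative choices, not part of the original code, and range is used on the assumption of Python 3.

```python
import numpy as np

def loop_way(height, width, const_part):
    """Reference: fill M[y, x] one element at a time (slow, but easy to verify)."""
    center_y = height / 2 - 0.5
    center_x = width / 2 - 0.5
    M = np.empty((height, width), dtype=np.float64)
    for y in range(height):
        for x in range(width):
            M[y, x] = np.arctan(((center_x - x) ** 2 + (center_y - y) ** 2) ** 0.5 / const_part)
    return M

def vectorized_way(height, width, const_part):
    """Same formula, evaluated on broadcast index arrays."""
    center_y = height / 2 - 0.5
    center_x = width / 2 - 0.5
    y, x = np.ogrid[:height, :width]  # y has shape (height, 1), x has shape (1, width)
    return np.arctan(np.hypot(center_x - x, center_y - y) / const_part)

# Small, hypothetical grid so the loop finishes instantly.
small_loop = loop_way(6, 8, 6 * 30 / 70)
small_vec = vectorized_way(6, 8, 6 * 30 / 70)

print(small_vec.shape)                     # (6, 8)
print(np.allclose(small_loop, small_vec))  # True
```

If the shapes differ or the comparison fails, the two index arrays are most likely swapped.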

In case you're wondering: np.hypot(x, y) is mathematically equivalent to (x**2 + y**2)**0.5. It's not necessarily faster, but it is shorter, and it gives more precise results in edge cases where squaring the inputs would overflow or underflow.
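
One such edge case can be sketched as follows. The value 1e200 is just an illustrative input chosen so that its square exceeds the float64 range; it has nothing to do with the pixel coordinates in the question.

```python
import math
import numpy as np

big = np.float64(1e200)

with np.errstate(over="ignore"):          # silence the expected overflow warning
    naive = (big ** 2 + big ** 2) ** 0.5  # big ** 2 overflows to inf

robust = np.hypot(big, big)               # avoids the intermediate overflow

print(naive)   # inf
print(robust)  # finite, about 1.414e+200
```

For the coordinate ranges in the question, both forms give the same result to within rounding.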

Also, if you ever need to operate on scalars, you shouldn't use NumPy functions. NumPy functions have such a high per-call overhead that processing one element takes roughly as long as processing one thousand elements; see for example my answer on the question "Performance in different vectorization method in numpy".
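
A rough way to see this overhead on your own machine is to time the standard library's math.atan against np.arctan on a single float, and then time one np.arctan call on a whole array. This is a minimal sketch; the call count n is arbitrary and the absolute numbers will vary between machines and NumPy versions.

```python
import math
import timeit
import numpy as np

value = 3.0
n = 100_000  # arbitrary number of calls / elements

# One scalar per call: the per-call overhead dominates.
t_math = timeit.timeit(lambda: math.atan(value), number=n)
t_numpy_scalar = timeit.timeit(lambda: np.arctan(value), number=n)

# One call for all n elements: the overhead is paid only once.
arr = np.full(n, value)
t_numpy_array = timeit.timeit(lambda: np.arctan(arr), number=1)

print(f"math.atan, {n} scalar calls:  {t_math:.4f} s")
print(f"np.arctan, {n} scalar calls:  {t_numpy_scalar:.4f} s")
print(f"np.arctan, 1 call on {n} elements: {t_numpy_array:.4f} s")
```

Both functions return the same value for the scalar; only the cost per call differs.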

MSeifert
  • @ViliamsBajčinovci You're welcome :) I wasn't sure if I had `width` and `height` in the correct order. You may need to double check that - and if they are swapped you could use `y, x = np.ogrid[:width, :height]` – MSeifert May 31 '17 at 21:13

To make MSeifert's answer complete, here is the vectorized Matlab code:

height = 1080;
width = 1920;
heightCm = 30;
distanceCm = 70;

centerY = height / 2 + 0.5;
centerX = width / 2 + 0.5;

constPart = height * heightCm / distanceCm;
[x, y] = meshgrid(1:width, 1:height);
M = atan(hypot(centerX-x, centerY-y) / constPart);

On my machine, this takes 0.057 seconds, while the double for loop takes 0.20 seconds.

On the same machine, MSeifert's Python solution takes 0.082 seconds.
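
Note that the Matlab code uses 1-based indices with centers at height / 2 + 0.5 and width / 2 + 0.5, while the Python code uses 0-based indices with centers at height / 2 - 0.5 and width / 2 - 0.5. The sketch below, written in Python with np.meshgrid standing in for Matlab's meshgrid, suggests that the two conventions describe the same matrix. The variable names are illustrative.

```python
import numpy as np

height, width = 1080, 1920
const_part = height * 30 / 70

# 0-based indices, as in the Python versions.
y0, x0 = np.ogrid[:height, :width]
M_zero_based = np.arctan(
    np.hypot(width / 2 - 0.5 - x0, height / 2 - 0.5 - y0) / const_part
)

# 1-based indices, mirroring the Matlab meshgrid(1:width, 1:height) call.
x1, y1 = np.meshgrid(np.arange(1, width + 1), np.arange(1, height + 1))
M_one_based = np.arctan(
    np.hypot(width / 2 + 0.5 - x1, height / 2 + 0.5 - y1) / const_part
)

print(M_zero_based.shape, M_one_based.shape)   # (1080, 1920) (1080, 1920)
print(np.allclose(M_zero_based, M_one_based))  # True
```

np.ogrid returns small broadcastable arrays, whereas np.meshgrid materializes full index grids by default, so the ogrid form uses less memory for the same result.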

Xiangrui Li