
I am trying to rewrite the MATLAB code below in Python, and found that my Python code (2.7 sec) is slower than MATLAB (1.2 sec). I have tried many different approaches, including the numba module, but no luck yet. How can I make the Python code faster?

MATLAB code:

szA=[1024,1280]; HfszA=[512,640];
[aPx,aPy]=meshgrid(-HfszA(2):HfszA(2)-1,-HfszA(1):HfszA(1)-1);
img=randi(255,1024,1280);
fx=rand(); fy=rand();
tic
for i=1:20
    F=abs(sum(sum(img.*exp(-1i*2*pi*(fx*aPx+fy*aPy)))));
end
toc

Python code:

import numpy as np
import time

szA=[1024,1280]; HfszA=[512,640]
aPx,aPy=np.meshgrid(np.arange(-HfszA[1],HfszA[1]),np.arange(-HfszA[0],HfszA[0]))
img=np.array(np.random.randint(256,size=(1024,1280)))

fx=np.random.rand()
fy=np.random.rand()

start = time.time()
for i in range(20):
    F=abs(np.sum(img*np.exp(-1j*2*np.pi*(fx*aPx+fy*aPy))))
end = time.time()
print("Elapsed (after compilation) = %s" % (end - start))
print(F)
asked by S. Lee (edited by Cris Luengo)
  • Please see: [Why is “Can someone help me?” not an actual question?](http://meta.stackoverflow.com/q/284236) – Adriaan Sep 26 '18 at 15:31
  • Also see [Why is MATLAB so fast in matrix multiplication?](https://stackoverflow.com/q/6058139/5211833) – Adriaan Sep 26 '18 at 15:32
  • You loop over variable `i`, but you don't use that variable within the loop. You simply compute `F` 20 times, overwriting the previous result. Is this loop only meant to time how long the computation takes? – Cris Luengo Sep 26 '18 at 16:35
  • I asked the above question because (1) your question can use some clarification for what is going on, and (2) you should be using `timeit` in MATLAB, and `timeit.timeit` in Python, for timing. – Cris Luengo Sep 26 '18 at 16:42
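
For reference, a minimal sketch of the timeit-based timing suggested in the last comment, reusing the setup from the question (an illustration, not part of the original post):

import timeit
import numpy as np

szA=[1024,1280]; HfszA=[512,640]
aPx,aPy=np.meshgrid(np.arange(-HfszA[1],HfszA[1]),np.arange(-HfszA[0],HfszA[0]))
img=np.random.randint(256,size=(1024,1280))
fx=np.random.rand(); fy=np.random.rand()

def compute_F():
    # same computation as in the question's loop body
    return abs(np.sum(img*np.exp(-1j*2*np.pi*(fx*aPx+fy*aPy))))

# total time for 20 calls, comparable to the question's loop
print(timeit.timeit(compute_F, number=20))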

2 Answers


I believe you can reproduce the difference with an even simpler problem.

MATLAB

function test
    szA=[1024,1280]; HfszA=[512,640];
    [aPx,aPy]=meshgrid(-HfszA(2):HfszA(2)-1,-HfszA(1):HfszA(1)-1);
    fx=1.0; fy=2.0;
    tic
    for i=1:2000
        F=sum(sum(fx*aPx+fy*aPy));
    end
    toc
    disp(F)

outputs

Elapsed time is 9.566274 seconds.

-1966080

Python

import numpy as np
import time

szA=[1024,1280]; HfszA=[512,640]
aPx,aPy=np.meshgrid(np.arange(-HfszA[1],HfszA[1]),np.arange(-HfszA[0],HfszA[0]))

fx=1.0
fy=2.0

start = time.time()
for i in range(2000):
    F = np.sum(np.sum(fx*aPx+fy*aPy, axis=0))
end = time.time()
print("Elapsed (after compilation) = %s" % (end - start))
print(F)

outputs

Elapsed (after compilation) = 33.3840000629

-1966080.0

I believe the difference between the two is twofold:

  • MATLAB uses OpenMP for many array operations
  • MATLAB takes advantage of the AVX instruction set

The reason I have come to this conclusion is CPU usage: on my Core i3 with four cores (you could check the number of cores on your CPU), the Python script sits at about 30% while MATLAB sits at 100%.
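
For reference, a quick way to check the core count from within Python itself (a small sketch using only the standard library):

from multiprocessing import cpu_count
# number of logical cores visible to the operating system
print(cpu_count())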

As for the AVX instruction set: I once compared MATLAB's matmul with Eigen's (http://eigen.tuxfamily.org/index.php?title=Main_Page), and to match performance I had to compile Eigen with -openmp and -axAVX.

To finally answer your question: I don't think you can make the Python code faster unless you compile NumPy's underlying libraries with OpenMP and AVX support.
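
Before rebuilding anything, it is worth checking how your installed NumPy was built; a small sketch (the exact output format varies between NumPy versions and distributions):

import numpy as np
# prints the BLAS/LAPACK libraries and build options NumPy was compiled against
np.show_config()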

Here is the tutorial https://docs.scipy.org/doc/scipy/reference/building/linux.html

Good luck, let us know how it went.

PilouPili

Please always post what you have tried so far. Regarding your Numba version, I think you did something that leads to bad performance.

Example

import numpy as np
import numba as nb
import time

# Inputs as defined in the question
szA=[1024,1280]; HfszA=[512,640]
aPx,aPy=np.meshgrid(np.arange(-HfszA[1],HfszA[1]),np.arange(-HfszA[0],HfszA[0]))
img=np.random.randint(256,size=(1024,1280))
fx=np.random.rand()
fy=np.random.rand()

# Single-threaded version; fastmath=True lets the compiler reorder the reduction
@nb.njit(fastmath=True)
def your_function(fx,fy,aPx,aPy,img):
  pi=np.pi
  sum=0.
  for i in range(aPx.shape[0]):
    for j in range(aPx.shape[1]):
      sum+=img[i,j]*np.exp(-1j*2*pi*(fx*aPx[i,j]+fy*aPy[i,j]))
  return np.abs(sum)

# Parallel version: nb.prange distributes the outer loop over the available threads
@nb.njit(fastmath=True,parallel=True)
def your_function_p(fx,fy,aPx,aPy,img):
  pi=np.pi
  sum=0.
  for i in nb.prange(aPx.shape[0]):
    for j in range(aPx.shape[1]):
      sum+=img[i,j]*np.exp(-1j*2*pi*(fx*aPx[i,j]+fy*aPy[i,j]))
  return np.abs(sum)

#The function gets compiled at the first call
#you may also use cache=True, which only works in single threaded code
F=your_function(fx,fy,aPx,aPy,img)
start = time.time()
for i in range(20):
    F=your_function(fx,fy,aPx,aPy,img)

end = time.time()
print("Elapsed (after compilation) = %s" % (end - start))
print(F)

F=your_function_p(fx,fy,aPx,aPy,img)
start = time.time()
for i in range(20):
    F=your_function_p(fx,fy,aPx,aPy,img)

end = time.time()
print("Elapsed (after compilation) = %s" % (end - start))
print(F)
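
If you want to experiment with how many threads the parallel version uses, a small sketch (requires a reasonably recent Numba; alternatively set the NUMBA_NUM_THREADS environment variable before starting Python):

import numba as nb
# limit the prange-based version to 4 threads, e.g. one per physical core
nb.set_num_threads(4)
print(nb.get_num_threads())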

Timings (4C/8T)

your_version: 2.45s
Numba single threaded: 0.17s
Numba parallel: 0.07s
max9111
  • Thank you so much, max9111. Your answer not only helps me a lot on my work, but also corrects my possible wrong idea on Python language itself. Thank you again. – S. Lee Sep 27 '18 at 08:21