multi core processing in for loop using numpy and fft

Question

I calculated vector using numpy and fft. I used numpy broadcast method and for loop. speed of Two methods is similar. How can I calculate vector using multicore and numpy and fft?

import numpy as np
from numpy.fft import fft, ifft

num_row, num_col = 6000, 13572

ss = np.ones((num_row, num_col), dtype=np.complex128)
sig = np.random.standard_normal(num_col) * 1j * np.random.standard_normal(num_col)

# for loop    
for idx in range(num_row):
    ss[idx, :] = ifft(fft(ss[idx, :]) * sig)

# broadcast
ss = ifft(fft(ss, axis=1) * sig, axis=1)

Result

loop : 10.798867464065552 sec
broadcast : 11.298897981643677 sec

You can specify an axis to `fft` rather than using a loop `np.fft.fft(a, axis=1)` — Brenlla, Apr 18 '18 at 15:39
You can also use [broadcasting](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html) for your array multiplication `ss*sig` — Brenlla, Apr 18 '18 at 15:46
What is this even doing? You're computing the FFT of a vector of ones (which is just `[N, 0, 0, 0...]`, where N is the length of the vector), multiplying by some random data, and then IFFT??? I think there's an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) here? — Ahmed Fasih, Apr 18 '18 at 16:05
Anyway, default Numpy FFT isn't multi-threaded: you either have to use something like Enthought's Python distribution (where Numpy is built against Intel MKL, which has a heavily optimized FFT) or use PyFFTW (which leans on FFTW's multithreading). Or do you want to try and work around the GIL and use Python's built-in threading capabilities? — Ahmed Fasih, Apr 18 '18 at 16:06
Also you probably meant to do `stdn(ncol) + 1j * stdn(ncol)` (note the `+`). — Nils Werner, Apr 18 '18 at 16:07
The code is the only example. My project has many codes like example. So I want to speed up this pattern code. — LowQualityDelivery, Apr 18 '18 at 16:12
Use PyFFTW or IntelMKL, you couldn't do better since the fft and ifft takes 99.9% of runtime. Intel implementation https://github.com/IntelPython/mkl_fft — max9111, Apr 18 '18 at 16:33

score 1 · Answer 1 · answered Apr 18 '18 at 15:46

1

You can use the axis parameter for fft and ifft, as well as broadcasting:

ss = np.ones((num_row, num_col), dtype=np.complex128)
sig = np.random.standard_normal(num_col) * 1j * np.random.standard_normal(num_col)

ss = ifft(fft(ss, axis=1) * sig, axis=1)

answered Apr 18 '18 at 15:46

Nils Werner

34,832
7
76
98

ifft(fft(ss, axis=1) * sig, axis=1) is slower than for loop. – LowQualityDelivery Apr 18 '18 at 15:56

score 1 · Accepted Answer · answered Nov 18 '19 at 14:11

I compared broadcast, threadpool and loop. In this case, ThreadPool had the best performance.

# %% Import
# Standard library imports
import time
from multiprocessing.pool import ThreadPool

# Third party imports
from numpy import zeros, complex128, allclose
from numpy.fft import fft, ifft
from numpy.random import standard_normal


# %% Generate data
n_row, n_col = 6000, 13572

ss = standard_normal((n_row, n_col)) + 1j * standard_normal((n_row, n_col))
sig = standard_normal(n_col) + 1j * standard_normal(n_col)
ss_loop = zeros((n_row, n_col), dtype=complex128)
ss_thread = zeros((n_row, n_col), dtype=complex128)

# %% Loop processing
start_time = time.time()
for idx in range(n_row):
    ss_loop[idx, :] = ifft(fft(ss[idx, :]) * sig)
print(f'loop elapsed time : {time.time() - start_time}')

# %% Broadcast processing
start_time = time.time()
ss_broad = ifft(fft(ss, axis=1) * sig, axis=1)
print(f'broadcast elapsed time : {time.time() - start_time}')


# %% ThreadPool processing
def filtering(idx_thread):
    ss_thread[idx_thread, :] = ifft(fft(ss[idx_thread, :]) * sig)


start_time = time.time()
pool = ThreadPool()
pool.map(filtering, range(n_row))
print(f'ThreadPool elapsed time : {time.time() - start_time}')


# %% Verify result
if allclose(ss_thread, ss_broad, rtol=1.e-8):
    print('ThreadPool Correct')

if allclose(ss_loop, ss_broad, rtol=1.e-8):
    print('Loop Correct')

Result

loop elapsed time : 5.102990627288818
broadcast elapsed time : 4.520442008972168
ThreadPool elapsed time : 1.6988463401794434

multi core processing in for loop using numpy and fft

2 Answers2