I would like to compute a set of FFTs in parallel using numpy.fft.fft and multiprocessing. Unfortunately, running the FFTs in parallel results in a large amount of kernel (sys) time. Here is a minimal example that reproduces the problem:
# fft_test.py
import numpy as np
import multiprocessing
from argparse import ArgumentParser


def f(i):
    x = np.empty(1000000)
    np.fft.fft(x)
    return i


def __main__():
    ap = ArgumentParser('fft_test')
    ap.add_argument('--single_core', '-s', action='store_true',
                    help='use only a single core')
    args = ap.parse_args()
    # Show the configuration
    print("number of cores: %d" % multiprocessing.cpu_count())
    np.__config__.show()
    # Execute using a single core
    if args.single_core:
        for i in range(multiprocessing.cpu_count()):
            f(i)
            print(i, end=' ')
    # Execute using all cores
    else:
        pool = multiprocessing.Pool()
        for i in pool.map(f, range(multiprocessing.cpu_count())):
            print(i, end=' ')


if __name__ == '__main__':
    __main__()
Running time python fft_test.py gives me the following results:
number of cores: 48
openblas_info:
    library_dirs = ['/home/till/anaconda2/envs/sonalytic/lib']
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    language = c
openblas_lapack_info:
    library_dirs = ['/home/till/anaconda2/envs/sonalytic/lib']
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    language = c
blas_opt_info:
    library_dirs = ['/home/till/anaconda2/envs/sonalytic/lib']
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    language = c
blas_mkl_info:
    NOT AVAILABLE
lapack_opt_info:
    library_dirs = ['/home/till/anaconda2/envs/sonalytic/lib']
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    language = c
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
real 0m7.422s
user 0m9.830s
sys 1m26.603s
Running with a single core, i.e. time python fft_test.py -s, gives
real 1m0.345s
user 0m56.558s
sys 0m2.959s
Any idea what might cause the large kernel wait?
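One guess I have not been able to confirm is that each worker process spawns its own pool of OpenBLAS threads, which then busy-wait in the kernel. If that were the cause, forcing single-threaded BLAS before numpy is imported should shrink the sys time. A minimal sketch of that check, assuming this OpenBLAS build honors the standard OPENBLAS_NUM_THREADS environment variable:

```python
import os

# Limit OpenBLAS to one thread per process. This must happen before
# numpy is imported, because OpenBLAS sizes its thread pool when the
# library is loaded.
os.environ['OPENBLAS_NUM_THREADS'] = '1'

import numpy as np  # imported only after the variable is set

# The FFT itself is unchanged; only the BLAS threading is restricted.
x = np.empty(1000000)
np.fft.fft(x)
```

Setting the variable on the command line instead, e.g. OPENBLAS_NUM_THREADS=1 time python fft_test.py, should have the same effect without touching the script.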