I need to run an embarrassingly parallel for loop. After a quick search, I found package joblib for python. I did a simple test as posted on the package's website. Here is the test
from math import sqrt
from joblib import Parallel, delayed
import multiprocessing
%timeit [sqrt(i ** 2) for i in range(10)]
result: 3.89 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
num_cores = multiprocessing.cpu_count()
%timeit Parallel(n_jobs=num_cores)(delayed(sqrt)(i ** 2) for i in range(10))
result: 600 ms ± 40 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If I understand the results correctly, using the joblib does not only increase the speed but make in it slower? Did I miss something here, Thank you