I have many many small tasks to do in a for
loop. I Want to use concurrency to speed it up. I used joblib for its easy to integrate. However, I found using joblib makes my program run much slower than a simple for
iteration. Here is the demo code:
import time
import random
from os import path
import tempfile
import numpy as np
import gc
from joblib import Parallel, delayed, load, dump
def func(a, i):
'''a simple task for demonstration'''
a[i] = random.random()
def memmap(a):
'''use memory mapping to prevent memory allocation for each worker'''
tmp_dir = tempfile.mkdtemp()
mmap_fn = path.join(tmp_dir, 'a.mmap')
print 'mmap file:', mmap_fn
_ = dump(a, mmap_fn) # dump
a_mmap = load(mmap_fn, 'r+') # load
del a
gc.collect()
return a_mmap
if __name__ == '__main__':
N = 10000
a = np.zeros(N)
# memory mapping
a = memmap(a)
# parfor
t0 = time.time()
Parallel(n_jobs=4)(delayed(func)(a, i) for i in xrange(N))
t1 = time.time()-t0
# for
t0 = time.time()
[func(a, i) for i in xrange(N)]
t2 = time.time()-t0
# joblib time vs for time
print t1, t2
On my laptop with i5-2520M CPU, 4 cores, Win7 64bit, the running time is 6.464s for joblib and 0.004s for simplely for
loop.
I've made the arguments as memory mapping to prevent the overhead of reallocation for each worker. I've red this relative post, still not solved my problem. Why is that happen? Did I missed some disciplines to correctly use joblib?