
Joblib parallel computation takes more time for n_jobs > 1 (n_jobs=2 finishes in 12.6 s) than for n_jobs=1 (finishes in 1.3 s). I am on Mac OS X 10.9 with 16 GB RAM. Am I making a mistake? Here is a simple demo code:

from joblib import Parallel, delayed
def func():
    for i in range(200):
        for j in range(300):
            yield i, j

def evaluate(x):
    i = x[0]
    j = x[1]
    p = i * j
    return p, i, j

if __name__ == '__main__':
    results = Parallel(n_jobs=3, verbose=2)(delayed(evaluate)(x) for x in func())
    res, i, j = zip(*results)
SUV
  • See also: http://stackoverflow.com/questions/21027477/joblib-parallel-multiple-cpus-slower-than-single, where comprehensive answers to this question have been given. – Gael Varoquaux Jan 10 '16 at 22:09

1 Answer


Short answer: Joblib is a multiprocessing system, and it has a fair amount of overhead in booting up a new Python process for each of your simultaneous jobs. Because the per-call work here is trivial compared to that startup cost, your specific code is likely to get even slower if you add more jobs.

There's some documentation about this here.
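That startup cost is easy to see by timing the same workload at different n_jobs values. A minimal sketch (exact timings are machine-dependent, and the grid is shrunk here so the demo stays quick):

```python
from time import perf_counter
from joblib import Parallel, delayed

def evaluate(x):
    i, j = x
    return i * j, i, j

if __name__ == "__main__":
    # Deliberately tiny per-task workload: process startup and
    # pickling overhead dominates, so more jobs can mean more time.
    tasks = [(i, j) for i in range(50) for j in range(50)]
    for n in (1, 2):
        start = perf_counter()
        Parallel(n_jobs=n)(delayed(evaluate)(x) for x in tasks)
        print(f"n_jobs={n}: {perf_counter() - start:.2f}s")
```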

The workarounds aren't great:

  1. accept the overhead
  2. don't use parallel code
  3. Use multithreading instead of multiprocessing. Unfortunately, multithreading is rarely an option unless `evaluate` is a fully compiled function that releases the GIL, because Python's global interpreter lock (GIL) lets only one thread execute Python bytecode at a time.

That said, for functions that take a long time, multiprocessing is often worth it; depending on your application, it's really a judgment call. Note that every variable used in the function is copied to each worker process. Implicit copying like that is rare in Python, so this can come as a surprise. As a result, the overhead is partly a function of the size of the variables passed, either explicitly or implicitly (e.g. via use of global variables).

Alex