
I am using the Parallel function from the joblib package in Python. I would like to use it to parallelize only one of my functions, but unfortunately the whole script is run in parallel (apart from the other functions).

Example:

from joblib import Parallel, delayed

print('I do not want this to be printed n times')

n = 10  # assumed value; n was not defined in the original snippet

def do_something(arg):
    # placeholder for the actual calculations
    return arg

Parallel(n_jobs=5)(delayed(do_something)(i) for i in range(0, n))
max04

1 Answer


This is a common mistake: overlooking a design rule stated in the documentation. Many users run into exactly the same problem.

The documentation is quite clear that no code other than definitions (def-s) should be placed outside an if __name__ == '__main__' guard (a __main__ fuse).

If you do not follow this, errors pour out and things go haywire. Even so, an explicit hint to re-read the documentation is printed there, repeated endlessly across the screen:

[joblib] Attempting to do parallel computing
without protecting your import on a system that does not support forking.

To use parallel-computing in a script, you must protect your main loop
using "if __name__ == '__main__'".

Please see the joblib documentation on Parallel for more information

Solution:

Once the first issue, the missing import protection, has been fixed, things get better:

C:\Python27.anaconda>python joblib_example.py
I do not want this to be printed n-times...
I do not want this to be printed n-times...
I do not want this to be printed n-times...
I do not want this to be printed n-times...
I do not want this to be printed n-times...
I do not want this to be printed n-times...

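Next, a final touch: move the print (and any other top-level code that should run only once) inside the __main__ fuse, and you are done: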

from joblib import Parallel, delayed    # sklearn.externals.joblib was removed from newer scikit-learn releases

def do_some_thing(arg):
    # placeholder for the real work; this runs in the worker processes
    return True

if __name__ == '__main__':    # a __main__ FUSE: the block below runs only in the parent process
    n = 6
    print("I do not want this to be printed n-times...")

    Parallel(n_jobs=5)(delayed(do_some_thing)(i) for i in range(0, n))

C:\Python27.anaconda>python joblib_example.py
I do not want this to be printed n-times...

C:\Python27.anaconda>
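
The same pattern can be stretched to scripts that carry more setup code. The following is only a minimal sketch under stated assumptions, not the joblib way of doing it: it uses the standard library's multiprocessing.Pool as a stand-in for joblib's Parallel (it is subject to the same __main__ guard requirement on systems without forking), and the names SCALE, load_settings and main are hypothetical, introduced purely for illustration. The idea is to keep only definitions at module level and to hand the worker everything it needs as arguments:

```python
import multiprocessing
from functools import partial

# Plain constants at module level are harmless: if a worker re-imports this
# file, these lines only rebind names; they print nothing and start nothing.
SCALE = 10  # hypothetical setting, for illustration only


def load_settings():
    """Hypothetical setup step; called only from main(), so it runs once."""
    return {"offset": 1, "scale": SCALE}


def do_something(arg, settings):
    """Worker function: receives everything it needs as arguments."""
    return arg * settings["scale"] + settings["offset"]


def main():
    n = 6
    settings = load_settings()
    print("I do not want this to be printed n times")
    # multiprocessing.Pool is a stand-in for joblib.Parallel in this sketch.
    with multiprocessing.Pool(processes=5) as pool:
        results = pool.map(partial(do_something, settings=settings), range(n))
    print(results)


if __name__ == "__main__":  # the __main__ fuse
    main()
```

With joblib, the Pool lines would presumably be replaced by the Parallel(n_jobs=5)(delayed(do_something)(i, settings) for i in range(n)) form shown above; the guard and the argument passing stay the same.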
user3666197
  • Thank you! I did know how this function works - just wanted to know some workaround and your simple approach is pretty good, just smiled I did not think about it :) One thing here, I have a lot of local/global variables defined at the beginning so I will have to put them as well in the main part. I am thinking about some other package to use as an alternative. Any ideas? Dask package maybe? – max04 Feb 26 '18 at 21:51
  • And one thing, I forgot to add "if __name__ == '__main__'" in my question but I have it in my code - without this, Parallel function does not work. – max04 Feb 26 '18 at 22:01