I want to parallelize the execution of a for loop across the cores of my computer's quad-core CPU. I am using pp (Parallel Python) rather than joblib.Parallel, for reasons considered here.
But I am getting an error:
Traceback (most recent call last):
  File "batching.py", line 60, in cleave_out_bad_data
    job1 = job_server.submit(cleave_out, (data_dir,dirlist,), (endswithdat,))
  File "/homes/ad6813/.local/lib/python2.7/site-packages/pp.py", line 459, in submit
    sfunc = self.__dumpsfunc((func, ) + depfuncs, modules)
  File "/homes/ad6813/.local/lib/python2.7/site-packages/pp.py", line 637, in __dumpsfunc
    sources = [self.__get_source(func) for func in funcs]
  File "/homes/ad6813/.local/lib/python2.7/site-packages/pp.py", line 704, in __get_source
    sourcelines = inspect.getsourcelines(func)[0]
  File "/usr/lib/python2.7/inspect.py", line 690, in getsourcelines
    lines, lnum = findsource(object)
  File "/usr/lib/python2.7/inspect.py", line 529, in findsource
    raise IOError('source code not available')
IOError: source code not available
It looks like the cause is a bug in Python 2.7.
Has anyone come across this and solved it?
Here is my code:
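import pp

def clean_dir(data_dir, dirlist):
    job_server = pp.Server()
    # clean is the job; endswith is passed as a function it depends on
    job1 = job_server.submit(clean, (data_dir, dirlist,), (endswith,))

def clean(data_dir, dirlist):
    # good_or_bad and endswith are defined elsewhere (not shown)
    [good_or_bad(file, data_dir) for file in dirlist if endswith(file)]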
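In case it helps narrow this down: the traceback ends in inspect.getsourcelines, which pp calls on the submitted function and on each function it depends on. Below is a minimal sketch, independent of pp, of how that call can fail when no source file can be found for a function. The names has_source and orphan and the path /nonexistent/hypothetical_module.py are made up for illustration, and whether this is what actually happens to the functions in my script is only an assumption.

```python
import inspect


def has_source(func):
    """Return True if inspect can retrieve the source lines of func."""
    try:
        inspect.getsourcelines(func)
        return True
    except (IOError, OSError, TypeError):
        # IOError/OSError: no source file could be located.
        # TypeError: built-in object with no Python source at all.
        return False


# Hypothetical stand-in for a function whose source file is missing:
# it is compiled from a string and labelled with a path that does not exist.
code = compile("def orphan(x):\n    return x\n",
               "/nonexistent/hypothetical_module.py", "exec")
namespace = {"__name__": "hypothetical_module"}
exec(code, namespace)
orphan = namespace["orphan"]

print(has_source(orphan))  # False
print(has_source(len))     # False
```

Running a check like has_source on each function before handing it to job_server.submit might show which one pp cannot serialize.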