I have the following code.

def main():
  (minI, maxI, iStep, minJ, maxJ, jStep, a, b, numProcessors) = sys.argv
  for i in range(minI, maxI, iStep):
    for j in range(minJ, maxJ, jStep): 
      p = multiprocessing.Process(target=functionA, args=(minI, minJ))
      p.start()
      def functionB((a, b)):
        subprocess.call('program1 %s %s %s %s %s %s' %(c, a, b, 'file1', 
          'file2', 'file3'), shell=True)
        for d in ['a', 'b', 'c']:
          subprocess.call('program2 %s %s %s %s %s' %(d, 'file4', 'file5', 
            'file6', 'file7'), shell=True)
      abProduct = list(itertools.product(range(0, 10), range(0, 10)))
      pool = multiprocessing.Pool(processes=numProcessors)
      pool.map(functionB, abProduct) 

It produces the following error.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 484, in run 
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 255, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

The contents of functionA are unimportant, and do not produce an error. The error seems to occur when I try to map functionB. How do I remove this error, and what is the best way to parallelize this code in Python 2.6?

idealistikz
  • 1
    Just wondering... What's the purpose of using the multiprocessing module here when you are joining on every process you start, basically running them serially? – jdi Jul 02 '12 at 03:21
  • 1
    possible duplicate of [Can't pickle when using python's multiprocessing Pool.map()](http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma) – msw Jul 02 '12 at 03:42
  • `functionB` might need to be in the file-level scope, not main's scope. Try putting it there. – ldrg Jul 02 '12 at 03:42
  • Here is the answer http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma – Maksym Polshcha Jul 02 '12 at 03:45

1 Answer

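You are most likely seeing this because of where and when you define your pool, objects, and functions. multiprocessing does not work like threads: each worker is a separate process with its own copy of the environment, and pool.map has to pickle the target function to send it to the workers. Functions are pickled by name, so a function defined inside another function (as functionB is inside main) cannot be looked up again, and the map call fails with the PicklingError above. The same applies to any objects the workers need: they must live in a scope the worker processes can reach.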
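The pickling limitation can be seen without a pool at all. This is a minimal sketch, not part of the original code: `top_level`, `make_nested`, and `can_pickle` are hypothetical names used only for illustration, and it assumes the file is run as a script.

```python
import pickle


def top_level(x):
    # Defined at module level, so pickle can find it again by name.
    return x


def make_nested():
    def nested(x):
        # Exists only inside make_nested(); no module-level name refers to it.
        return x
    return nested


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        # The exact exception type differs between Python versions.
        return False


print(can_pickle(top_level))      # module-level function
print(can_pickle(make_nested()))  # nested function
```

The first call reports that the module-level function can be pickled; the second reports that the nested one cannot, which is the same failure pool.map runs into.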

First, try creating one pool before your big loop:

# sys.argv[0] is the script name, and every argument arrives as a string
(minI, maxI, iStep, minJ, maxJ, jStep, a, b, numProcessors) = sys.argv[1:]
pool = multiprocessing.Pool(processes=int(numProcessors))
for i in range(int(minI), int(maxI), int(iStep)):
    ...

Then move your target callable out of the loop and out of main(), to module level:

def functionB(ab):
    a, b = ab  # pool.map passes each (a, b) tuple as a single argument
    ...

def main():
    ...
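Putting both changes together might look roughly like the sketch below. It is not the asker's program: `run_job` stands in for `functionB` and returns a placeholder value instead of calling program1 and program2, `parse_args` is a hypothetical helper, the argument list is cut down to four made-up values, and only the outer loop is kept. It avoids the `with` form of Pool so that it should also run on Python 2.6.

```python
import itertools
import multiprocessing


def run_job(ab):
    # Module-level function: worker processes can look it up by name.
    a, b = ab  # pool.map passes each (a, b) pair as one argument
    return a * 10 + b  # placeholder for the real subprocess calls


def parse_args(argv):
    # Skip argv[0] (the script name) and convert the string arguments to int.
    return [int(value) for value in argv[1:]]


def main(argv):
    min_i, max_i, i_step, num_processors = parse_args(argv)
    pairs = list(itertools.product(range(0, 3), range(0, 3)))
    # Create the pool once and reuse it for every loop iteration.
    pool = multiprocessing.Pool(processes=num_processors)
    results = []
    try:
        for i in range(min_i, max_i, i_step):
            results.append(pool.map(run_job, pairs))
    finally:
        pool.close()
        pool.join()
    return results


if __name__ == '__main__':
    # Hypothetical arguments: minI=0, maxI=2, iStep=1, numProcessors=2
    print(main(['script.py', '0', '2', '1', '2']))
```

The `if __name__ == '__main__'` guard keeps worker processes from re-running the pool setup when they import the module, which matters on platforms that do not fork.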

Consider this example...

broken

import multiprocessing

def broken():
    vals = [1,2,3]

    def test(x):
        return x

    pool = multiprocessing.Pool()
    output = pool.map(test, vals)
    print output

broken()
# PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

working

import multiprocessing

def test(x):
    return x

def working():
    vals = [1,2,3]

    pool = multiprocessing.Pool()
    output = pool.map(test, vals)
    print output

working()
# [1, 2, 3]
jdi