My original issue is that I am trying to do the following:

from multiprocessing import Pool

def submit_decoder_process(decoder, input_line):
    decoder.process_line(input_line)
    return decoder

self.pool = Pool(processes=num_of_processes)
self.pool.apply_async(submit_decoder_process, [decoder, input_line]).get()

decoder is too involved to describe here, but the important point is that it is an object initialized with a PyParsing expression on which setParseAction() has been called. Such an object cannot be pickled, and since multiprocessing uses pickle to pass arguments between processes, the code above fails.

Now, here is the pickle/PyParsing problem that I have isolated and simplified. The following code yields an error message due to pickle failure.

import pickle
from pyparsing import Word, nums

def my_pa_func():
    pass

pickle.dumps(Word(nums).setParseAction(my_pa_func))

Error message:

pickle.PicklingError: Can't pickle <function wrapper at 0x00000000026534A8>: it's not found as pyparsing.wrapper

If you remove the call to .setParseAction(my_pa_func), it works with no problems:

pickle.dumps(Word(nums))

How can I get around this? multiprocessing uses pickle, so I guess I can't avoid it. The pathos package, which supposedly uses dill, is not mature enough; at least, I am having problems installing it on 64-bit Windows. I am really scratching my head here.

jazzblue
  • ~~rant~~: `pathos` is close to 10 years old... with active development nearly the entire time (well, periodically). It trivially installs with `setuptools`. It can be installed with `pip` if you use the `pre` flag. I don't see how it's not a "mature" package… aside from the version number of the last stable release (which *was* not an issue until the version number standards changed). Anyway, sigh. The pending new release will take care of `pip` install issues *purely* due to version numbering. – Mike McKerns Mar 11 '15 at 23:33

3 Answers

7

OK, here is a solution inspired by rocksportrocker (see: Python multiprocessing pickling error).

The idea is to dill the object that can't be pickled before passing it between processes, and then "undill" it on the other side:

from multiprocessing import Pool
import dill

def submit_decoder_process(decoder_dill, input_line):
    decoder = dill.loads(decoder_dill)  # undill after it was passed to a pool process
    decoder.process_line(input_line)
    return dill.dumps(decoder)  # dill before passing back to parent process

self.pool = Pool(processes=num_of_processes)

# Dill before sending to a pool process
decoder_processed = dill.loads(self.pool.apply_async(submit_decoder_process, [dill.dumps(decoder), input_line]).get())
jazzblue
  • Glad you were able to work this out, with the help of the `dill` module. I've made changes to pyparsing in the past to support pickling, but I never tried to pickle a parser with parse actions - great solution, I'll see if there is a way to incorporate it into pyparsing. – PaulMcG Jan 14 '15 at 02:28
0

https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled

multiprocessing.Pool uses the pickle protocol to serialize the function and module names (in your example, setParseAction and pyparsing), which are delivered through a Pipe to the child process.

Once the child process receives them, it imports the module and tries to call the function. The problem is that what you're passing is not a function but a method. To handle it, the pickle protocol would have to be clever enough to build the 'Word' object with the 'user' parameter and then call its setParseAction method. Because handling such cases is too complicated, the pickle protocol prevents you from serializing functions that are not defined at the top level of a module.
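The top-level restriction described above can be reproduced with the standard library alone, without pyparsing. This is a minimal sketch, written for Python 3; the names top_level_action and make_wrapper are hypothetical and only stand in for a user function and for a library helper that returns a nested wrapper function.

```python
import pickle


def top_level_action(tokens):
    """Module-level function: pickle can find it again by name."""
    return tokens


def make_wrapper(func):
    """Return a nested function, similar in spirit to a decorator-made wrapper."""
    def wrapper(*args):
        return func(*args)
    return wrapper


# Pickling a module-level function works: only a reference
# (module name plus function name) is stored.
restored = pickle.loads(pickle.dumps(top_level_action))

# Pickling the nested function fails, because the name "wrapper"
# cannot be resolved at module level when unpickling.
try:
    pickle.dumps(make_wrapper(top_level_action))
    nested_pickled = True
except (pickle.PicklingError, AttributeError):
    nested_pickled = False
```

The exact exception type and message differ between Python versions, which is why the sketch catches both pickle.PicklingError and AttributeError.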

To solve your issue, either instruct the pickle module how to serialize the setParseAction method (https://docs.python.org/2/library/pickle.html#pickle-protocol), or refactor your code so that what is passed to Pool.apply_async is serializable.
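One way to read the first option is to give your own decoder class __getstate__ and __setstate__ methods, so that the unpicklable part is dropped before pickling and rebuilt afterwards. The sketch below assumes a hypothetical Decoder class that holds a nested function; in the question's case, the part to rebuild would be the pyparsing expression with its parse action. It is an illustration of the pickle hooks, not a change to pyparsing itself.

```python
import pickle


class Decoder:
    """Hypothetical stand-in for an object that holds an unpicklable callable."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._action = self._build_action()

    def _build_action(self):
        # A nested function: pickle cannot serialize it by reference.
        def action(line):
            return self.prefix + line
        return action

    def process_line(self, line):
        return self._action(line)

    def __getstate__(self):
        # Drop the unpicklable callable; keep only plain data.
        state = self.__dict__.copy()
        del state["_action"]
        return state

    def __setstate__(self, state):
        # Restore plain data, then rebuild the callable from it.
        self.__dict__.update(state)
        self._action = self._build_action()


# Round trip through pickle, as multiprocessing would do for an argument.
clone = pickle.loads(pickle.dumps(Decoder("got: ")))
result = clone.process_line("42")
```

This only works if everything needed to rebuild the callable is itself plain, picklable data (here, the prefix string).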

What if you pass the Word object to the child process and let it call Word().setParseAction() there?
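A related refactoring, sketched here under assumptions, is to build the whole decoder inside each worker with the Pool initializer argument, so that nothing with a parse action ever crosses a process boundary. The names build_decoder, init_worker and process_line are hypothetical, and the nested action function merely stands in for a pyparsing expression with a parse action attached.

```python
from multiprocessing import Pool

_decoder = None  # one decoder per worker process


def build_decoder():
    """Hypothetical factory; stands in for building the parser and its parse action."""
    def action(line):
        return int(line) * 2
    return action


def init_worker():
    # Runs once in each worker, so the decoder is never pickled.
    global _decoder
    _decoder = build_decoder()


def process_line(line):
    # Only the plain string argument and the plain result are pickled.
    return _decoder(line)


if __name__ == "__main__":
    with Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(process_line, ["1", "2", "3"]))
```

This fits when the workers only need to return results. If the decoder's updated state has to come back to the parent, as in the question's original function, that state still has to be returned in a picklable form.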

noxdafox
0

I'd suggest pathos.multiprocessing, as you mention. Of course, I'm the pathos author, so I guess that's not a surprise. It appears that there might be a distutils bug that you are running into, as referenced here: https://github.com/uqfoundation/pathos/issues/49.

Your solution using dill is a good workaround. You might also be able to forgo installing the entire pathos package and install just the pathos fork of the multiprocessing package (which uses dill instead of pickle). You can find it here: http://dev.danse.us/packages or here: https://github.com/uqfoundation/pathos/tree/master/external.

Mike McKerns
  • Hi Mike, that was me who opened that issue on github. Actually, first I was trying to use pathos, but since it did not install for me for various reasons, I had to come up with some kind of workaround. In any case, I look forward to using pathos when, hopefully, all its issues are resolved. – jazzblue Jan 14 '15 at 04:37
  • Yep, I caught that it was you. I added the above for anyone else who runs into the same python bug you've run up against, so they'd also have a link to your ticket. Anyway, no good active code has all its issues resolved. :) But I am actively working to make pathos easier to install. – Mike McKerns Jan 14 '15 at 14:09