
I am trying to make use of my computer's multiple CPUs. However, the BeautifulSoup object returned by my function as part of an SQLAlchemy object is not picklable with pickle or cPickle, so I am using pathos, a fork of the multiprocessing package that uses dill so that it can pickle any Python object. I tested dill on the object that I could not pickle and it worked, so I thought my problem would be solved. However, when I use pathos' pool.map I have the same problem as before, namely that the function completes but the result is never returned. I confirmed this with results = pool.amap(myfunc, myarglist), which completes, whereas results.get() does not. Unfortunately, I cannot post the HTML for the page (it is not publicly available), and I have been unable to find a reproducible example of the problem. This answer includes a function for troubleshooting multiprocessing of large objects, but unfortunately it uses Queue, which pathos does not seem to expose on its own (presumably it is only used under the hood within pool.map). I am using the 0.2a1.dev version of pathos (with dependencies installed with pip prior to compiling from source) on Python 2.7. Here is the traceback for the keyboard interrupt:

Process PoolWorker-2:
Process PoolWorker-7:
Traceback (most recent call last):
Process PoolWorker-8:Process PoolWorker-6:Process PoolWorker-3:Process PoolWorker-5:Process PoolWorker-4:Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap



Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 59, in worker
    self.run()
    self.run()
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
    self._target(*self._args, **self._kwargs)
    self.run()
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
    put((job, i, result))
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 339, in put
    self._target(*self._args, **self._kwargs)
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
    for job, i, func, args, kwds in iter(inqueue.get, None):
    for job, i, func, args, kwds in iter(inqueue.get, None):
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 325, in get
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 325, in get
    wacquire()
KeyboardInterrupt
    for job, i, func, args, kwds in iter(inqueue.get, None):
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 325, in get
    racquire()
    racquire()
    for job, i, func, args, kwds in iter(inqueue.get, None):
    for job, i, func, args, kwds in iter(inqueue.get, None):
KeyboardInterrupt
KeyboardInterrupt
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 325, in get
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 325, in get
    racquire()
KeyboardInterrupt
    racquire()
    racquire()
KeyboardInterrupt
KeyboardInterrupt

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 227, in _bootstrap
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/process.py", line 85, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/pool.py", line 54, in worker
    for job, i, func, args, kwds in iter(inqueue.get, None):
  File "/usr/local/lib/python2.7/dist-packages/processing-0.52_pathos-py2.7-linux-x86_64.egg/processing/queue.py", line 327, in get
    return recv()
KeyboardInterrupt
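Since results.get() blocks forever, one hedged way to turn the silent hang into an error you can catch is to pass a timeout when fetching the results. This sketch uses the standard-library multiprocessing pool rather than pathos, because the exact `amap(...).get` signature in pathos 0.2a1 is not confirmed here; in the standard library, `AsyncResult.get(timeout)` raises `multiprocessing.TimeoutError` when the results do not arrive in time. `map_with_timeout` and `square` are hypothetical names, and the 30-second timeout is an arbitrary assumption.

```python
import multiprocessing
from multiprocessing.pool import ThreadPool  # same map_async/get API, handy for quick tests


def map_with_timeout(pool, func, items, timeout):
    """Apply func to every item on pool, giving up after `timeout` seconds.

    map_async() returns an AsyncResult; AsyncResult.get(timeout) raises
    multiprocessing.TimeoutError instead of blocking forever when the
    results never come back (e.g. because a result could not be sent).
    """
    async_result = pool.map_async(func, items)
    try:
        return async_result.get(timeout)
    except multiprocessing.TimeoutError:
        # Stop the workers so the script fails loudly instead of hanging.
        pool.terminate()
        raise


def square(x):
    # Hypothetical worker; in the question this would be myfunc.
    return x * x


if __name__ == "__main__":
    pool = multiprocessing.Pool(2)
    print(map_with_timeout(pool, square, [1, 2, 3], timeout=30))  # prints [1, 4, 9]
    pool.close()
    pool.join()
```

This does not fix the underlying pickling problem; it only makes it visible as a traceback, so the script stops instead of waiting indefinitely.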
  • I'd suggest you update to the most recent pathos from github; I'm unsure whether that will help you or not. Also, are you using `pathos.multiprocessing.Pool` or `ProcessingPool`? `Pool` uses `dill` instead of `pickle`, but doesn't have the rest of the enhancements that `ProcessingPool` has. If your function is calling `Queue` as you have indicated elsewhere, you may be out of luck. You could possibly use shared memory in `multiprocessing` with `ctypes`. I don't know; it's hard to say without seeing your code. There is an option in `dill` that provides compression, but it's turned off at the moment... – Mike McKerns Jul 08 '14 at 00:09
  • 0.2a1.dev is not the most recent version? I installed from github source this morning. My function does not call queue, only multiprocessing. I was using `pathos.multiprocessing.ProcessingPool`, which is used in the pathos documentation. Does that not use `dill`? At any rate, I just tried `pathos.multiprocessing.Pool` and got the same result. – Michael Jul 08 '14 at 00:16
  • Whoops. I didn't see your version info in the question. Sorry. Yes, that uses `dill`; both do. You are using it as intended, it seems. Sorry for my confusion. It looks like the size of the pickle causes an issue, as your trace says. `dill` and `pathos` have some compression options that I could try, given an example. There's also shared memory, as I mentioned. – Mike McKerns Jul 08 '14 at 00:20
  • Where is that indicated in the trace? Other than shared memory, is there a workaround within the `pathos` multiprocessing package to pickle large objects differently? The biggest issue for me is that, because the script simply hangs, I cannot figure out how to catch this as an error so that my script fails instead. – Michael Jul 08 '14 at 00:28
  • I just tested dill on one of the objects that crashes, and it turns out it does not work. When I call `dill.dumps(myobject)` it hangs. – Michael Jul 08 '14 at 00:42
  • I have compression turned off in `dill`, but it is exposed in another package. It's hard to tell if it's compression or size, or what without seeing a sample. Would it be possible to post or send the code? – Mike McKerns Jul 08 '14 at 01:33
  • I still have never seen `dill` just hang on a `dump`. There are several methods to try in `dill.detect` that give you information on what is happening. If you still can't post a reduced example of your code for whatever reason, you could at least try some of the `dill.detect` methods, and maybe find some clue to what the error is. – Mike McKerns Oct 30 '14 at 21:41
  • I fixed it by getting rid of the object's attributes that cPickle could not pickle, and then just using the standard multiprocessing package. Sorry I cannot help you reproduce the error to debug the package, but if you've never seen this issue, it probably isn't affecting many people. http://stackoverflow.com/a/24664473/2327821 – Michael Oct 31 '14 at 18:31
  • If you got rid of the unpicklable attributes of your object ahead of time, then there is no need for `dill` or `pathos.multiprocessing`… `cPickle` would simply work. Sorry I couldn't be of more help. – Mike McKerns Oct 31 '14 at 20:05
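The workaround described in the comments (remove the attributes that cPickle cannot handle, then use the standard multiprocessing package) can be sketched as follows, under stated assumptions. `find_unpicklable_attrs` pickles each attribute of an object separately to report which ones fail; `PageRecord` is a hypothetical stand-in for the SQLAlchemy object, whose `__getstate__` drops the offending attribute before pickling. A `threading.Lock` plays the role of the BeautifulSoup tree only because pickle is known to reject it; the real object may fail for a different reason, and if pickling a value hangs (as `dill.dumps` did above), this helper will hang on that attribute too.

```python
import pickle
import threading


def find_unpicklable_attrs(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Return {attribute name: exception} for every attribute in obj.__dict__
    that the standard pickle module fails to serialize."""
    bad = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value, protocol)
        except Exception as exc:  # pickling can fail with several exception types
            bad[name] = exc
    return bad


class PageRecord(object):
    """Hypothetical stand-in for the object returned by the worker.

    `soup` represents the unpicklable attribute (a Lock is used here only
    because pickle is known to reject it).
    """

    def __init__(self, url, html):
        self.url = url
        self.html = html            # assumption: the raw HTML string is kept
        self.soup = threading.Lock()

    def __getstate__(self):
        # Copy the instance dict and drop the attribute pickle cannot handle.
        state = self.__dict__.copy()
        state.pop("soup", None)
        return state

    def __setstate__(self, state):
        # Restore the picklable attributes; the parent process could
        # re-parse self.html with BeautifulSoup if it needs the tree again.
        self.__dict__.update(state)
        self.soup = None
```

Calling `find_unpicklable_attrs(record)` on such an object reports only `soup`, and `pickle.dumps(record)` then succeeds because `__getstate__` leaves that attribute out, which is why plain cPickle (and therefore the standard multiprocessing package) is enough once the attribute is gone.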

0 Answers