import multiprocessing


class multiprocessing_issue:
    def __init__(self):
        self.test_mp()

    def print_test(self):
        print "TEST TEST TEST"

    def test_mp(self):
        p = multiprocessing.Pool(processes=4)
        p.apply_async(self.print_test, args=())
        print "finished"


if __name__ == '__main__':
    multiprocessing_issue()

I've set up a simple test above: it creates a class and calls apply_async with a method that should print "TEST TEST TEST". When I run it I see "finished" printed, but "TEST TEST TEST" never appears.

Can anyone see the error in this simple test case? I've set it up to reproduce the way I'm using it in my code.

Python 2.7 on Ubuntu

David Parks
  • What version of Python is this? And what OS? – roganjosh Apr 26 '17 at 16:49
  • On 2.7 and on Windows I also get "Finished" followed by a heavy crash of all the processes with `ImportError: No module named tmp0jekzi`, probably due to the `pool` not being shielded by `if __name__ == '__main__'` – roganjosh Apr 26 '17 at 16:52

2 Answers


Modify test_mp as follows:

def test_mp(self):
    p = multiprocessing.Pool(processes=4)
    r = p.apply_async(self.print_test, args=())
    print r.get()

and the answer will be more clear.

Traceback (most recent call last):
  File "test.py", line 18, in <module>
    multiprocessing_issue()
  File "test.py", line 6, in __init__
    self.test_mp()
  File "test.py", line 14, in test_mp
    print r.get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

Instance methods cannot be serialized that easily. When pickle serialises a function, it does not store the function's code; it only records the module and the name under which the function can be looked up again.

In [1]: dumps(function)
Out[1]: 'c__main__\nfunction\np0\n.'

It would be quite hard for a child process to find the object your instance method refers to, because each process has its own address space.

Modules such as dill do a better job than pickle. Still, I would discourage you from mixing concurrency and OOP, as the logic gets confusing pretty easily.
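
To make the "pickled by reference" point concrete, here is a minimal sketch. It is written in Python 3 syntax, unlike the Python 2.7 code in the question, and it uses `json.dumps` purely as a stand-in for any module-level function. A lambda stands in for "something pickle cannot find by name", because Python 3 is able to pickle bound methods and so would not reproduce the original error. The names `can_pickle`, `stores_reference`, `same_object` and `lambda_ok` are made up for illustration.

```python
import json
import pickle

# Pickle a function that lives at module level (json.dumps is only a stand-in).
payload = pickle.dumps(json.dumps, protocol=0)

# The payload names the module and the function; it contains no code.
stores_reference = b"json" in payload and b"dumps" in payload

# Unpickling imports the module and looks the name up again.
same_object = pickle.loads(payload) is json.dumps


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no name that can be looked up in its module, so it fails.
lambda_ok = can_pickle(lambda: "TEST TEST TEST")
```

Anything handed to a process pool has to survive this round trip, which is why a function that can be found by module and name is the safest thing to submit.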

noxdafox

Ah, the problem is moving the bound method between processes. If I define the function at the module level instead of on the class, everything works.

import multiprocessing


class multiprocessing_issue:
    def __init__(self):
        self.test_mp()

    def test_mp(self):
        p = multiprocessing.Pool(4)
        r = p.apply_async(mptest, args=())
        r.get()
        print "finished"


def mptest():
    print "TEST TEST TEST"


if __name__ == '__main__':
    multiprocessing_issue()
David Parks
  • Eh, my last comment didn't make sense, I missed `self.test_mp()`. I guess I'm not used to people using classes like this. It doesn't really make sense to me that you would create a class for this and then define `__init__` just to run a class method. But I see your main tag is `java` :P – roganjosh Apr 26 '17 at 17:05
  • It doesn't make much sense, you're right. Well, it did when I started using threading for parallelism, but I'm trying to deal with threading issues by switching to multiprocessing without rewriting all my code. Gotchas within gotchas. – David Parks Apr 26 '17 at 17:08
  • For future reference on this question, this is a useful and related article: http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma – David Parks Apr 26 '17 at 17:09
  • I would like to know why it only works when you call r.get(). The function doesn't necessarily return anything. – David Okwii Jan 30 '21 at 07:40
  • There's no return statement on `mptest` so the default return value is `None`. I'm not sure I understand the question. @DavidOkwii – David Parks Jan 30 '21 at 17:49
  • Sorry, I meant that calling apply_async() only worked when I followed it up with a call to r.get(). I am using apply_async(send_emails, args) to call a function that sends emails to users, so I don't need to block or return anything. However, it doesn't work unless I call the get method, like apply_async(send_emails, args).get(). I don't understand why! – David Okwii Feb 04 '21 at 18:59
  • FYI, you can also define the method as a `@staticmethod` inside the class; that would also work – gnikit Feb 25 '22 at 01:58
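
On the question in the comments about why apply_async seems to do nothing without get(): one possible explanation (an assumption, since send_emails is not shown) is that the task raises an error, or fails to pickle, and the pool stores that error in the result object instead of printing it. The error only surfaces when get() is called. The sketch below, in Python 3 syntax, uses `multiprocessing.pool.ThreadPool` as a stand-in so that it runs in a single process; it exposes the same apply_async, close, join and get calls as `multiprocessing.Pool`. The names `failing_task`, `run_silently` and `run_and_check` are hypothetical.

```python
from multiprocessing.pool import ThreadPool


def failing_task():
    # Hypothetical task that fails, standing in for any error in a worker.
    raise ValueError("task failed")


def run_silently():
    """Submit the task and wait for it, but never ask for the result."""
    pool = ThreadPool(processes=2)
    result = pool.apply_async(failing_task)
    pool.close()  # no further tasks will be submitted
    pool.join()   # wait until the queued task has been processed
    return result


def run_and_check():
    """Submit the task and call get(), which re-raises the stored error."""
    result = run_silently()
    try:
        result.get()
    except ValueError as error:
        return str(error)
    return None
```

If the return value is not needed, calling close() and then join() still waits for the submitted work to finish before the program moves on, and passing an `error_callback` to apply_async (available in Python 3) is one way to see failures without blocking on get().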