Python multiprocessing subclass initialization

Question

Is it okay to initialize the state of a multiprocessing.Process subclass in the __init__() method? Or will this result in duplicate resource utilization when the process forks? Take this example:

from multiprocessing import Process, Pipe
import time

class MyProcess(Process):
    def __init__(self, conn, bar):
        super().__init__()
        self.conn = conn
        self.bar = bar
        self.databuffer = []

    def foo(self, baz):
        return self.bar * baz

    def run(self):
        '''Process mainloop'''
        running = True
        i = 0
        while running:
            self.databuffer.append(self.foo(i))
            if self.conn.poll():
                m = self.conn.recv()
                if m == 'get':
                    self.conn.send((i, self.databuffer))
                elif m == 'stop':
                    running = False
            i += 1
            time.sleep(0.1)


if __name__ == '__main__':
    conn, child_conn = Pipe()
    p = MyProcess(child_conn, 5)
    p.start()
    time.sleep(2)

    # Touching the instance does not affect the process, which has already forked.
    p.bar = 1
    print(p.databuffer)  # prints []: only the child's copy of databuffer is filled

    time.sleep(2)
    conn.send('get')
    i,data = conn.recv()
    print(i,data)
    conn.send('stop')
    p.join()

As I note in the code, you cannot communicate with the process via the instance p, only via the Pipe. So if I do a bunch of setup in the __init__ method, such as creating file handles, how is this duplicated when the process forks?

Does this mean that subclassing multiprocessing.Process in the same way you would subclass threading.Thread is a bad idea?

Note that my processes are long running and meant to handle blocking IO.

Wayne Werner · Answer 1 · 2016-07-18T21:41:11.197


This is easy to test. In __init__, add the following:

 self.file = open('does_it_open.txt', 'w')

Then run:

 $ strace -f python yourprogram.py 2> test.log
 $ grep does_it_open test.log
 open("does_it_open.txt", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 6

That means that, at least on my system (your code plus that call), the file was opened once, and only once.

For more about the wizardry that is strace, check out this fantastic blog post.

edited Jul 18 '16 at 21:41

answered Jul 18 '16 at 21:29

Wayne Werner


I think that `strace` is only looking at the main process. When I move the `open()` call to the `run()` method, it does not show up in `strace`. – Mike Jul 18 '16 at 21:37
Good catch. Add the `-f` flag to [include forked processes](http://stackoverflow.com/a/6314744/344286). – Wayne Werner Jul 18 '16 at 21:41
Appending `os.getpid()` to the filename does confirm that the file is only opened once. This makes sense, since the file handle can be passed to the new process without trouble. I don't believe that the processes share any memory, though, so is any memory I allocate in the constructor duplicated? – Mike Jul 18 '16 at 21:45
You don't allocate memory. You might create new objects, but you don't allocate memory. If you look at https://docs.python.org/3/library/multiprocessing.html#the-process-class you'll see that the whole point of the `if __name__ == '__main__'` block is to prevent that code from running in forked processes - either via threading or multiprocessing. – Wayne Werner Jul 18 '16 at 21:49
It looks like the process is forked by default (on Linux, at least), so the `__name__` check should not come into play. If I were to allocate memory in the parent process before calling `p.start()`, that data would simply be memory mapped to the fork until it is written. From my understanding of fork, the data structure is copied when I try to write to it. I'm not sure whether the copy is partial or total. – Mike Jul 18 '16 at 22:09
You don't _instantiate_ your class outside your `__name__ == '__main__'`. Defining a class doesn't take (much) memory, and certainly `__init__` is never called. – Wayne Werner Jul 19 '16 at 11:42
