
In the code below, how do I make the Starter object able to read gen.vals? It seems that a different object gets created whose state gets updated, but Starter never knows about it. Also, how would the solution apply if self.vals were a dictionary, or any other kind of object?

import multiprocessing
import time

class Generator(multiprocessing.Process):
    def __init__(self):
        self.vals = []
        super(Generator, self).__init__()

    def run(self):
        i = 0
        while True:
            time.sleep(1)
            self.vals.append(i)
            print('In Generator ', self.vals)  # prints a growing list
            i += 1

class Starter():
    def do_stuff(self):
        gen = Generator()
        gen.start()
        while True:
            print('In Starter ', gen.vals)  # prints an empty list
            time.sleep(1)

if __name__ == '__main__':
    starter = Starter()
    starter.do_stuff()

Output:

In Starter  []
In Starter  []
In Generator  [0]
In Starter  []
In Generator  [0, 1]
In Starter  []
In Generator  [0, 1, 2]
In Starter  []
In Generator  [0, 1, 2, 3]
In Starter  []
In Generator  [0, 1, 2, 3, 4]
In Starter  []
In Generator  [0, 1, 2, 3, 4, 5]
In Starter  []
In Generator  [0, 1, 2, 3, 4, 5, 6]
In Starter  []
In Generator  [0, 1, 2, 3, 4, 5, 6, 7]

1 Answer


When you start a process, it essentially executes in a whole separate context (here's a brief explanation of what's going on), so there is no shared memory to speak of. Whatever your run() method does is therefore not reflected in your main process: Python spawns/forks a whole new process, instantiates your Generator there and calls its run() method, and any changes to the state of that other instance in the other process stay there.
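
As a quick illustration (this sketch is mine, not part of the original answer), you can print os.getpid() on both sides to see that run() executes in a different process with its own memory:

import multiprocessing
import os

class Demo(multiprocessing.Process):
    def run(self):
        # This executes in the child process - note the different PID.
        print('child pid: ', os.getpid())

if __name__ == '__main__':
    print('parent pid:', os.getpid())
    p = Demo()
    p.start()
    p.join()  # wait for the child to finish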

If you want to pass data around, you need to use multiprocessing-aware structures that essentially serialize/deserialize the data and communicate the changes back and forth between the processes. For example:

import multiprocessing
import time

class Generator(multiprocessing.Process):
    def __init__(self):
        self._vals = []  # keeps the internal state
        self.vals = multiprocessing.Queue()  # a queue for the exchange
        super(Generator, self).__init__()

    def run(self):
        i = 0
        while True:
            time.sleep(1)
            self._vals.append(i)  # update the internal state
            print('In Generator ', self._vals) # prints growing list
            self.vals.put(self._vals)  # add it to the queue
            i += 1

class Starter():
    def do_stuff(self):
        gen = Generator()
        gen.start()
        while True:
            print('In Starter ', gen.vals.get()) # print what's in the queue
            time.sleep(1)

if __name__ == '__main__':
    starter = Starter()
    starter.do_stuff()

This will print:

In Generator  [0]
In Starter  [0]
In Generator  [0, 1]
In Starter  [0, 1]
In Generator  [0, 1, 2]
In Starter  [0, 1, 2]
In Generator  [0, 1, 2, 3]
In Starter  [0, 1, 2, 3]
etc.

If you want to do more complex or semi-concurrent data modifications, or deal with more structured data, check the structures supported by multiprocessing.Manager. For very complex cases I'd recommend an in-memory database like Redis as a means of inter-process data exchange, or, if you prefer to do the micro-management yourself, ØMQ is always a good option.
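
For instance, here's a rough sketch (my own, assuming a Manager-backed list proxy fits your use case) of the same Generator using multiprocessing.Manager().list(), so the parent can read the child's appends directly instead of draining a queue:

import multiprocessing
import time

class Generator(multiprocessing.Process):
    def __init__(self, shared_vals):
        self.vals = shared_vals  # a Manager-backed list proxy
        super(Generator, self).__init__()

    def run(self):
        i = 0
        while True:
            time.sleep(1)
            self.vals.append(i)  # the proxy forwards each change to the manager process
            i += 1

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    gen = Generator(manager.list())
    gen.start()
    while True:
        print('In Starter ', list(gen.vals))  # reads reflect the child's appends
        time.sleep(1)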

  • Or in other words, why is your solution not good for more structured data? – Baron Yugovich Jun 18 '17 at 03:16
  • @BaronYugovich - it will work for any structure, as long as it can be pickled as that's how Python's multiprocessing-aware structures communicate. – zwer Jun 18 '17 at 03:16
  • So why do you say "If you want to do more complex/semi-concurrent data modifications or deal with more structured data, ...", for which cases is this solution a problem? – Baron Yugovich Jun 18 '17 at 03:17
  • @BaronYugovich - it's not good for complex data because sometimes it cannot be pickled, and in other cases you only want to communicate the changes, not constantly pickle/unpickle whole objects (see the sketch below). `multiprocessing.Manager` has some optimizations to communicate changes instead of blindly pickling/unpickling objects, but for really large and complex data you'll be much better off relying on something designed for complex inter-process communication from the ground up. In other words, try passing a 100M-element list this way and you'll see what I'm talking about. – zwer Jun 18 '17 at 03:19
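
To make that concrete, here is a minimal variation of the answer's code (my own sketch, not from the thread) that communicates only the changes - each new element - instead of re-pickling the whole list on every iteration:

import multiprocessing
import time

class Generator(multiprocessing.Process):
    def __init__(self):
        self.vals = multiprocessing.Queue()
        super(Generator, self).__init__()

    def run(self):
        i = 0
        while True:
            time.sleep(1)
            self.vals.put(i)  # send only the new element, not the whole list
            i += 1

class Starter():
    def do_stuff(self):
        gen = Generator()
        gen.start()
        received = []  # rebuild the state incrementally on the consumer side
        while True:
            received.append(gen.vals.get())
            print('In Starter ', received)

if __name__ == '__main__':
    Starter().do_stuff()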