I have an __init__ method that initializes various primitive and complex data types and objects. In each process spawned by multiprocessing.Process, I print a variable set in __init__ and the address of an initialized object. Each process gets its own instance of the variable, but the address of the object stays the same. So what exactly happens to the members of the parent class during a multiprocessing.Process call?

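import multiprocessing
import os
import time

import pymongo

class Worker(object):  # placeholder name; the original snippet omitted the class line
    def __init__(self):
        self.count = 0
        self.db = pymongo.MongoClient()

    def consumerManager(self):
        for i in range(4):
            p = multiprocessing.Process(target=self.consumer, args=(i,))
            p.start()  # the process does nothing until it is started

    def consumer(self, i):
        while True:
            time.sleep(i)
            self.count += 1      # changes only this process's copy
            print(self.count)
            print(os.getpid())
            print(id(self.db))   # prints the same value in every process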

If it were doing a deep copy of the objects, then id(self.db) should print a different id in each process, which doesn't happen. How is this done?

vks
sreeraag
  • This is for linux – sreeraag Nov 29 '16 at 10:21
  • I had a dict {'a':'b'} initialized in init() and then was printing the id after modifying the dict in each process, all the processes still have the same id value though they hold data specific to that process – sreeraag Nov 29 '16 at 10:28
  • Have you tried using `Pool` instead? – Eduard Nov 29 '16 at 10:30
  • Nope, but the behaviour should be the same in both, right? – sreeraag Nov 29 '16 at 10:34
  • I did encounter this problem before using `Pool`. Dropped `multiprocessing` after reading several threads. I've forgotten why, but it may have something to do with shallow copies. – Eduard Nov 29 '16 at 10:39
  • r u using copy_reg.....self.methods cannot be pickled – vks Nov 29 '16 at 10:47
  • http://stackoverflow.com/questions/33662292/why-do-new-objects-in-multiprocessing-have-the-same-id – vks Nov 29 '16 at 11:01
  • Oh, now I get it.. They are same addresses wrt that process' virtual address space but are actually mapped to a different physical address space. So, it might not be a copy on write but an upfront deep-copy coz if it was copy on write, then a first modification of the object should have changed the address – sreeraag Nov 29 '16 at 11:14
  • i am not sure but in linux its copy-on-write...so may be you can modify your question add details about this instead – vks Nov 29 '16 at 11:19

1 Answer

Generally on Linux when a new process is created, a copy of the parent is generated.

At the beginning the two processes will be in the same state but with different address spaces.

To save time, Linux lets the child share the parent's memory pages until either process writes to one of them; only then is that page copied. This is usually referred to as copy-on-write.

As the two processes keep executing, their state will diverge. If you want them to share information you can use different mechanisms: Pipes, Shared memory, Managers and Queues.
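As an illustration of the shared-memory option, here is a minimal sketch of a counter that all children really do update together. It is written for Python 3 and assumes the fork start method (the question is about Linux); the function names `consumer` and `count_together` are made up for this example.

```python
import multiprocessing


def consumer(counter, increments):
    for _ in range(increments):
        # "+=" on a shared Value is a read-modify-write, so hold its lock.
        with counter.get_lock():
            counter.value += 1


def count_together(n=4, increments=1000):
    # Assumption: POSIX fork is available (Linux, as in the question).
    ctx = multiprocessing.get_context("fork")
    counter = ctx.Value("i", 0)  # a C int living in shared memory
    procs = [ctx.Process(target=consumer, args=(counter, increments))
             for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(count_together())
```

Unlike `self.count` in the question, the `Value` is backed by memory that is mapped into every process, so increments made in the children are visible in the parent.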

Because of their simplicity, Pipes and Queues are usually the recommended ones.
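A minimal sketch of the Queue approach, assuming Python 3 and the fork start method: each child keeps its own private counter and sends the result back to the parent explicitly. The names `consumer` and `collect` are hypothetical, chosen only for this example.

```python
import multiprocessing


def consumer(i, queue):
    # The counter is private to this child; nothing is shared implicitly.
    count = 0
    for _ in range(3):
        count += i
    queue.put((i, count))  # hand the result to the parent


def collect(n=4):
    # Assumption: POSIX fork is available (Linux, as in the question).
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=consumer, args=(i, queue)) for i in range(n)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(collect())
```

The parent only learns what a child computed because the child put it on the queue; the child's own variables never flow back on their own.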

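The reason you see the same id is explained in the question linked in the comments (why new objects in multiprocessing have the same id). In CPython, id() returns the object's address in the process's virtual address space. As the new process has the same memory layout as the parent, the id will be the same, even though the underlying physical memory may differ once either process writes to it.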
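The behaviour from the question can be reproduced with a small sketch like the one below. It assumes Python 3 and the fork start method, and uses a plain dict as a stand-in for the MongoClient; the names `Worker` and `run_demo` are made up for the example.

```python
import multiprocessing
import os


class Worker(object):
    def __init__(self):
        self.count = 0
        self.data = {"a": "b"}  # stand-in for the MongoClient

    def consumer(self, i, results):
        # Runs in the child and mutates the child's private copy only.
        self.count += i
        self.data["pid"] = os.getpid()
        results.put((i, self.count, id(self.data), os.getpid()))


def run_demo(n=4):
    # Assumption: POSIX fork is available (Linux, as in the question).
    ctx = multiprocessing.get_context("fork")
    worker = Worker()
    results = ctx.Queue()
    procs = [ctx.Process(target=worker.consumer, args=(i, results))
             for i in range(1, n + 1)]
    for p in procs:
        p.start()
    reports = sorted(results.get() for _ in procs)
    for p in procs:
        p.join()
    return worker, reports


if __name__ == "__main__":
    worker, reports = run_demo()
    print("parent:", worker.count, worker.data, id(worker.data))
    for report in reports:
        print("child :", report)
```

Every child reports the same `id` as the parent, while each child's `count` and dict contents differ and the parent's copy is left untouched. The id stays the same even after the child modifies the dict, because copy-on-write replaces the physical page behind the address, not the virtual address itself.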

noxdafox