
I have a data structure L (it could be a list, a dict, ...) and I need multiple processes to read from it. I don't want to use a multiprocessing.Manager because it's slow.

Now if L is never modified, the internet told me it won't be fully copied by the child processes thanks to copy-on-write. But what if L is referenced by object a, which itself is modified? Does copy-on-write still apply? Example:

from multiprocessing import Pool
from a import A

READONLYLIST = list(range(pow(10, 6)))  # list will never be modified
a = A(READONLYLIST)  # object a will be modified

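def worker(x):
    return a.worker(x)

if __name__ == "__main__":  # required with the spawn start method (Windows, macOS)
    with Pool(2) as pool:
        print(pool.map(worker, range(10)))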

With module a as:

import random

class A(object):
    def __init__(self, readonlylist):
        self.readonlylist = readonlylist
        self.v = 0

    def worker(self, x):
        self.v = random.random()  # modify the object
        return x + self.readonlylist[-1]

Will READONLYLIST be fully copied by the child processes in this case?

usual me
  • I've not tested it, but I'd guess that doing *anything* with a copy-on-write object in Python will cause it to be copied, since the reference count will change. – Blckknght Jun 22 '16 at 15:11
  • @Blckknght: anything except a simple lookup, right? – usual me Jun 22 '16 at 15:13
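Assuming CPython, a quick way to see the point raised in these comments: the reference count is stored inside each object, so a read that produces a new reference also writes to the object's memory. A transient lookup increments and then decrements the count. Keeping the result in a name leaves the change visible. In a forked child, any such write makes the kernel copy the page that holds the object, even if the count later returns to its old value.

```python
import sys

# Values above 256 are not CPython's cached small ints, so each element is
# an ordinary heap object whose reference count we can watch.
READONLYLIST = [10**6 + i for i in range(10)]

before = sys.getrefcount(READONLYLIST[-1])
item = READONLYLIST[-1]  # a plain lookup whose result is kept in a name
after = sys.getrefcount(READONLYLIST[-1])

print(after - before)  # 1: the lookup wrote to the element's refcount field
```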

1 Answer


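Python multiprocessing does not share Python objects between processes. Each process has its own address space, and changes made in one process are invisible to the others. When a pool sends objects to another process, it pickles them: the function to call, each task's arguments, and the return values are serialized to bytes, sent to the subprocess, and unpickled there into that process's own memory. A module-level function such as worker is pickled by reference (its module and qualified name) rather than by value, so each subprocess looks the function up in its own copy of the module. Module-level data such as READONLYLIST and a is not pickled at all. With the fork start method (historically the default on Linux), the children inherit the parent's memory, and its pages are shared copy-on-write until something writes to them. With the spawn start method (the default on Windows and macOS), each child re-imports the main module and builds its own copy.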
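The pickling step can be illustrated with the standard library alone. In this sketch, json.dumps stands in for worker; a function defined in `__main__` is pickled the same way, as a reference to `__main__.worker`. The list `data` stands in for a task argument.

```python
import json
import pickle

# Task arguments and results travel between processes as pickled bytes;
# the receiving side unpickles them into its own, independent copy.
data = [1, 2, 3]
payload = pickle.dumps(data)
child_copy = pickle.loads(payload)
child_copy.append(4)
print(data, child_copy)  # [1, 2, 3] [1, 2, 3, 4]

# Functions are pickled by reference (module + qualified name), not by code:
# the payload only names json.dumps, and unpickling looks it up again.
fn_payload = pickle.dumps(json.dumps)
print(b"json" in fn_payload and b"dumps" in fn_payload)  # True
print(pickle.loads(fn_payload) is json.dumps)  # True
```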

Michael