
The docs (Python 3.4) explain that with spawn, "the child process will only inherit those resources necessary to run the process object's run() method".

But which objects are "necessary"? I read this as meaning that everything reachable from inside run() is "necessary": the arguments passed as args to Process.__init__, whatever is stored in global variables, and the classes and functions defined in global scope along with their attributes. That reading is wrong, though; the following code confirms that objects stored in global variables aren't inherited:

# running under python 3.4 / Windows
# but behaves the same under Unix
import multiprocessing as mp

x = 0
class A:
    y = 0

def f():
    print(x) # 0
    print(A.y) # 0

def g(x, A):
    print(x) # 1
    print(A.y) # 0; really, not even args are inherited?

def main():
    global x
    x = 1
    A.y = 1
    p = mp.Process(target = f)
    p.start()
    q = mp.Process(target = g, args = (x, A))
    q.start()


if __name__=="__main__":
    mp.set_start_method('spawn')
    main()

Is there a clear rule that states which objects are inherited?

EDIT:

To confirm: running this on Ubuntu produces the same output. (Thanks to @mata for pointing out that I forgot to add global x to main(). This omission made my example confusing; it would also affect the result if I were to switch 'spawn' to 'fork' under Ubuntu. I have now added global x to the code above.)

dano
max

1 Answer


This has to do with the way classes are pickled when being sent to the spawned Process. The pickled version of a class doesn't really contain its internal state, but only the module and the name of the class:

import pickle

class A:
    y = 0

pickle.dumps(A)
# b'\x80\x03c__main__\nA\nq\x00.'

There is no information about y here; the pickle is comparable to a reference to the class.

When the class is passed as an argument to g, it is unpickled in the spawned process: pickle imports the class's module (here __main__) if necessary and returns a reference to the class. Changes made to the class in your main() function therefore don't carry over, because the if __name__ == "__main__" block isn't executed in the subprocess. f uses the class directly from its module, so the effect is basically the same.
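A minimal sketch of the difference between pickling a class and pickling an instance, runnable as a script in a single process. The Circle class and the string value assigned to A.y are hypothetical and chosen only so the payloads are easy to inspect; the exact bytes depend on the pickle protocol in use.

```python
import pickle


class A:
    y = 0  # class attribute: lives on the class object itself


class Circle:
    """Hypothetical class, used only to show how an instance is pickled."""

    def __init__(self, radius):
        self.radius = radius  # instance attribute: lives in the instance __dict__


# Mutate class state, as main() does in the question.
A.y = "changed-in-parent"

class_bytes = pickle.dumps(A)
# The payload names the module and the class; the value of y is not in it.
print(b"changed-in-parent" in class_bytes)  # False

# Unpickling looks the class up by name. In the same process that lookup
# finds the very same (already mutated) object; in a spawned child it finds
# the freshly imported, unmodified class instead.
print(pickle.loads(class_bytes) is A)  # True

circle = Circle(3)
inst_bytes = pickle.dumps(circle)
restored = pickle.loads(inst_bytes)
# For an instance, the contents of __dict__ are part of the payload,
# and unpickling builds a new object from them.
print(restored.radius)     # 3
print(restored is circle)  # False
```

So state that has to reach the child should sit on an instance (or in plain data such as a dict), not on the class.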

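The reason why x shows different values is a little different. Your f function prints the global variable x from the module. Without a global x declaration, the x in your main() function is a separate local variable, so setting x = 1 there doesn't affect the module-level x in either process. With global x, the assignment does change the module-level x in the parent, but the spawned child re-imports the module without running main(), so f still prints 0. x is passed to g as an argument, so there it always has the value 1.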
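If the child does need the modified state, one option is to send it explicitly as picklable data. The following is only a sketch under that assumption: snapshot_class_attrs and worker are hypothetical helpers, not part of multiprocessing, and the script is meant to be run as a file so that the spawned child can re-import it.

```python
import multiprocessing as mp


class A:
    y = 0


def snapshot_class_attrs(cls):
    """Copy a class's non-dunder, non-callable attributes into a plain dict.

    A dict is pickled by value, so its contents do reach the child.
    """
    return {
        name: value
        for name, value in vars(cls).items()
        if not name.startswith("__") and not callable(value)
    }


def worker(state, queue):
    # Runs in the child. A.y is whatever the fresh import of the module
    # produced; state["y"] is the value the parent sent explicitly.
    queue.put((A.y, state["y"]))


if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # force spawn without changing the global default
    A.y = 1                        # parent-only change to class state
    queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(snapshot_class_attrs(A), queue))
    proc.start()
    print(queue.get(timeout=30))   # (0, 1)
    proc.join()
```

The first element of the tuple is the class attribute as the child sees it after re-importing the module; the second is the value that travelled through args.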

mata
  • Thanks. So with `'spawn'`, none of the global objects are serialized and passed to the child process. What is passed, besides the arguments to the `target` function in `Process.__init__`? Everything in `locals()` as seen from within the target function, as serialized by `pickle`? And nothing from `globals()` (unless referenced by one of the `locals()` objects of course)? – max Mar 23 '15 at 00:28
  • No, only the `Process` instance and what you actually pass as arguments are serialized. (Target) functions are pickled like classes: only the name and module name are preserved. The rest of the state needed to execute the spawned process is obtained by importing the `__main__` module. Internal state (`__dict__`, ...) is pickled only for class _instances_, which also applies to the instance of `Process`. – mata Mar 23 '15 at 07:37
  • You're making a distinction between `pickle` and `serialization`. What's the difference between the two? – max Apr 02 '15 at 22:36
  • By `serialize` I meant `pickle` in this context. Serialization just means converting an object to a different representation; pickle is just one possible implementation of a serialization protocol. – mata Apr 02 '15 at 23:27
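The point from the comments that target functions are pickled like classes, by module and name only, can be checked in a single process. This is a rough sketch; target and can_pickle are hypothetical names introduced for the illustration.

```python
import pickle


def target():
    return "module-level function"


payload = pickle.dumps(target)
# The function body is not in the payload, only a reference to the function.
print(b"module-level function" in payload)  # False
# Unpickling resolves that reference back to the same function object.
print(pickle.loads(payload) is target)      # True


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


print(can_pickle(target))        # True
# A lambda has no importable name, so there is nothing to reference.
print(can_pickle(lambda: None))  # False
```

This is why, with 'spawn', the target has to be something the child can find again by importing the module, such as a function defined at module level.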