From this question I learned that:
When you use multiprocessing to start a second process, an entirely new Python interpreter, with its own global state, is launched. That global state is not shared, so changes made by child processes to a global variable are invisible to the parent process.
To verify this behavior, I made a test script:
import time
import multiprocessing as mp
from multiprocessing import Pool

x = [0]  # global

def worker(c):
    if c == 1:  # wait for proc 2 to finish; is global x overwritten by now?
        time.sleep(2)
    print('enter: x =', x, 'with id', id(x), 'in proc', mp.current_process())
    x[0] = c
    print('exit: x =', x, 'with id', id(x), 'in proc', mp.current_process())
    return x[0]

if __name__ == '__main__':  # required on platforms that spawn rather than fork
    pool = Pool(processes=2)
    x_vals = pool.map(worker, [1, 2])
    print('parent: x =', x, 'with id', id(x), 'in proc', mp.current_process())
    print('final output', x_vals)
The output (on CPython on Linux, where the pool workers are forked) is something like:
enter: x = [0] with id 140138406834504 in proc <ForkProcess(ForkPoolWorker-2, started daemon)>
exit: x = [2] with id 140138406834504 in proc <ForkProcess(ForkPoolWorker-2, started daemon)>
enter: x = [0] with id 140138406834504 in proc <ForkProcess(ForkPoolWorker-1, started daemon)>
exit: x = [1] with id 140138406834504 in proc <ForkProcess(ForkPoolWorker-1, started daemon)>
parent: x = [0] with id 140138406834504 in proc <_MainProcess(MainProcess, started)>
final output [1, 2]
How should I explain the fact that the id of x is the same in all the processes, yet x takes different values? Isn't id conceptually the memory address of a Python object?
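The same behavior shows up without the Pool machinery. Here is a minimal sketch using os.fork directly (so it assumes a Unix system), which prints the same id in parent and child even though the values diverge:

import os

x = [0]
pid = os.fork()                    # child starts as a clone of the parent
if pid == 0:                       # we are in the child process
    x[0] = 1
    print('child :', x, 'id', id(x))
    os._exit(0)                    # don't let the child run the parent's code
os.waitpid(pid, 0)                 # let the child finish first
print('parent:', x, 'id', id(x))   # same id, but x is still [0]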
I guess this is possible if the memory space gets cloned in the child processes, so that id reports the same virtual address in each of them. Then is there something I can use to get the actual physical memory address of a Python object?
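The closest thing I know of is Linux-specific: /proc/self/pagemap maps each virtual page of a process to a physical page frame number. Below is a sketch of that idea (virt_to_phys is my own helper, not a library function); it assumes that id returns the object's virtual address (true in CPython, but an implementation detail) and it normally has to run as root, because since Linux 4.0 the frame number reads as zero for unprivileged processes.

import mmap

def virt_to_phys(virt):
    # Each 8-byte pagemap entry describes one virtual page of this process.
    with open('/proc/self/pagemap', 'rb') as f:
        f.seek((virt // mmap.PAGESIZE) * 8)
        entry = int.from_bytes(f.read(8), 'little')
    if not entry & (1 << 63):          # bit 63: page is present in RAM
        raise RuntimeError('page not present')
    pfn = entry & ((1 << 55) - 1)      # bits 0-54: physical frame number
    return pfn * mmap.PAGESIZE + (virt % mmap.PAGESIZE)

x = [0]
print(hex(id(x)), '->', hex(virt_to_phys(id(x))))  # virtual -> physical

Is this the right approach, or is there something more portable?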