
Can somebody help me understand the following code snippet? I know that I should not use global variables with multiprocessing, but I'm still surprised by the result I see.

I use a global file handle within the function executed remotely in a different worker process.

import multiprocessing
import os

fh = open("out.txt", "w")

def process(i):
    print("i={}  pid={}  id(fh)={}".format(i, os.getpid(), id(fh)))
    print(i, file=fh)

def main():
    p = multiprocessing.Pool(3)
    p.map(process, (1, 2, 3))
    p.terminate()
    fh.close()

main()

The output is

i=1  pid=92045  id(fh)=4314964256
i=2  pid=92046  id(fh)=4314964256
i=3  pid=92047  id(fh)=4314964256

So we see that there are three different process ids as expected.

What surprises me:

  1. the unpicklable file handle is available in the worker processes
  2. the memory address computed by id is the same for all workers
  3. the worker process can write to this file handle without throwing an exception
  4. nevertheless the file is empty after program execution.
rocksportrocker

1 Answer


Found the answers myself:

  1. the unpicklable file handle is available in the worker processes: with the fork start method, multiprocessing forks the Python interpreter process, so global variables are copied along with the parent's memory (SO post multiprocessing global variable memory copying). Pickling only happens for the function arguments sent to the worker processes.
  2. the memory address computed by id is the same for all workers: the address shown is an address within the process's virtual address space, not a physical one. Right after forking, each child has the same memory layout as the parent. (SO post Fork - same memory addresses?)
  3. the worker process can write to this file handle without throwing an exception: see answer 1; in addition, the forked processes inherit the underlying file descriptor, so fh.fileno() is the same in each of them.
  4. nevertheless the file is empty after program execution: the file is not empty if I call fh.flush() before returning from process. So the file buffer is not flushed when the worker process is killed by its parent.
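
Points 2 to 4 can be reproduced without a pool. The following is only a minimal sketch, assuming a Unix system where os.fork is available. The helper name fork_and_write and the temporary file are made up for this illustration, and os._exit stands in for a worker that is killed before it can clean up.

```python
import os
import tempfile


def fork_and_write(flush):
    """Fork once; the child writes to a file handle inherited from the parent.

    Hypothetical helper for illustration only (Unix, needs os.fork).
    Returns (child saw same id and fileno, file content after the child exited).
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    fh = open(path, "w")
    parent_view = (id(fh), fh.fileno())

    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fh was never pickled, it came along with the copied memory.
        os.close(read_end)
        same = (id(fh), fh.fileno()) == parent_view
        os.write(write_end, b"1" if same else b"0")
        print("hello from child", file=fh)  # ends up in the child's buffer
        if flush:
            fh.flush()  # hand the buffered data to the OS
        os._exit(0)  # exit without cleanup, so no implicit flush

    # Parent: collect the child's observation, then inspect the file.
    os.close(write_end)
    same_in_child = os.read(read_end, 1) == b"1"
    os.close(read_end)
    os.waitpid(pid, 0)
    fh.close()  # the parent's own buffer is empty
    with open(path) as f:
        content = f.read()
    os.remove(path)
    return same_in_child, content


if __name__ == "__main__":
    print(fork_and_write(flush=False))
    print(fork_and_write(flush=True))
```

Under these assumptions the child reports the same id and fileno as the parent in both runs, and the written line only reaches the file in the run with flush=True.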

Remark: my example should behave differently on Windows. There, multiprocessing has to start fresh Python interpreters because the process model differs (Windows has no fork).
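
Which behaviour applies on a given machine depends on the start method in use. As a small sketch, the standard functions multiprocessing.get_start_method and multiprocessing.get_all_start_methods report it; the wrapper name describe_start_methods is made up for this example.

```python
import multiprocessing


def describe_start_methods():
    """Return (start method in use, all start methods this platform supports)."""
    current = multiprocessing.get_start_method()
    available = multiprocessing.get_all_start_methods()
    return current, available


if __name__ == "__main__":
    current, available = describe_start_methods()
    print("current:", current)
    print("available:", available)
```

If the reported method is not "fork", the workers do not inherit the parent's memory, so the global fh from the question is not simply copied into them.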

rocksportrocker