
I am passing sys.stdout as an argument to a process, and the process then writes to sys.stdout while it does its work.

import multiprocessing
import sys
def worker_with(stream):
    stream.write('In the process\n')

if __name__ == '__main__':
    sys.stdout.write('In the main\n')
    lock = multiprocessing.Lock()

    w = multiprocessing.Process(target=worker_with, args=(sys.stdout,))

    w.start()
    w.join()

The code above does not work; it fails with the following error: "ValueError: operation on closed file".

I tried running the same code but calling the function directly instead of spawning a process, and it works: it prints to the console. I also tried referencing sys.stdout directly inside the function and spawning it as a process, and that works too. The problem seems to be passing sys.stdout as an argument to the process.

Does someone have any idea why?

Note: this code is inspired by the tutorial PYMOTW - communication between processes.

EDIT: I am running Python 2.7.10, 32-bit, on Windows 7.

user3722440
    I don't get any error, I'm using Python 2.7.11 on Ubuntu 14.04. – agold Dec 16 '15 at 14:44
  • Thanks agold. Let's see if someone can run the same code on a Windows machine. – user3722440 Dec 16 '15 at 14:51
    *How* are you passing stdout to the script in Windows? – Bob Dylan Dec 16 '15 at 14:56
  • Hi Bob Dylan. Sorry, I am not sure I fully understand the question. I am not passing anything; I just run the code above as a script (so by default, sys.stdout will output to the console). Does that answer the question? – user3722440 Dec 16 '15 at 15:07

1 Answer


When you pass arguments to a Process, they are pickled in the parent, transmitted to the child, and unpickled there. Unfortunately, the round trip through pickle misbehaves for file objects: with protocol 0 it errors out, but with protocol 2 (the highest Python 2 protocol, and the one multiprocessing uses) it silently produces a junk file object:

>>> import pickle, sys
>>> pickle.loads(pickle.dumps(sys.stdout, pickle.HIGHEST_PROTOCOL))
<closed file '<uninitialized file>', mode '<uninitialized file>' at 0xDEADBEEF>

Same problem occurs for named files too; it's not unique to the standard handles. Basically, pickle can't round trip a file object; even when it claims to succeed, the result is garbage.

Generally, multiprocessing isn't really expected to handle a scenario like this; usually, Processes are worker tasks, and I/O is performed through the main process (because if they all wrote independently to the same file handle, you'd have issues with interleaved writes).

In Python 3.5 at least, they fixed this so the error is immediate and obvious: the file-like objects returned by open (TextIOWrapper and the Buffered* classes) error out when pickled with any protocol.
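
A quick check of that Python 3 behavior (run from a plain interpreter, not IDLE): pickling any open file object raises TypeError immediately, regardless of protocol.

```python
# Python 3: pickling an open file object fails fast with TypeError.
import pickle
import tempfile

with tempfile.TemporaryFile() as f:
    try:
        pickle.dumps(f, pickle.HIGHEST_PROTOCOL)
    except TypeError as e:
        print('refused to pickle:', e)
```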

The best you could do on Windows would be to send the known file descriptor as an argument:

sys.stdout.flush()  # Precaution to minimize output interleaving
w = multiprocessing.Process(target=worker_with, args=(sys.stdout.fileno(),))

then reopen it on the other side using os.fdopen. For fds not part of the standard handles (0, 1 and 2), since Windows uses the "spawn" method of making new Processes, you'd need to make sure any such fd was opened as a consequence of importing the __main__ module when __name__ != "__main__" (Windows simulates a fork by importing the __main__ module, setting the __name__ to something else). Of course, if it's a named file, not a standard handle, you could just pass the name and reopen that. For example, to make this work, you'd change:

def worker_with(stream):
    stream.write('In the process\n')

to:

import os

def worker_with(toopen):
    opener = open if isinstance(toopen, basestring) else os.fdopen
    with opener(toopen, 'a') as stream:
        stream.write('In the process\n')
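
Putting the pieces together, here is a complete runnable sketch of that approach. To keep it portable it passes a named temporary file rather than a standard handle, and the isinstance check uses str (basestring on Python 2); the file name and structure are illustrative, not part of the original code.

```python
# Sketch: pass something picklable (a file name or descriptor) to the child,
# and reopen it inside the worker instead of pickling a file object.
import multiprocessing
import os
import tempfile

def worker_with(toopen):
    # Reopen by name (str) or by file descriptor (int) in the child
    opener = open if isinstance(toopen, str) else os.fdopen
    with opener(toopen, 'a') as stream:
        stream.write('In the process\n')

if __name__ == '__main__':
    fd, path = tempfile.mkstemp()
    os.close(fd)  # the child reopens by name; the parent doesn't need this fd
    w = multiprocessing.Process(target=worker_with, args=(path,))
    w.start()
    w.join()
    with open(path) as f:
        print(f.read())  # -> In the process
    os.remove(path)
```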

Note: As written, if the fd is for one of the standard handles, os.fdopen will close the underlying file descriptor when the with statement exits, which may not be what you want. If you need file descriptors to survive the close of the with block, when passed a file descriptor, you may want to use os.dup to duplicate the handle before calling os.fdopen, so the two handles are independent of one another.
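
A minimal sketch of that os.dup precaution (the function name is hypothetical): the duplicated descriptor shares the file position with the original, but closing the wrapper only closes the duplicate, so the original descriptor survives the with block.

```python
# Duplicate the descriptor first so closing the fdopen'd wrapper does not
# close the caller's original handle.
import os

def worker_with(fd):
    dup_fd = os.dup(fd)  # independent descriptor; closing it leaves fd open
    with os.fdopen(dup_fd, 'a') as stream:
        stream.write('In the process\n')
    # fd itself is still open and usable here
```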

Other solutions would include writing results back to the main process over a multiprocessing.Pipe (so the main process is responsible for passing the data along to sys.stdout, possibly launching a thread to perform this work asynchronously), or using higher level constructs (e.g. multiprocessing.Pool().*map*) that return data via return statements instead of explicit file I/O.
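
A hedged sketch of the Pipe variant (names are illustrative): the child sends text back over the connection, and only the main process touches sys.stdout, so nothing unpicklable ever crosses the process boundary.

```python
# Child sends output over a Pipe; the parent alone writes to sys.stdout.
import multiprocessing
import sys

def worker(conn):
    conn.send('In the process\n')
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    w = multiprocessing.Process(target=worker, args=(child_conn,))
    w.start()
    msg = parent_conn.recv()  # blocks until the child sends
    w.join()
    sys.stdout.write(msg)
```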

If you're really desperate to make this work in general for all file descriptors (and don't care about portability), not just the standard handles and descriptors created on import of __main__, you can use the undocumented Windows utility function multiprocessing.forking.duplicate that is used to explicitly duplicate a file descriptor from one process to another; it would be incredibly hacky (you'd need to look at the rest of the Windows definition of multiprocessing.forking.Popen there to see how it would be used), but it would at least allow passing along arbitrary file descriptors, not just statically opened ones.

ShadowRanger
  • ShadowRanger, thanks so much for the very detailed explanation. I think I need to go learn more about file descriptors, but the workaround you suggest is quite nice. – user3722440 Dec 16 '15 at 15:52
  • Just one thing: your first line of code (pickle round-trip) produces a different result on my machine: "". If I don't specify the protocol, I get: "TypeError: can't pickle _TextIOBase objects". – user3722440 Dec 16 '15 at 15:54
    @user3722440: IDLE replaces the standard handles with special ones that it can use to output to the IDLE terminal, so differing behavior would be expected; normal programs not run under IDLE wouldn't see that. The IDLE `PseudoOutputFile` inherits (indirectly) from `io.TextIOBase`; `PseudoOutputFile` doesn't forbid pickling itself, but `io.TextIOBase` does, thus the error. IDLE is weird, and the behaviors specific to it don't really help understand how the standard handles work outside of IDLE. If you run from a plain Python interpreter, it should mimic my results more closely. – ShadowRanger Dec 16 '15 at 16:01
  • got you, thanks (I got the exact same results running from the shell) – user3722440 Dec 16 '15 at 16:21