My understanding is that child processes in Python cannot access the STDIN of the main process (questions that support this: "Python using STDIN in child Process" and "Is there any way to pass 'stdin' as an argument to another process in python?").

However, in the following piece of code I am able to send STDIN to the process pool using map. Can someone please clarify how this is different?

import multiprocessing 
import fileinput

def test(line):
    print line

p = multiprocessing.Pool()
p.map(test, fileinput.input())
KT100

1 Answer

Pool.map consumes the input list (or other iterable) in the main process and then hands each worker process one* member of the list at a time. An iterable without a length, such as fileinput.input(), is first converted to a list there. So your example is equivalent to the following:

import multiprocessing 
import fileinput

def test(line):
    print line

input = []
for line in fileinput.input():
    input.append(line)

p = multiprocessing.Pool()
p.map(test, input)

In this version it is clear that the child processes never read anything from stdin; only the main process does.

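* unless items are grouped into chunks, in which case each process is handed a bunch of list members at a time. This happens when you specify a chunksize, and in CPython's implementation also when the input is long enough for the automatically computed chunk size to exceed 1.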
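To check that the iterable really is consumed in the main process, here is a minimal sketch in Python 3 syntax. The helper names (`tagged_lines`, `worker_pid`, `who_reads_and_who_works`) are made up for illustration, and the "fork" start method is requested explicitly, which assumes a Unix-like system and is not something the original code does.

```python
import multiprocessing
import os


def tagged_lines(lines, seen_pids):
    """Yield lines while recording the PID of the process that pulls them."""
    for line in lines:
        seen_pids.add(os.getpid())
        yield line


def worker_pid(line):
    """Run in a pool worker: report which process handled the line."""
    return os.getpid()


def who_reads_and_who_works(lines):
    """Return (PIDs that iterated over the input, PIDs that ran the tasks)."""
    # Assumption: "fork" is available (Unix-like systems only).
    ctx = multiprocessing.get_context("fork")
    reader_pids = set()
    with ctx.Pool(2) as pool:
        # The generator has no length, so map() turns it into a list
        # in the calling process before dispatching any work.
        worker_pids = set(pool.map(worker_pid, tagged_lines(lines, reader_pids)))
    return reader_pids, worker_pids
```

Calling `who_reads_and_who_works(["a\n", "b\n", "c\n"])` should report the caller's own PID as the only reader, while the worker PIDs belong to the pool's child processes.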


That being said, it is not true that a child process cannot access stdin. If that were true in general then, for example, UNIX shells would not be of much use. In reality, child processes inherit the file descriptors of their parents, so parents and children can all read from the same input source. The catch is that a given piece of input can be read only once, so the problem is not one of access to stdin from the children but of deciding which process gets to read which data. In many cases this is difficult to control and therefore unreliable, for example when data is read through a buffer (as most programming languages' standard library routines do), because a reader may pull in more input than it actually uses.
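The buffering problem can be seen with a small experiment, sketched here in Python 3 syntax under the assumption of a POSIX system. A temporary file stands in for stdin, and the function name `remaining_after_child_reads_one_line` is hypothetical. The child asks for a single line, but its buffered reader is expected to swallow the whole (small) file, so the parent finds nothing left on the shared descriptor.

```python
import os
import subprocess
import sys
import tempfile


def remaining_after_child_reads_one_line():
    """Let a child read one line from a shared descriptor, then see what is left."""
    with tempfile.TemporaryFile() as f:
        f.write(b"first\nsecond\nthird\n")
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)  # rewind the shared file offset
        # The child inherits the descriptor as its stdin and reads ONE line
        # through Python's usual buffered sys.stdin.
        child = subprocess.run(
            [sys.executable, "-c", "import sys; print(sys.stdin.readline().strip())"],
            stdin=fd,
            stdout=subprocess.PIPE,
            check=True,
        )
        child_line = child.stdout.decode().strip()
        # Unbuffered read in the parent: only what the child did not consume.
        leftover = os.read(fd, 1024)
    return child_line, leftover
```

The child only uses "first", yet the other two lines are gone from the parent's point of view, because parent and child share one file offset.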

It is, I suppose, for the above reason that the multiprocessing module's authors decided to close sys.stdin (i.e. the standard library object through which you read stdin) in child processes and force you to give target functions their input data in a safer way (e.g. via multiprocessing.Queue). But there is a workaround, provided you know exactly how your child processes will access stdin, and it works for any file you have opened in the parent process as well:

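import os, sys, multiprocessing

def square(num):
    # Only the task that receives 3 prompts on stdin for a replacement value.
    if num == 3:
        num = int(raw_input('square what? '))
    return num ** 2

def initialize(fd):
    # Runs once in each worker: re-wrap the inherited descriptor as sys.stdin.
    sys.stdin = os.fdopen(fd)

# Pass the parent's stdin descriptor number to every worker's initializer.
initargs = [sys.stdin.fileno()]
pool = multiprocessing.Pool(5, initialize, initargs)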

So for example if we send the numbers from 0 to 9 to the pool, each of the five processes will receive a number, one at a time, but the process that gets the number 3 will prompt for input:

>>> pool.map(square, range(10))
square what? 9
[0, 1, 4, 81, 16, 25, 36, 49, 64, 81]
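The same initializer trick can be tried non-interactively with an ordinary file in place of the terminal. The following is a rough Python 3 sketch, not the code above: `run_with_file_as_stdin` is a hypothetical helper, the worker reads with `sys.stdin.readline()` instead of prompting, and the "fork" start method is assumed so that the descriptor is inherited by the workers.

```python
import multiprocessing
import os
import sys
import tempfile


def initialize(fd):
    """Runs once per worker: wrap the inherited descriptor as sys.stdin."""
    sys.stdin = os.fdopen(fd)


def square(num):
    """Square num, except that the task receiving 3 reads its value from stdin."""
    if num == 3:
        num = int(sys.stdin.readline())
    return num ** 2


def run_with_file_as_stdin(text, values):
    """Map square() over values while workers use a temp file as their stdin."""
    # Assumption: "fork" is available, so open descriptors are inherited.
    ctx = multiprocessing.get_context("fork")
    with tempfile.TemporaryFile(mode="w+") as f:
        f.write(text)
        f.flush()
        f.seek(0)  # rewind so a worker reads from the start of the file
        with ctx.Pool(2, initialize, [f.fileno()]) as pool:
            return pool.map(square, values)
```

With `run_with_file_as_stdin("9\n", range(10))` only the worker that is handed 3 touches the file, which is what keeps the read unambiguous.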

Just be careful not to have multiple child processes reading from the same descriptor at the same time or things may get... confusing.
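When it is not practical to control which child reads what, the safer route mentioned earlier is to let the parent do all the reading and pass the data along through a multiprocessing.Queue. A minimal sketch in Python 3 syntax follows; the names `consumer` and `feed_lines_through_queue` are invented for illustration, `None` is used as an arbitrary end-of-input marker, and the "fork" start method is again an assumption.

```python
import io
import multiprocessing


def consumer(queue, results):
    """Child process: take lines from the queue until the None sentinel arrives."""
    for line in iter(queue.get, None):
        results.put(line.strip().upper())


def feed_lines_through_queue(stream):
    """Parent reads the stream itself and forwards each line to one child."""
    # Assumption: "fork" is available (Unix-like systems only).
    ctx = multiprocessing.get_context("fork")
    queue, results = ctx.Queue(), ctx.Queue()
    child = ctx.Process(target=consumer, args=(queue, results))
    child.start()
    count = 0
    for line in stream:  # only the parent ever reads the input
        queue.put(line)
        count += 1
    queue.put(None)  # tell the child there is no more input
    processed = [results.get() for _ in range(count)]
    child.join()
    return processed
```

In real use `stream` would be `sys.stdin`; something like `io.StringIO("a\nb\n")` makes it easy to try without a terminal.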

kouk
  • For reference on the `multiprocessing` module authors' decision, reading the [source code](http://hg.python.org/cpython/file/6d12285e250b/Lib/multiprocessing/process.py#l237) can be illuminating. – kouk Oct 17 '13 at 07:00