I am trying to use the subprocess module in Python to communicate with a process that reads standard input and writes standard output in a streaming fashion. I want the subprocess to read lines from an iterator that produces the input, while I read output lines from the subprocess. There may not be a one-to-one correspondence between input and output lines. How can I feed a subprocess from an arbitrary iterator that returns strings?
Here is some example code that gives a simple test case, along with two approaches I have tried, neither of which does what I need:
#!/usr/bin/python
from subprocess import Popen, PIPE
# A really big iterator
input_iterator = ("hello %s\n" % x for x in xrange(100000000))
# I thought that stdin could be any iterable, but it actually wants a
# filehandle, so this fails with an error.
subproc = Popen("cat", stdin=input_iterator, stdout=PIPE)
# This works, but it first sends *all* the input at once, then returns
# *all* the output as a string, rather than giving me an iterator over
# the output. This uses up all my memory, because the input is several
# hundred million lines.
subproc = Popen("cat", stdin=PIPE, stdout=PIPE)
output, error = subproc.communicate("".join(input_iterator))
output_lines = output.split("\n")
So how can I have my subprocess read from an iterator line by line while I read from its stdout line by line?
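To make the goal concrete, here is a minimal sketch of the behaviour I am after. It assumes Python 3.7+ (for `text=True`) rather than the Python 2 used above, and it assumes a background thread is an acceptable way to write to stdin while the main thread reads stdout. `stream_through` is a hypothetical helper name of my own, not part of the subprocess module, and this may well not be the best approach:

```python
import subprocess
import threading


def stream_through(cmd, lines):
    """Lazily feed `lines` to `cmd` and yield its output lines as they arrive.

    Hypothetical helper for illustration; not a subprocess API.
    """
    proc = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )

    def feed():
        # Write from a separate thread so that a full stdin pipe does not
        # block the code that is draining stdout.
        try:
            for line in lines:
                proc.stdin.write(line)
        except BrokenPipeError:
            pass  # the child exited before consuming all of the input
        finally:
            try:
                proc.stdin.close()  # flushes, then signals EOF to the child
            except BrokenPipeError:
                pass

    writer = threading.Thread(target=feed, daemon=True)
    writer.start()
    try:
        # Iterating the pipe yields one output line at a time.
        for out_line in proc.stdout:
            yield out_line
    finally:
        proc.stdout.close()
        writer.join()
        proc.wait()


if __name__ == "__main__":
    input_iterator = ("hello %s\n" % x for x in range(5))
    for line in stream_through(["cat"], input_iterator):
        print(line, end="")
```

With something like this, only the data currently sitting in the pipe buffers is held in memory, and the number of output lines does not have to match the number of input lines.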