In a shell script, we have the following command:

/script1.pl < input_file | /script2.pl > output_file

I would like to replicate the above pipeline in Python using the subprocess module. input_file is a large file, and I can't read the whole file at once. Instead, I would like to pass each line (input_string) into the pipe and get back a string variable output_string, until the whole file has been streamed through.

The following is a first attempt:

process = subprocess.Popen(["/script1.pl | /script2.pl"], stdin = subprocess.PIPE, stdout = subprocess.PIPE, shell = True)
process.stdin.write(input_string)
output_string = process.communicate()[0]

However, process.communicate()[0] closes the stream, and I would like to keep it open for subsequent writes. I have tried using process.stdout.readline() instead, but the program hangs.

Unitarihedron
  • `/script1.pl < input_string` reads the **file** named `input_string`, it does not feed the literal string `input_string` as input. – Adam Rosenfield Dec 26 '13 at 18:38
  • Ah I see. I would like to feed an actual string to my python implementation though. I will iterate through strings using a generator, and I want to pass the generated strings through the pipe on the fly. – Unitarihedron Dec 26 '13 at 18:42
  • your shell command is not compatible with *"keep the stream open"*. What do you want to put into `output_string` (the first byte, the first line, the first n bytes, the first bytes that arrive in 10 seconds)? btw, `output_string = process.communicate(input_string)[0]` reproduces your shell command (if we use strings instead of files). – jfs Dec 26 '13 at 19:07
  • My apologies for the confusion. My shell command reads from a large file with a lot of lines, and writes to another file. I can't open and read the whole file in python. Rather, I have to read line by line, and pass each line into the pipe stream. I would like to keep the pipe stream open until all lines are passed through it. – Unitarihedron Dec 26 '13 at 19:23
  • Edited my question to clarify the problem. Thanks. – Unitarihedron Dec 26 '13 at 19:25
  • @Unitarihedron: the title of your question (send/get variables i.e., strings in memory) contradicts the body of your question (a large file that does not fit in memory). – jfs Dec 26 '13 at 19:52

1 Answer

To emulate the shell command /script1.pl < input_file | /script2.pl > output_file using the subprocess module in Python:

#!/usr/bin/env python
from subprocess import check_call

with open('input_file', 'rb') as input_file:
    with open('output_file', 'wb') as output_file:
        check_call("/script1.pl | /script2.pl", shell=True,
                   stdin=input_file, stdout=output_file)

You could write it without shell=True (though I don't see a reason to here), based on the "Replacing shell pipeline" example (section 17.1.4.2) from the subprocess docs:

#!/usr/bin/env python
from subprocess import Popen, PIPE

with open('input_file', 'rb') as input_file:
    script1 = Popen("/script1.pl", stdin=input_file, stdout=PIPE)
with open("output_file", "wb") as output_file:
    script2 = Popen("/script2.pl", stdin=script1.stdout, stdout=output_file)
script1.stdout.close() # allow script1 to receive SIGPIPE if script2 exits
script2.wait()
script1.wait()
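
Unlike check_call, the Popen-based version above does not raise if one of the scripts exits with a non-zero status. A minimal sketch of how the same pattern might be wrapped with exit-status checks, assuming Python 3; run_pipeline and its parameter names are hypothetical, not part of subprocess:

```python
import subprocess


def run_pipeline(cmd1, cmd2, input_path, output_path):
    """Emulate `cmd1 < input_path | cmd2 > output_path` without a shell.

    cmd1 and cmd2 are argument lists, e.g. ["/script1.pl"].
    Raises subprocess.CalledProcessError if either command fails.
    """
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        first = subprocess.Popen(cmd1, stdin=src, stdout=subprocess.PIPE)
        second = subprocess.Popen(cmd2, stdin=first.stdout, stdout=dst)
        # Drop the parent's copy of the pipe so that `first` can receive
        # SIGPIPE if `second` exits early.
        first.stdout.close()
        rc2 = second.wait()
        rc1 = first.wait()
    for cmd, rc in ((cmd1, rc1), (cmd2, rc2)):
        if rc != 0:
            raise subprocess.CalledProcessError(rc, cmd)
```

It could then be called as run_pipeline(["/script1.pl"], ["/script2.pl"], "input_file", "output_file").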

You could also use the plumbum module to get shell-like syntax in Python:

#!/usr/bin/env python
from plumbum import local

script1, script2 = local["/script1.pl"], local["/script2.pl"]
(script1 < "input_file" | script2 > "output_file")()

See also How do I use subprocess.Popen to connect multiple processes by pipes?


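If you want to read/write line by line, then the answer depends on the concrete scripts that you want to run. In general, it is easy to deadlock while sending input and receiving output if you are not careful, e.g., due to buffering issues: a child that block-buffers its stdout when writing to a pipe may not emit anything until its buffer fills or it exits, so a readline() in the parent can wait forever.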
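
To illustrate the case where a line-by-line exchange does work, here is a minimal sketch assuming Python 3.7+ (for text=True). The child is a stand-in Python one-liner that flushes its stdout after every line, not the Perl scripts from the question; process_lines and CHILD are hypothetical names. If the real scripts do not flush per line, the readline() call here would block just as in the question.

```python
import subprocess
import sys

# Stand-in child (an assumption, not /script1.pl): upper-cases each input
# line and flushes stdout after every line.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


def process_lines(lines):
    """Send lines one at a time and collect one reply per line."""
    replies = []
    with subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes on the parent side (text mode only)
    ) as child:
        for line in lines:
            child.stdin.write(line + "\n")
            child.stdin.flush()  # push the line to the child now
            # Safe only because this child answers every line immediately.
            replies.append(child.stdout.readline().rstrip("\n"))
        child.stdin.close()  # signal EOF so the child can exit
    return replies
```

This lock-step style relies on exactly one output line per input line; if that does not hold for your scripts, writing and reading from separate threads is the safer choice.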

If the input doesn't depend on the output in your case, then a reliable cross-platform approach is to use a separate thread for each stream:

#!/usr/bin/env python
from subprocess import Popen, PIPE
from threading import Thread

def pump_input(pipe):
    try:
        for i in xrange(1000000000): # generate large input
            print >>pipe, i
    finally:
        pipe.close()

p = Popen("/script1.pl | /script2.pl", shell=True, stdin=PIPE, stdout=PIPE,
          bufsize=1)
Thread(target=pump_input, args=[p.stdin]).start()
try: # read output line by line as soon as the child flushes its stdout buffer
    for line in iter(p.stdout.readline, b''):
        print line.strip()[::-1] # print reversed lines
finally:
    p.stdout.close()
    p.wait()
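
The same thread-based idea can be sketched for Python 3.7+, feeding strings from any iterable (such as a generator) and yielding output lines as they arrive. stream_through is a hypothetical helper name, and the error handling assumes a POSIX-like system where writing to an exited child raises BrokenPipeError:

```python
import subprocess
from threading import Thread


def stream_through(cmd, lines, shell=False):
    """Feed `lines` to cmd's stdin from a thread; yield its output lines."""
    proc = subprocess.Popen(cmd, shell=shell, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)

    def pump():
        try:
            for line in lines:
                proc.stdin.write(line + "\n")
        except BrokenPipeError:
            pass  # the child exited before consuming all input
        finally:
            try:
                proc.stdin.close()  # signal EOF to the child
            except BrokenPipeError:
                pass

    writer = Thread(target=pump, daemon=True)
    writer.start()
    try:
        # Lines become available whenever the child flushes its stdout.
        for out_line in proc.stdout:
            yield out_line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()
        writer.join()
```

For the pipeline in the question, this might be used as for output_string in stream_through("/script1.pl | /script2.pl", generated_strings, shell=True), where generated_strings stands for your own generator of input strings.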
Community
jfs