
Currently, I have something like this:

self.process = subprocess.Popen(self.cmd, stdout=subprocess.PIPE)
out, err = self.process.communicate()

The command I'm running streams its output, and I need the call to block until the process completes before continuing.

How do I make it so that I can capture the streaming output AND have it printed to stdout as it arrives? When I set stdout=subprocess.PIPE, I can capture the output, but it doesn't get printed. If I leave out stdout=subprocess.PIPE, the output gets printed, but communicate() returns None.

Is there a solution that does what I'm asking for WHILE blocking until the process is terminated/completed AND avoiding the buffering and pipe-deadlock issues mentioned here?

Thanks!

jaka

2 Answers


I can think of a few solutions.

#1: You can just go into the source, grab the code for communicate, and copy and paste it, adding code that prints each line as it comes in as well as buffering things up. (If it's possible for your own stdout to block because of, say, a deadlocked parent, you can use a threading.Queue or something instead.) This is obviously a bit hacky, but it's pretty easy, and it will be safe.
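A stripped-down sketch of that idea (my simplification, not communicate's actual source), assuming only stdout is piped, so a single blocking read loop can't deadlock:

import sys

def communicate_and_echo(process):
    # Simplified, stdout-only stand-in for communicate(): buffer every
    # line, echo it as it arrives, then wait for the child to exit.
    # (The '' sentinel assumes a text-mode pipe.)
    buffered = []
    for line in iter(process.stdout.readline, ''):
        buffered.append(line)
        sys.stdout.write(line)
    process.wait()
    return ''.join(buffered)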

But really, communicate is complicated because it needs to be fully general and handle cases you don't have. All you need here is the central trick: throw threads at the problem. A dedicated reader thread that doesn't do anything slow or blocking between read calls is all you need.

Something like this:

import sys
import threading

self.process = subprocess.Popen(self.cmd, stdout=subprocess.PIPE)
lines = []

def reader():
    # Drain the pipe as output arrives: buffer each line and echo it.
    for line in self.process.stdout:
        lines.append(line)
        sys.stdout.write(line)

t = threading.Thread(target=reader)
t.start()
self.process.wait()  # block until the child exits
t.join()             # make sure the reader has drained the pipe

You may need some error handling in the reader thread. And I'm not 100% sure you can safely use readline here. But this will either work, or be close.
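If iterating the pipe turns out to buffer (see the comments below about Python 2's read-ahead), a readline-based reader is a safe substitute. A minimal sketch of that variant:

def reader():
    # readline() sidesteps the file iterator's read-ahead buffer in
    # Python 2, so each line is echoed as soon as the child flushes it.
    while True:
        line = self.process.stdout.readline()
        if not line:  # an empty string means EOF
            break
        lines.append(line)
        sys.stdout.write(line)
        sys.stdout.flush()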

#2: Or you can create a wrapper class that takes a file object and tees to stdout/stderr every time anyone reads from it. Then create the pipes manually, and pass in wrapped pipes, instead of using the automagic PIPE. This has the exact same issues as #1 (meaning either no issues, or you need to use a Queue or something if sys.stdout.write can block).

Something like this:

class TeeReader(object):
    """Wraps a readable file object; echoes everything read to tee_file."""
    def __init__(self, input_file, tee_file):
        self.input_file = input_file
        self.tee_file = tee_file
    def read(self, size=-1):
        ret = self.input_file.read(size)
        if ret:
            # Echo whatever was just read, so the caller still sees it live.
            self.tee_file.write(ret)
        return ret

In other words, it wraps a file object (or something that acts like one), and it acts like a file object. (When you use PIPE, process.stdout is a real file object on Unix, but may just be something that acts like one on Windows.) Any other methods you need to delegate to input_file can probably be delegated directly, without any extra wrapping. Either try this and see which methods communicate raises AttributeError looking for, and code those explicitly, or do the usual __getattr__ trick to delegate everything. PS: if you're worried about this "file object" idea implying disk storage, read Everything is a file at Wikipedia.
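A minimal sketch of that __getattr__ trick, together with hypothetical usage that reads the wrapped pipe in chunks (the 4096 chunk size and the ls command are placeholders, not from the question):

import sys
import subprocess

class TeeReader(object):
    def __init__(self, input_file, tee_file):
        self.input_file = input_file
        self.tee_file = tee_file

    def read(self, size=-1):
        ret = self.input_file.read(size)
        if ret:
            self.tee_file.write(ret)
        return ret

    def __getattr__(self, name):
        # Called only for attributes not defined above, so close(),
        # fileno(), readline(), etc. fall through to the real pipe.
        return getattr(self.input_file, name)

# Hypothetical usage: read in chunks so the tee happens as data arrives.
process = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE,
                           universal_newlines=True)
tee = TeeReader(process.stdout, sys.stdout)
chunks = []
while True:
    chunk = tee.read(4096)
    if not chunk:
        break
    chunks.append(chunk)
output = ''.join(chunks)
process.wait()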

#3: Finally, you can grab one of the "async subprocess" modules on PyPI, or the support included in Twisted or other async frameworks, and use that. (This makes it possible to avoid the deadlock problems, but it's not guaranteed; you still have to make sure you service the pipes properly.)
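For instance, on modern Python (this postdates the answer), the standard library's asyncio fills that role. A rough sketch, assuming Python 3.7+ for asyncio.run:

import asyncio
import sys

async def run_and_tee(*cmd):
    # Spawn the child with stdout connected to an asyncio pipe.
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE)
    lines = []
    while True:
        line = await proc.stdout.readline()
        if not line:  # EOF
            break
        lines.append(line)
        sys.stdout.buffer.write(line)  # echo the raw bytes as they arrive
        sys.stdout.buffer.flush()
    await proc.wait()  # block until the child exits
    return b''.join(lines)

output = asyncio.run(run_and_tee('ls', '-l'))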

abarnert
  • how exactly would you implement #2? does a 'file' object indicate that I must write to disk (I'd prefer not to)? – jaka Mar 27 '13 at 08:32
  • @jaka: No, a pipe is already a file object. You're creating an object that wraps a file object, and delegates all file methods to it (except for `read`, which does a little extra around the delegated call). – abarnert Mar 27 '13 at 08:36
  • I'll try to add at least some skeleton code to the first two answers. – abarnert Mar 27 '13 at 08:38
  • I think threading is what I need. Thanks! Some skeleton code would be awesome as well! Once again, thanks! – jaka Mar 27 '13 at 08:39
  • Thanks for the code. Just a quick question -- if I replace `self.process.wait()` with `out, err = self.process.communicate()`, why does out have no output? Does the reader thread actually affect communicate()? – jaka Mar 27 '13 at 09:06
  • sorry to bother you again -- do you have any idea why reader doesn't run until the process returns? – jaka Mar 27 '13 at 09:39
  • apparently `for line in self.process.stdout:` only gets run after the process returns. seems like `readline()` is the key – jaka Mar 27 '13 at 10:42
  • That surprises me a little bit. `for line in f:` should be the same as `while True:` `line = f.readline()` `if not line: break` for any reasonable file-like object, including the `process.stdout` pipe. But if for some reason it isn't, and you've got a workaround, I guess it isn't critical. – abarnert Mar 27 '13 at 20:19
  • Can someone post a working example of solution #2 (or whichever is the easiest)? Something like : import subprocess process = subprocess.Popen("/bin/bash", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE); process.stdin.write("export MyVar=\"Test\"\n") process.stdin.write("echo $MyVar\n") process.stdin.flush() stdout, stderr = process.communicate() print "stdout: " + str(stdout) # Do it again process.stdin.write("echo $MyVar\n") process.stdin.flush() stdout, stderr = process.communicate() print "stdout: " + str(stdout) – David Doria Apr 17 '14 at 18:55

With stdout=subprocess.PIPE, the output goes to your calling process: you are capturing the stdout of self.cmd, so that output does not go anywhere else.

What you need to do is print it from the 'parent' process if you want to see it.
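A minimal sketch of that, assuming it's acceptable to echo the output only after the process finishes (the accepted answer covers echoing it live); the ls command is a placeholder:

import sys
import subprocess

process = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE,
                           universal_newlines=True)
out, err = process.communicate()  # blocks until the child exits
sys.stdout.write(out)             # print the captured output ourselves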

Anthon
  • I'm pretty sure the whole point is that he wants to print it as it comes in—but his parent process is blocked until `communicate` returns, and therefore he can't print it from the parent process. – abarnert Mar 27 '13 at 08:10