6

I'm brand new to python, having used perl for years. A typical thing I do all the time in perl is open a command as a pipe and assign its output to a local variable for processing. In other words:

"open CMD, "$command|";
$output=<CMD>;

A piece of cake. I think I can do something similar in python this way:

args=[command, args...]
process=subprocess.Popen(args, stdout=subprocess.PIPE)
output=process.communicate()

So far so good. Now for the big question...

If I fire off that command over ssh on multiple platforms, I can then monitor the descriptors in perl inside a select loop to process the results as they come in. I did find the python select and poll modules but am not quite sure how to use them. The documentation says poll takes a file handle, but when I try to pass the variable 'process' above to poll.register() I get an error that it must be an int or have a fileno() method. Since Popen() was given stdout=subprocess.PIPE, I tried calling

poll.register(process.stdout)

and it no longer throws an error, but instead just hangs.

Any suggestions/pointers of how to make something like this work?

casperOne
  • 73,706
  • 19
  • 184
  • 253
Mark J Seger
  • 367
  • 2
  • 12
  • process.stdout would be the file handle for that process object - I'm not sure if that's all you need to get it to work with poll/select, I haven't used those. – AdamKG Jan 12 '12 at 21:22
  • Also, note that Popen.communicate() blocks until EOF - you'll probably want to get rid of that. – AdamKG Jan 12 '12 at 21:23
  • ahh, I get it. clearly DON'T want to block. lots of new constructs to try and wrap my brain around. -mark – Mark J Seger Jan 12 '12 at 22:16
  • so I just tried it again using the poll, which the documentation says scales better than select(), and it seems to be working the same way as Martin's example below. It's not blocking! I'd show you my code but this website won't let me answer my own question for 8 hours and I can't try to format this response with blank lines because return='post the comment'. ;( – Mark J Seger Jan 12 '12 at 22:27
  • @MarkJSeger: this is not a forum, so you don't post answers as "followups". The idea is you post your question and we post answers. You can edit your question, though, to show additional info about your questions, such as the code you've tried. Hope this helps a bit, otherwise look in the [faq]. – Martin Geisler Jan 13 '12 at 12:28
  • if you need to read output from a single process (with line-oriented output) then you don't need `select`, see [Python: read streaming input from `subprocess.communicate()`](http://stackoverflow.com/q/2715847/4279) – jfs Aug 11 '15 at 21:28

2 Answers

8

Using select.select: you need to pass objects with a fileno method or real file descriptors (integers):

import os, sys, select, subprocess

# Two long-running commands that print a timestamp every two seconds.
args = ['sh', '-c', 'while true; do date; sleep 2; done']
p1 = subprocess.Popen(args, stdout=subprocess.PIPE)
p2 = subprocess.Popen(args, stdout=subprocess.PIPE)

while True:
    # Block until at least one of the two pipes has data ready to read.
    rlist, wlist, xlist = select.select([p1.stdout, p2.stdout], [], [])
    for stdout in rlist:
        sys.stdout.write(os.read(stdout.fileno(), 1024))

You'll see it pause every two seconds and then produce more output as it becomes available. The "trick" is that p1.stdout is a normal file-like object with a fileno method that returns the file descriptor number. This is all that select needs.

Note that I'm reading from stdout using os.read instead of simply calling stdout.read. This is because a call like stdout.read(1024) makes your program wait until the requested number of bytes have been read; fewer bytes are returned only at EOF, and since EOF is never reached here, the stdout.read call would block until the full 1024 bytes had arrived.

This is unlike the os.read function, which has no qualms about returning early when fewer bytes are available — it returns straight away with what's available. In other words, getting less than 1024 bytes back from os.read(stdout.fileno(), 1024) is not a sign that stdout has been closed.
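As a small illustration of that behavior (my own example, not part of the original answer), this read returns immediately with whatever the child has written so far:

import os, subprocess

p = subprocess.Popen(['echo', 'hello'], stdout=subprocess.PIPE)
# os.read hands back whatever is already in the pipe, even though we asked for 1024 bytes.
chunk = os.read(p.stdout.fileno(), 1024)   # typically 'hello\n', just 6 bytes
print repr(chunk)
p.wait()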

Using select.poll is almost identical, except that you get a "raw" file descriptor (FD) back, which you then need os.read to read from:

import os, sys, select, subprocess

args = ['sh', '-c', 'while true; do date; sleep 2; done']
p1 = subprocess.Popen(args, stdout=subprocess.PIPE)
p2 = subprocess.Popen(args, stdout=subprocess.PIPE)

poll = select.poll()
poll.register(p1.stdout)   # poll accepts file-like objects with a fileno method
poll.register(p2.stdout)

while True:
    # poll() blocks until at least one registered FD has data ready.
    rlist = poll.poll()
    for fd, event in rlist:
        sys.stdout.write(os.read(fd, 1024))

A closed FD is signaled by the select.POLLHUP event being returned. You can then call the unregister method and finally break out of the loop when all FDs are closed.
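Roughly, that shutdown logic might look like this (my own sketch, reusing the p1, p2 and poll objects from the example above; note the demo commands there loop forever, so this only matters for commands that eventually exit):

# Track the FDs that are still open and stop once every pipe has hung up.
open_fds = set([p1.stdout.fileno(), p2.stdout.fileno()])
while open_fds:
    for fd, event in poll.poll():
        data = os.read(fd, 1024)
        if data:
            sys.stdout.write(data)
        if event & select.POLLHUP and not data:   # writer closed and pipe is drained
            poll.unregister(fd)
            open_fds.discard(fd)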

Finally, let me note that you could of course make a dictionary with a mapping from file descriptors back to the file-like objects, and hence back to the processes you launched.
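For example, a minimal sketch of such a mapping (the names here are mine, not from the answer):

# Hypothetical lookup table: which Popen object does a ready FD belong to?
fd_to_proc = {p.stdout.fileno(): p for p in (p1, p2)}

for fd, event in poll.poll():
    proc = fd_to_proc[fd]                    # the subprocess that produced this output
    sys.stdout.write('[pid %d] %s' % (proc.pid, os.read(fd, 1024)))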

Martin Geisler
  • 72,968
  • 25
  • 171
  • 229
  • slick, but I don't see it waiting anywhere. for the record, the command I'm running is "collectl", a monitoring tool I wrote in perl – Mark J Seger Jan 12 '12 at 21:58
  • this website is a real pain! I wanted to supply a better formatted response and it won't let me answer my own question. sheesh... so now I have to write a hard to parse response. your solution works BUT the select isn't sleeping. Rather, it continuously wakes up and the read below returns 'None' which I then have to ignore. I think if you try your example with something like a ps command that generates a lot more output you'll see what I mean -mark – Mark J Seger Jan 12 '12 at 22:11
  • 1
    I've added some print statements you can enable. When I do that here, I see that it's waiting as it should for two seconds, and then it reads some bytes from each of the ready file descriptors. I just changed the answer to read *up to* 1024 bytes and it seems to work fine here, meaning that the `os.read` call returns every time the FD runs out of bytes. – Martin Geisler Jan 12 '12 at 22:18
  • ahhhhhh! now it's all coming back to me. since your earlier read wasn't emptying the buffer all at once, of course there was still data waiting to be read and so the select kept waking up immediately. I've changed things to do the bigger reads like you showed and it now works just fine. Thank you so much! Now if you want to try out a slick monitoring tool, check out collectl. ;) – Mark J Seger Jan 12 '12 at 22:34
  • You mean this one: http://collectl.sourceforge.net/ Looks nice :-) I'm glad you got it working in the end. I just updated the answer again with more about the read sizes and blocking of the FDs. Hope it helps! – Martin Geisler Jan 12 '12 at 22:38
  • @MarkJSeger, Instead of answering your question to provide more information it's better to edit your original with the added info. (also, remember to accept the answer by clicking the check mark) – Wayne Werner Jan 12 '12 at 22:53
  • re collectl - it's all written in perl! One of my reasons for this and the newer questions I'll post as other notes is that I'd like to start writing new utilities in python and am trying to get some of the basics down. The colmux utility is a collectl multiplexor. What it can do is start copies of collectl on hundreds of nodes, which talk back to it over a socket. It then sorts the output and like top can show top utilization of almost any counter on the system! Want to see top users of slab memory? How about NFS commits? The list is almost endless. It's in the collectl-utils package. – Mark J Seger Jan 13 '12 at 12:22
  • @MarkJSeger: you should be able to [post your own answer](http://stackoverflow.com/help/self-answer) – jfs Aug 11 '15 at 21:26
  • @J.F.Sebastian: that's a great point. I probably started with `stdout.read` for simplicity and found it to wait too long, which made me realize that I could read smaller chunks. I'll update the answer to use `os.read` and add a note about the `stdout.read` behavior. Thanks. – Martin Geisler Aug 14 '15 at 07:35
2
import subprocess

p = subprocess.Popen('apt-get autoclean', stdout=subprocess.PIPE, stderr=None, shell=True)

for line in iter(p.stdout.readline, ''):
    print line

p.stdout.flush()
p.stdout.close()

print("Done")
josliber
  • 43,891
  • 12
  • 98
  • 133
vishal
  • 2,258
  • 1
  • 18
  • 27
  • don't use `shell=True` unless necessary. Your code will hang if somebody changes `print line` to `print(line)` and runs it on Python 3. `print line` doubles all newlines. Use `bufsize=1` to improve performance. `.flush()` is unnecessary here. You could use `with`-statement to close the pipe. Call `p.wait()` to avoid zombies. See [this answer (every character (even a comma) is for a reason there)](http://stackoverflow.com/a/17698359/4279) – jfs Aug 11 '15 at 21:33
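For reference, here is a rough sketch of what that comment is suggesting (my interpretation of it, not code from the linked answer):

import subprocess

# Pass an argument list instead of shell=True; bufsize=1 requests line buffering.
p = subprocess.Popen(['apt-get', 'autoclean'], stdout=subprocess.PIPE, bufsize=1)
with p.stdout:                      # closes the pipe once the loop is done
    for line in iter(p.stdout.readline, ''):
        print line,                 # trailing comma: the line already ends with '\n'
p.wait()                            # reap the child so it doesn't linger as a zombie
print "Done"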