6

I'm having some trouble understanding the behavior of select.select. Please consider the following Python program:

import select
import sys

def str_to_hex(s):
    def dig(n):
        if n > 9:
            return chr(65-10+n)
        else:
            return chr(48+n)
    r = ''
    while len(s) > 0:
        c = s[0]
        s = s[1:]
        a = ord(c) / 16
        b = ord(c) % 16
        r = r + dig(a) + dig(b)
    return r

while True:
    ans,_,_ = select.select([sys.stdin],[],[])
    print ans
    s = ans[0].read(1)
    if len(s) == 0: break
    print str_to_hex(s)

I have saved this to a file "test.py". If I invoke it as follows:

echo 'hello' | ./test.py

then I get the expected behavior: select never blocks and all of the data is printed; the program then terminates.

But if I run the program interactively, I get a most undesirable behavior. Please consider the following console session:

$ ./test.py
hello
[<open file '<stdin>', mode 'r' at 0xb742f020>]
68

The program then hangs there; select.select is now blocking again. It is not until I provide more input or close the input stream that the next character (and all of the rest of them) are printed, even though there are already characters waiting! Can anyone explain this behavior to me? I am seeing something similar in a stream tunneling program I have written and it's wrecking the entire affair.

Thanks for reading!

slowdog
tvynr
  • Off-topic, but `def str_to_hex(s): return ''.join(('%02x' % ord(c) for c in s))` ;-) – slowdog May 15 '11 at 23:01
  • @slowdog: How about `import binascii; binascii.hexlify(s)` instead? Writing your own hex conversion function is silly when an extremely fast one already exists. – Omnifarious May 15 '11 at 23:26
  • @Omnifarious: Oh, cool! I still underestimate the amount of "batteries included". – slowdog May 15 '11 at 23:35

2 Answers

9

The read method of sys.stdin works at a higher level of abstraction than select. When you call ans[0].read(1), Python actually reads a larger number of bytes from the operating system and buffers them internally. select is not aware of this extra buffering; it only sees that the operating system has no more data waiting on the file descriptor, and so it blocks until either EOF or more input arrives. You can observe this behaviour by running something like strace -e read,select python yourprogram.py.
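The effect described above can be reproduced without a terminal. The following is a minimal sketch, assuming Python 3 on a Unix-like system; a pipe stands in for stdin, and the variable names are made up for the illustration.

```python
import os
import select

# A pipe stands in for stdin; the write end stays open so no EOF is pending.
r, w = os.pipe()
os.write(w, b"hello")          # five bytes now wait in the kernel's pipe buffer

buffered = os.fdopen(r, "rb")  # buffered file object, comparable to sys.stdin
first = buffered.read(1)       # asks for one byte, but the buffer layer reads ahead

# Zero timeout: poll for readiness instead of blocking.
ready, _, _ = select.select([buffered], [], [], 0)

# The remaining bytes are served from Python's own buffer.
rest = buffered.read(4)

print(first, ready, rest)

os.close(w)
buffered.close()
```

On a typical system the ready list should come back empty even though four bytes are still unread, because they sit in Python's buffer rather than in the kernel where select looks.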

One solution is to replace ans[0].read(1) with os.read(ans[0].fileno(), 1). os.read is a lower level interface without any buffering between it and the operating system, so it's a better match for select.
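As a rough sketch of that replacement, assuming Python 3 on a Unix-like system: the helper name read_hex_bytes is hypothetical, binascii.hexlify replaces the hand-written hex conversion, and a pipe is used in place of stdin so the example runs on its own.

```python
import binascii
import os
import select


def read_hex_bytes(fd):
    """Yield the hex form of each byte read from fd until EOF."""
    while True:
        # Block until the kernel reports fd as readable (data or EOF).
        select.select([fd], [], [])
        # os.read goes straight to the operating system, so no bytes are
        # held back in a Python-level buffer that select cannot see.
        chunk = os.read(fd, 1)
        if not chunk:  # an empty result means EOF
            break
        yield binascii.hexlify(chunk).decode("ascii")


# Demonstration with a pipe instead of stdin.
r, w = os.pipe()
os.write(w, b"hello")
os.close(w)
print(list(read_hex_bytes(r)))  # ['68', '65', '6c', '6c', '6f']
os.close(r)
```

For the program in the question, the same loop would be driven with sys.stdin.fileno() as the descriptor.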

Alternatively, running python with the -u command-line option also seems to disable the extra buffering.

slowdog
  • And this _is_ the answer to the OP's problem. – Omnifarious May 15 '11 at 22:55
  • Though, another way to handle the Python buffering issue is to simply do a no-parameter read, which should read everything currently available. – Omnifarious May 15 '11 at 22:57
  • There are several ways to disable Python's buffering, outlined here: http://stackoverflow.com/questions/107705/python-output-buffering. It's generally not necessary to disable buffering, and I wouldn't expect it to be preferable in this case (I don't see how it'd be beneficial for simple stream processing). Great answer though. – Zach Kelling May 16 '11 at 00:48
  • Thanks! This explains a lot. The no-parameter reading wasn't an option; in the actual use-case, I'm launching a subprocess over SSH and the two scripts are using the subprocess's stdin and stdout to communicate. – tvynr May 17 '11 at 04:11
  • Thanks, this answer is very helpful for a problem I found very perplexing! slowdog, unfortunately, unbuffered input doesn't work with "paste". @Omnifarious, a "no-parameter read" is not so simple. I managed to work out the details and posted them here: http://stackoverflow.com/questions/27750135/need-character-by-charter-keyboard-input-that-interacts-well-with-paste-and-ansi – Nat Kuhn Jan 04 '15 at 03:17
1

It's waiting for you to signal EOF (you can do this with Ctrl+D when used interactively). You can use sys.stdin.isatty() to check whether the script is being run interactively and handle it accordingly, using, say, raw_input instead. I also doubt you need select.select at all; why not just use sys.stdin.read?

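if sys.stdin.isatty():
    while True:
        try:
            line = raw_input()
        except EOFError:  # Ctrl+D ends the interactive session
            break
        for c in line:
            print str_to_hex(c)
else:
    while True:
        c = sys.stdin.read(1)
        if not c:  # an empty string means EOF
            break
        print str_to_hex(c)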

This makes the script suitable for both interactive use and stream processing.

Zach Kelling
  • This isn't the answer to the OP's problem. – Omnifarious May 15 '11 at 22:54
  • Using `sys.stdin.read` seems preferable to using `select.select` to me. Sure you can work around `select.select`, but it seems like a pretty ugly approach to me. – Zach Kelling May 15 '11 at 23:12
  • sys.stdin.read blocks; the point of using select.select is to be able to wait for one of two different input sources. – tvynr May 17 '11 at 04:12
  • Ah, I see. From your question that wasn't obvious to me (although I guess it should have been since you were using `select.select` in the first place). – Zach Kelling May 17 '11 at 04:21
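To illustrate the use case raised in the comments, waiting on two input sources at once, here is a minimal sketch that combines select with os.read. It assumes Python 3 on a Unix-like system; the function pump, its sources mapping, and the labels are hypothetical names chosen for the example, and two pipes stand in for the real inputs.

```python
import os
import select


def pump(sources):
    """Read from several file descriptors until all of them reach EOF.

    sources maps a file descriptor to a label. Returns a list of
    (label, data) pairs in the order the data was read.
    """
    open_fds = dict(sources)
    events = []
    while open_fds:
        # Wait until at least one descriptor has data or has hit EOF.
        ready, _, _ = select.select(list(open_fds), [], [])
        for fd in ready:
            data = os.read(fd, 4096)  # unbuffered read, visible to select
            if data:
                events.append((open_fds[fd], data))
            else:
                del open_fds[fd]      # EOF: stop watching this descriptor
    return events


# Two pipes stand in for the two input sources.
a_r, a_w = os.pipe()
b_r, b_w = os.pipe()
os.write(a_w, b"from a")
os.write(b_w, b"from b")
os.close(a_w)
os.close(b_w)
print(sorted(pump({a_r: "a", b_r: "b"})))
os.close(a_r)
os.close(b_r)
```

Because each read goes through os.read, no data is left in a Python-level buffer, so select's view of readiness stays accurate for both sources.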