I have a simple Python script that should read from stdin, so I redirect the stdout of another program to the stdin of my Python script.

But whatever my program logs only reaches the Python script when the logging program gets killed.

What I actually want is to handle each logged line as soon as it is available, not when the program, which is supposed to run 24/7, quits.

So how can I make this happen? How can I make stdin handle data as it arrives instead of waiting for CTRL+D or EOF?

Example

# accept_stdin.py
import sys
import datetime

for line in sys.stdin:
    print datetime.datetime.now().second, line

# print_data.py
import time

print "1 foo"
time.sleep(3)
print "2 bar"

# bash
python print_data.py | python accept_stdin.py
noob

3 Answers

Like all file objects in Python 2, the sys.stdin iterator reads input in chunks: even if a line of input is ready, the iterator will try to read up to the chunk size or EOF before yielding anything. You can work around this with the readline method, which doesn't have this behavior:

while True:
    line = sys.stdin.readline()
    if not line:
        # End of input
        break
    do_whatever_with(line)

You can combine this with the 2-argument form of iter to use a for loop:

for line in iter(sys.stdin.readline, ''):
    do_whatever_with(line)

I recommend leaving a comment in your code explaining why you're not using the regular iterator.
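
To make the pattern above easier to try out, here is a minimal sketch that wraps it in a function. The names `handle_lines` and `seen` are hypothetical and not from the question, and the code is written for Python 3 (the question's own scripts are Python 2). The in-memory stream only stands in for `sys.stdin` so the example can run on its own.

```python
import io


def handle_lines(stream, handler):
    """Call handler(line) for every line, fetching one readline() at a time."""
    # iter(callable, sentinel) keeps calling stream.readline() until it
    # returns '' (end of input), so the file iterator is never used.
    for line in iter(stream.readline, ''):
        handler(line.rstrip('\n'))


# Demo on an in-memory stream (an assumption for illustration only);
# in accept_stdin.py you would pass sys.stdin instead.
seen = []
handle_lines(io.StringIO(u"1 foo\n2 bar\n"), seen.append)
print(seen)
```

In the real script the call would look something like `handle_lines(sys.stdin, do_whatever_with)`, with `import sys` at the top.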

user2357112
  • This does not seem to be supported by the [documentation for `file.next()`](https://docs.python.org/2/library/stdtypes.html#file.next). – chepner Aug 08 '14 at 12:15
  • @chepner: "In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer." – user2357112 Aug 08 '14 at 12:18
  • "Hidden read-ahead buffer"; that means that a full chunk is read immediately, even if that is more than necessary to return a full line. Later calls to `next` will read from the buffer if possible before reading from disk again. – chepner Aug 08 '14 at 12:21
  • @chepner: Is there any part of the documentation you think explicitly contradicts my answer? My tests seem to support this interpretation, although I'm afraid I can only test on Windows. If you read the [Python file object source code](http://hg.python.org/cpython/file/cbcb10123451/Objects/fileobject.c#l2324), you'll find that the C-level call made to read from the file is an `fread` sized to the file object's buffer inside [`Py_UniversalNewlineFread`](http://hg.python.org/cpython/file/cbcb10123451/Objects/fileobject.c#l2806); I don't think that'll stop early and just return what's available. – user2357112 Aug 08 '14 at 12:48
  • I agree that this seems to work; the documentation just doesn't mention any substantial difference between using the `readline` method and calling `next` on the iterator. (The documentation is vague on *how* `next` returns the next line.) – chepner Aug 08 '14 at 12:58
  • @chepner: There's some more information in other sections, though perhaps not ones someone looking for info on file objects would be likely to check. For example, the documentation for the [`-u` flag](https://docs.python.org/2/using/cmdline.html#cmdoption-u) says to use `readline` to work around `next`'s buffering. – user2357112 Aug 08 '14 at 13:01
  • @user2357112 I'm sorry but this is somehow not working on my machine. – noob Aug 08 '14 at 14:01

The producer program, i.e. the one whose stdout you pipe to your Python script, is also part of the problem.

Because that program only prints and never flushes, the data it prints stays in the program's internal stdout buffer and is not passed on to the system.

Add a sys.stdout.flush() call right after each print statement in print_data.py.

You see the data when the program quits because stdout is flushed automatically on exit.

See this question for an explanation.
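
As a rough sketch of that advice, a flushing version of print_data.py could look like the following. The helper name `emit` and the `delay` parameter are hypothetical, and the code is written for Python 3 rather than the Python 2 of the question. In Python 3.3 and later, `print(line, flush=True)` should have the same effect.

```python
import sys
import time


def emit(line, out=None):
    """Write one line and push it out of the process's buffer right away."""
    out = sys.stdout if out is None else out
    out.write(line + "\n")
    # Without this flush, a stdout connected to a pipe typically holds the
    # text until its buffer fills up or the program exits.
    out.flush()


def main(delay=3):
    emit("1 foo")
    time.sleep(delay)
    emit("2 bar")
```

Calling `main()` from the script and piping it into the reader should then deliver "1 foo" about three seconds before "2 bar".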

Didier Trosset
  • Note that the Python script which "provides" the data was just an example. I actually have no control over the actual program which "provides" the data. – noob Aug 08 '14 at 13:52

As said by @user2357112 you need to use:

for line in iter(sys.stdin.readline, ''):

You also need to start Python with the -u flag, which makes stdin and stdout unbuffered so that data is passed along immediately.

python -u print_data.py | python -u accept_stdin.py

You can also specify the flag in the shebang.
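
One way to check what `-u` changes on the producer side is to launch a small child process and record when each line arrives. This is only a sketch under stated assumptions: `CHILD` and `read_with_timestamps` are hypothetical names, the child is a shortened stand-in for print_data.py (0.5 s sleep instead of 3 s), and the code targets Python 3.

```python
import subprocess
import sys
import time

# Stand-in for print_data.py, passed to the interpreter with -c.
CHILD = "import time\nprint('1 foo')\ntime.sleep(0.5)\nprint('2 bar')\n"


def read_with_timestamps(unbuffered=True):
    """Run the child and return (seconds_since_start, line) pairs."""
    cmd = [sys.executable] + (["-u"] if unbuffered else []) + ["-c", CHILD]
    start = time.time()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True)
    result = []
    # Same readline-based loop as in the answer, applied to the pipe.
    for line in iter(proc.stdout.readline, ''):
        result.append((time.time() - start, line.rstrip('\n')))
    proc.stdout.close()
    proc.wait()
    return result


if __name__ == "__main__":
    for elapsed, line in read_with_timestamps(unbuffered=True):
        print("%.1f %s" % (elapsed, line))
```

With `unbuffered=True` the two timestamps should be roughly half a second apart. With `unbuffered=False` both lines would be expected to show up together once the child exits, unless the environment already forces unbuffered output.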

enrico.bacis
  • Wait, I'm trying it but it doesn't seem to work, but.. I'm sure :) I'll catch the bug, sorry – enrico.bacis Aug 08 '14 at 10:42
  • Yes, @user2357112 is right, but you also need the `-u` flag – enrico.bacis Aug 08 '14 at 10:57
  • Well, the approach seems kind of nice and actually works, but I have no control over the actual program, so I can't make its stdout (print_data.py's in this example) unbuffered, which leads to failure. – noob Aug 08 '14 at 14:00
  • @mic: If that means the provider program isn't flushing its output, you're out of luck. With the restrictions you've stated, I don't think there's anything you can do to make it flush. – user2357112 Aug 08 '14 at 21:22