3

I'm writing a script that has another script's output piped into it. The other script takes a while to complete and prints its progress to the console along with the data I want to parse.

Since I'm piping the result into my script, I want to be able to do two things. As the input arrives, I would like to echo it to the screen. After the command completes, I would like to have a list of the lines that were passed via stdin.

My first thought was to use a simple

import sys

lines = []
for line in sys.stdin:
    sys.stdout.write(line)  # line already ends with '\n'
    lines.append(line)
    sys.stdout.flush()

but to my surprise, the command waits until stdin hits EOF before it starts yielding lines.

My current workaround is this:

line = sys.stdin.readline()
lines = []
while line:
    sys.stdout.write(line.strip() + '\n')
    lines.append(line.strip())
    sys.stdout.flush()
    line = sys.stdin.readline()

But this does not always wait until the whole input is used.

Is there any other way to do this? It seems strange that the for solution behaves the way it does.

Bartlomiej Lewandowski

4 Answers

3

Python uses buffered input. If you check with python --help you see:

-u     : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x

So try the unbuffered option with:

command | python -u your_script.py
enrico.bacis
  • I would use this, but is there a way to pass this parameter when my script is invoked through its shebang line? – Bartlomiej Lewandowski Dec 21 '15 at 13:43
  • 1
    @BartlomiejLewandowski: Sure, I do this all the time. [Here](http://stackoverflow.com/q/3306518/1003123) you can find three different ways to accomplish that. I normally use the `#!/usr/bin/python -u` way, but you can choose the one that fits you. – enrico.bacis Dec 21 '15 at 14:04
  • Just like this: `#!/usr/bin/python -u`. Note that only the first parameter is recognized this way – gilhad Dec 21 '15 at 14:05
3

edited to answer your question regarding exiting on end of input

The workaround you describe, or something similar to the code below, appears to be necessary:

#!/usr/bin/env python

import sys

lines = []

while True:
    line = sys.stdin.readline()
    if not line:
        break
    line = line.rstrip()
    sys.stdout.write(line + '\n')
    lines.append(line)
    sys.stdout.flush()

This is explained in the python man page, under the -u option:

   -u     Force stdin, stdout and stderr to  be  totally  unbuffered.   On
          systems  where  it matters, also put stdin, stdout and stderr in
          binary mode.  Note that there is internal  buffering  in  xread-
          lines(),  readlines()  and  file-object  iterators ("for line in
          sys.stdin") which is not influenced by  this  option.   To  work
          around  this, you will want to use "sys.stdin.readline()" inside
          a "while 1:" loop.

I created a file dummy.py containing the code above, then ran this:

for i in 1 2 3 4 5; do sleep 5; echo $i; echo; done | ./dummy.py

This is the output:

harold_mac:~ harold$ for i in 1 2 3 4 5; do sleep 5; echo $i; echo; done | ./dummy.py
1

2

3

4

5

harold_mac:~ harold$
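
The same readline()-based loop can also be written with the two-argument form of iter(), which keeps calling readline() until it returns the empty string and so does not go through the file-object iterator that the man page warns about. This is only a minimal sketch: the function name echo_and_collect and the in-memory demo input are illustrative assumptions, not part of the code above.

```python
import io
import sys


def echo_and_collect(stream, out):
    """Echo each line of `stream` to `out` as it arrives; return all lines."""
    lines = []
    # iter(callable, sentinel) calls stream.readline() repeatedly and stops
    # when it returns '', which readline() only does at end of input.
    for line in iter(stream.readline, ''):
        line = line.rstrip('\n')
        out.write(line + '\n')
        out.flush()
        lines.append(line)
    return lines


# Demo with an in-memory stream standing in for the piped input.
collected = echo_and_collect(io.StringIO("1\n\n2\n"), sys.stdout)
```

In the real script the call would presumably be echo_and_collect(sys.stdin, sys.stdout).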
Harold Ship
1

Other people have already told you about the unbuffered output. I will just add a couple of thoughts:

  1. It is often better to print debug info to stderr, and stderr output is usually unbuffered.
  2. It is simpler to delegate intermediate output to dedicated tools. For example, the tee utility lets you split the stdout of a previous command. Assuming you are in bash, you can print the intermediate output to stdout right away and use process substitution instead of writing to a file (instead of awk, you would call your Python script):

    $ python -c 'for i in range(5): print i+1' | tee >( awk '{print "from awk", $0**2 }')
    1
    2
    3
    4
    5
    from awk 1
    from awk 4
    from awk 9
    from awk 16
    from awk 25
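
If the echoing stays inside the Python script, the first suggestion (progress on stderr) might look roughly like the sketch below. The generator name tee_lines and the squaring step, which mirrors the awk example, are assumptions made for illustration; the in-memory stream stands in for piped stdin.

```python
import io
import sys


def tee_lines(stream, log=None):
    """Yield lines from `stream`, echoing each one to `log` (stderr by default)."""
    if log is None:
        log = sys.stderr
    # readline() returns '' only at end of input, which ends the loop.
    for line in iter(stream.readline, ''):
        log.write(line)
        log.flush()
        yield line.rstrip('\n')


# Demo: the echo goes to stderr, so stdout carries only the parsed result.
squares = [int(n) ** 2 for n in tee_lines(io.StringIO("1\n2\n3\n"))]
print(squares)
```

Keeping the echo on stderr means the script's stdout can still be piped onward without the progress lines mixed in.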
    
newtover
0

You need both 1) stdin in your Python program and 2) stdout on the other side of the pipe to be line buffered. To get this, 1) use stdin = os.fdopen(sys.stdin.fileno(), 'r', 1) in your program, and 2) use stdbuf -oL to change the buffering mode of the other program's output:

stdbuf -oL otherprogram | python yourscript.py
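
On the Python side, the os.fdopen call could be combined with a readline() loop roughly as follows. This is a sketch under assumptions: the function name read_lines_line_buffered is hypothetical, and the demo feeds the reader from an os.pipe() so that it runs without a real upstream program.

```python
import os
import sys


def read_lines_line_buffered(fd, out):
    """Wrap file descriptor `fd` in a line-buffered text reader, echo each
    line to `out`, and return the collected lines."""
    lines = []
    # buffering=1 requests line buffering, as in the os.fdopen call above.
    # The `with` block closes `fd` once end of input is reached.
    with os.fdopen(fd, 'r', 1) as reader:
        for line in iter(reader.readline, ''):
            out.write(line)
            out.flush()
            lines.append(line.rstrip('\n'))
    return lines


# Demo: a pipe stands in for the other program's output.
read_end, write_end = os.pipe()
os.write(write_end, b"step 1\nstep 2\n")
os.close(write_end)
collected = read_lines_line_buffered(read_end, sys.stdout)
```

In the actual script the descriptor would be sys.stdin.fileno(), with stdbuf -oL applied to the producing command as shown above.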
user2683246