
I got stuck piping the output of one script into another script (both are Python).

This question is very similar, but (1) it does not have an answer and (2) my case is slightly different. So, I thought opening a new question would be better.

Here is the problem.
Both scripts are almost identical:

receiver.py

import sys
import time

for line in sys.stdin:
    sys.stdout.write(line)
    sys.stdout.flush()
    time.sleep(3)

replicator.py

import sys
import time

for line in sys.stdin:
    sys.stderr.write(line)
    sys.stderr.flush()
    time.sleep(3)

When I run these scripts in bash or cmd one at a time, everything is fine. Both examples below work, and I see the input text in the output:

Works: (One line of output appears every 3 seconds)

cat data.txt | python receiver.py
cat data.txt | python replicator.py

But once I pipe one script into the other, they stop working:

Doesn't work: (Nothing appears until the end of the file is reached)

cat data.txt | python receiver.py | python replicator.py

Then, when I pipe the first script into another tool, it works again!

Works:

cat data.txt | python receiver.py | cat -n
cat data.txt | python replicator.py | cat -n

And finally, when I remove the blocking sleep() call, it starts to work again:

Removing the timer:

time.sleep(0)

Now it works:

cat data.txt | python receiver.py | python replicator.py

Does anybody know what is wrong with my piping? I am not looking for alternative ways to do it. I just want to learn what is happening here.

UPDATE

Based on the comments, I refined the examples.
Now both scripts not only print out the content of data.txt, but also add a time-stamp to each line.

receiver.py

import sys
import time
import datetime

for line in sys.stdin:
    sys.stdout.write(str(datetime.datetime.now().strftime("%H:%M:%S"))+'\t')
    sys.stdout.write(line)
    sys.stdout.flush()
    time.sleep(1)

data.txt

Line-A
Line-B
Line-C
Line-D

The result

$> cat data.txt
Line-A
Line-B
Line-C
Line-D

$> cat data.txt | python receiver.py
09:05:44        Line-A
09:05:45        Line-B
09:05:46        Line-C
09:05:47        Line-D

$> cat data.txt | python receiver.py | python receiver.py
09:05:54        09:05:50        Line-A
09:05:55        09:05:51        Line-B
09:05:56        09:05:52        Line-C
09:05:57        09:05:53        Line-D

$> cat test.log | python receiver.py | sed -e "s/^/$(date +"%H:%M:%S") /"
09:17:55        09:17:55        Line-A
09:17:55        09:17:56        Line-B
09:17:55        09:17:57        Line-C
09:17:55        09:17:58        Line-D

$> cat test.log | python receiver.py | cat | python receiver.py
09:36:21        09:36:17        Line-A
09:36:22        09:36:18        Line-B
09:36:23        09:36:19        Line-C
09:36:24        09:36:20        Line-D

As you can see, when I pipe the output of the Python script into itself, the second script waits until the first one has finished. Only then does it start to process the data.

However, when I use another tool (sed in this example), the tool receives the data immediately. Why is this happening?

Dark
  • What do you mean by "Doesn't work"? What exactly happens in this case? – Chris_Rands Jul 10 '17 at 15:03
  • @Chris_Rands, it does not show any output until the first script finishes its job and then the output appears on the screen all at once. The expected behavior is to print 1 line of text each 3 seconds. -Question updated. – Dark Jul 10 '17 at 15:05
  • When you pipe the output, the next script waits for it. It has to do with Linux, not with Python. The question is, why do you have to pipe anyway? Why not create simple Python functions and call them? – Imanol Luengo Jul 10 '17 at 15:18
  • @ImanolLuengo, that is exactly my question. What is happening behind the scenes that lets a tool like `cat` work but prevents my script from functioning? As I mentioned in the question, I just want to learn how it works. Another point is that this is not only about Linux; I also tested it on **Windows**, and it shows the same behavior. – Dark Jul 10 '17 at 15:26

1 Answer


This is due to the internal read-ahead buffering that file objects use when they are iterated (for line in sys.stdin).

So, if we fetch the input line by line with readline():

import sys
import time
import datetime

while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write(str(datetime.datetime.now().strftime("%H:%M:%S"))+'\t')
    sys.stdout.write(line)
    sys.stdout.flush()
    time.sleep(1)

The code will work as expected:

$ cat data.txt | python receiver.py |  python receiver.py
09:43:46        09:43:46        Line-A
09:43:47        09:43:47        Line-B
09:43:48        09:43:48        Line-C
09:43:49        09:43:49        Line-D

Documentation

... Note that there is internal buffering in file.readlines() and File Objects (for line in sys.stdin) which is not influenced by this option. To work around this, you will want to use file.readline() inside a while 1: loop.
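
The same readline()-based workaround can also be written with the two-argument form of iter(), which keeps calling readline() until it returns the empty string that marks end of file. This is only a sketch: the function name stamp_lines and its source, sink and delay parameters are made up for illustration and are not part of the original scripts.

```python
import sys
import time
import datetime


def stamp_lines(source, sink, delay=1.0):
    """Copy lines from source to sink, prefixing each with the current time.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # iter(callable, sentinel) calls source.readline() repeatedly and stops
    # when it returns '', which is what readline() gives at end of file.
    for line in iter(source.readline, ''):
        sink.write(datetime.datetime.now().strftime("%H:%M:%S") + '\t')
        sink.write(line)
        sink.flush()       # push each line downstream right away
        time.sleep(delay)  # same pause as in receiver.py
```

In receiver.py this would be called as stamp_lines(sys.stdin, sys.stdout). Because each line still comes from readline(), it should avoid the iterator's read-ahead in the same way as the while loop above.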

NOTE: This file-object iteration behavior was fixed in Python 3.

Juan Diego Godoy Robles
  • But this won't terminate the script when EOF is reached. – Turn Jul 11 '17 at 07:54
  • Thank you @klashxx, it actually solved the problem. Just for my understanding: how does using a `while 1:` loop flush the `file.readlines()` internal buffer? – Dark Jul 11 '17 at 07:56
  • @Dark - it doesn't, it's just the way `readline()` works: it reads the stream until it encounters a `\n`, at which point the buffer is returned. On the other hand, `for line in handle` expects the handle to be an iterable, and Python waits a long while before it starts sending individual elements (i.e. it waits for the buffer to be filled). This behavior has been fixed, but it is still a safer option to just use `readline()` when reading from STDIN. (By the way, you don't need `-u` at all, nor would it help you here, as you're flushing your STDOUT/STDERR immediately anyway.) – zwer Jul 11 '17 at 08:01
  • @zwer, now I get it. Thank you. The point I didn't understand was the `iterable` assumption of `for line in handle`. I had thought that `for line in handle` was equivalent to `file.readlines()`. I will also avoid using -u in this particular case. – Dark Jul 11 '17 at 08:04
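
The point made in the comments, that readline() returns as soon as a complete line is available rather than waiting for end of file, can be checked without a shell. The sketch below assumes Python 3 and uses os.pipe() as a stand-in for the `|` between the two scripts; the variable names are illustrative only.

```python
import os

# Create an anonymous pipe, the same kind of channel a shell sets up for "a | b".
read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "r")

# The upstream side writes one complete line but keeps the pipe open,
# much like receiver.py sleeping between lines.
os.write(write_fd, b"Line-A\n")

# readline() hands the line back now; it does not wait for end of file.
first = reader.readline()

# Only after the writer closes its end does readline() report EOF as ''.
os.close(write_fd)
at_eof = reader.readline()
reader.close()

print(repr(first), repr(at_eof))
```

The first readline() call returns while the write end is still open, which is the behavior the while loop in the answer relies on.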