When a series of commands is piped together on Linux, the pipeline is handled efficiently, i.e. the earlier processes stop once the last process in the pipeline has terminated. For instance:
cat filename | head -n 1
zcat filename | head -n 1
hadoop fs -cat /some/path | head -n 1
In each of the above, the cat command on its own would take considerable time, but the combined command finishes quickly. How is this done internally? Are the first commands (the cat commands) sent SIGTERM or SIGKILL by the OS as soon as head terminates?
I want to do something similar in Python and am wondering what the best way to do it is. I am trying the following:
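from subprocess import Popen, PIPE

# path and num_lines are defined elsewhere
p1 = Popen(['hadoop', 'fs', '-cat', path], stdout=PIPE)
p2 = Popen(['head', '-n', str(num_lines)], stdin=p1.stdout, stdout=PIPE)
output, _ = p2.communicate()  # wait for head to finish and collect its output
p1.terminate()  # or p1.kill()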
Is this efficient?
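To make the attempt above reproducible without a Hadoop cluster, here is a minimal sketch that uses `yes` (an endless producer) as a stand-in for `hadoop fs -cat`. It assumes a POSIX system with `yes` and `head` on the PATH, and the function name `first_lines` is hypothetical. It also closes `p1.stdout` in the parent after starting `p2`, which the `subprocess` documentation suggests so that `p1` can receive SIGPIPE if `p2` exits first. Whether that is enough for a `hadoop` process, or whether the explicit `terminate()` is still needed, is part of what I am asking.

```python
from subprocess import Popen, PIPE


def first_lines(producer_cmd, num_lines):
    """Return the first num_lines lines of producer_cmd's output as bytes.

    first_lines is a hypothetical helper name used only for this sketch.
    """
    p1 = Popen(producer_cmd, stdout=PIPE)
    p2 = Popen(['head', '-n', str(num_lines)], stdin=p1.stdout, stdout=PIPE)
    # Close the parent's copy of the pipe's read end so that head is the
    # only reader; once head exits, the producer's next write fails with
    # SIGPIPE/EPIPE instead of blocking.
    p1.stdout.close()
    output, _ = p2.communicate()
    # Fallback in case the producer ignores SIGPIPE or is not writing.
    if p1.poll() is None:
        p1.terminate()
    p1.wait()  # reap the child so no zombie process is left behind
    return output


if __name__ == '__main__':
    # Assumption: `yes` stands in for the slow `hadoop fs -cat` producer.
    print(first_lines(['yes', 'hello'], 3))
```

With an endless producer such as `yes`, the call returns as soon as `head` has read its lines, which is the behaviour I see from the shell pipelines.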