305

My python script uses subprocess to call a linux utility that is very noisy. I want to store all of the output to a log file and show some of it to the user. I thought the following would work, but the output doesn't show up in my application until the utility has produced a significant amount of output.

#fake_utility.py, just generates lots of output over time
import time
i = 0
while True:
   print hex(i)*512
   i += 1
   time.sleep(0.5)

#filters output
import subprocess
proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)
for line in proc.stdout:
   #the real code does filtering here
   print "test:", line.rstrip()

The behavior I really want is for the filter script to print each line as it is received from the subprocess. Sorta like what tee does but with python code.

What am I missing? Is this even possible?


Update:

If a sys.stdout.flush() is added to fake_utility.py, the code has the desired behavior in python 3.1. I'm using python 2.6. You would think that using proc.stdout.xreadlines() would work the same as py3k, but it doesn't.


Update 2:

Here is the minimal working code.

#fake_utility.py, just generates lots of output over time
import sys, time
for i in range(10):
   print i
   sys.stdout.flush()
   time.sleep(0.5)

#display output line by line
import subprocess
proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)
#works in python 3.0+
#for line in proc.stdout:
for line in iter(proc.stdout.readline,''):
   print line.rstrip()
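For completeness, here is a Python 3 rendering of the same pattern. The child script is inlined via `-c` so the snippet is self-contained; in the real setup it would be `fake_utility.py` (note `text=True` requires Python 3.7+):

```python
import subprocess
import sys

# Inline stand-in for fake_utility.py so the snippet runs on its own.
child = "import sys\nfor i in range(3):\n    print(i)\n    sys.stdout.flush()\n"

proc = subprocess.Popen(
    [sys.executable, "-c", child],
    stdout=subprocess.PIPE,
    text=True,   # decode bytes to str; proc.stdout then iterates line by line
    bufsize=1,   # line-buffered on the parent's side of the pipe
)
lines = [line.rstrip() for line in proc.stdout]  # no read-ahead bug on Python 3
proc.wait()
print(lines)  # -> ['0', '1', '2']
```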
deft_code

  • you could use `print line,` instead of `print line.rstrip()` (note: comma at the end). – jfs Jan 23 '12 at 11:14
  • related: [Python: read streaming input from `subprocess.communicate()`](http://stackoverflow.com/q/2715847/4279) – jfs Sep 09 '14 at 23:03
  • Update 2 states that it works with python 3.0+ but uses the old print statement, so it does not work with python 3.0+. – Rooky Dec 19 '16 at 21:02
  • None of the answers listed here worked for me, but https://stackoverflow.com/questions/5411780/python-run-a-daemon-sub-process-read-stdout/5413588#5413588 did! – boxed Nov 11 '18 at 08:48
  • interesting the code that only works in python3.0+ uses 2.7 syntax for print. – thang Sep 16 '20 at 22:31
  • the update does not work. you're only printing line by line, not receiving them one by one. – Vaidøtas I. Feb 27 '21 at 22:02

11 Answers

233

I think the problem is with the statement for line in proc.stdout, which reads the entire input before iterating over it. The solution is to use readline() instead:

#filters output
import subprocess
proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)
while True:
  line = proc.stdout.readline()
  if not line:
    break
  #the real code does filtering here
  print "test:", line.rstrip()

Of course you still have to deal with the subprocess' buffering.

Note: according to the documentation the solution with an iterator should be equivalent to using readline(), except for the read-ahead buffer, but (or exactly because of this) the proposed change did produce different results for me (Python 2.5 on Windows XP).
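For readers on Python 3: the same readline() loop works, but the pipe yields bytes and the EOF sentinel is the empty bytes object b'' unless text mode is enabled. A minimal self-contained sketch (the inline child command here stands in for fake_utility.py):

```python
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, "-c", "print('a'); print('b')"],
    stdout=subprocess.PIPE,
)

collected = []
while True:
    line = proc.stdout.readline()  # b'' only at EOF; a blank line is b'\n' (truthy)
    if not line:
        break
    collected.append(line.rstrip().decode())
proc.wait()
print(collected)  # -> ['a', 'b']
```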

Rômulo Ceccon

  • for `file.readline()` vs. `for line in file` see http://bugs.python.org/issue3907 (in short: it works on Python3; use `io.open()` on Python 2.6+) – jfs Jan 23 '12 at 11:16
  • The more pythonic test for an EOF, per the "Programming Recommendations" in PEP 8 (http://www.python.org/dev/peps/pep-0008/), would be 'if not line:'. – Jason Mock Nov 13 '12 at 15:20
  • there is no `open()` used in this script; where do you put `io.open()`? is there a workaround for 2.5? – n611x007 Nov 14 '12 at 14:06
  • @naxa: for pipes: `for line in iter(proc.stdout.readline, ''):`. – jfs Nov 14 '12 at 18:22
  • @J.F.Sebastian: did you try this solution on Python3? I have code that previously ran on Python 2(.7) using the `iter(proc.stdout.readline, '')` approach, and now that I switched to Python 3.4 that code went pear-shaped, the loop does not return and RAM usage oscillates between ~0 and 3 GB. – Dr. Jan-Philip Gehrcke Feb 22 '15 at 19:37
  • @Jan-PhilipGehrcke: yes. 1. you could use `for line in proc.stdout` on Python 3 (there is no read-ahead bug) 2. `'' != b''` on Python 3 -- don't copy-paste the code blindly -- think what it does and how it works. – jfs Feb 23 '15 at 02:25
  • @J.F.Sebastian: sure, the `iter(f.readline, b'')` solution is rather obvious (and also works on Python 2, if anyone is interested). The point of my comment was not to blame your solution (sorry if it appeared like that, I read that now, too!), but to describe the extent of the symptoms, which are quite severe in this case (most of the Py2/3 issues result in exceptions, whereas here a well-behaved loop changed to be endless, and garbage collection struggles fighting the flood of newly created objects, yielding memory usage oscillations with long period and large amplitude). – Dr. Jan-Philip Gehrcke Feb 23 '15 at 13:04
  • @Jan-PhilipGehrcke: whether to use `''` or `b''` depends on the `universal_newlines` parameter that enables text mode. It is not obvious. There are parameters that are different on Python 2 and 3. You should be careful if you write single-source Python 2/3 compatible code that uses the `subprocess` module. – jfs Feb 23 '15 at 13:21
  • @J.F.Sebastian: I agree that there is a lot to consider when using `subprocess`, but usage of `b''` fits *most* application scenarios, because the well-chosen default in both Python 2 and 3 is to treat `subprocess.PIPE` as a byte stream, and to not implicitly perform de/encoding operations. I'd say `b''` is recommendable even on Python 2, because it is semantically better (explicit). Indeed, `b''` would be wrong with `universal_newlines=True` on Python 3 (which renders `stdout/err` attributes to be `TextIOWrapper` objects). On Python 2, `b''` works independent of `universal_newlines`. – Dr. Jan-Philip Gehrcke Feb 23 '15 at 14:26
  • How can you see if `proc` has terminated before trying to read another line from its stdout? – HelloGoodbye Jul 07 '16 at 11:16
  • Does this care how frequently or infrequently the called process sends output? Could it run indefinitely for months only printing a line every 30 seconds? I don't understand how `readline()` can determine when the program output is actually finished... – Will Jul 09 '16 at 01:07
  • I recommend adding `sys.stdout.flush()` before breaking, otherwise things mix up. – Dawid Gosławski Mar 15 '18 at 09:52
  • @JasonMock `if not line:` will also break on the first empty line (which is not necessarily at the end of the stream). `if line is not None:` should work properly. – Andre Holzner Dec 05 '21 at 16:03
96

Bit late to the party, but was surprised not to see what I think is the simplest solution here:

import io
import subprocess

proc = subprocess.Popen(["prog", "arg"], stdout=subprocess.PIPE)
for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):  # or another encoding
    print(line, end="")  # do something with line

(This requires Python 3.)
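A self-contained variant of the same pattern; the inline child command below is only a stand-in for "prog":

```python
import io
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello'); print('world')"],
    stdout=subprocess.PIPE,
)
# TextIOWrapper decodes the byte stream and yields lines as they arrive.
out = [line.rstrip("\n") for line in io.TextIOWrapper(proc.stdout, encoding="utf-8")]
proc.wait()
print(out)  # -> ['hello', 'world']
```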

jbg

  • I'd like to use this answer but I am getting: `AttributeError: 'file' object has no attribute 'readable'` py2.7 – Dan Garthwaite Feb 26 '16 at 14:55
  • Works with python 3 – matanster Jan 10 '18 at 21:41
  • Clearly this code is not valid for multiple reasons py2/py3 compatibility and real risk of getting ValueError: I/O operation on closed file – sorin Nov 13 '18 at 15:01
  • @sorin neither of those things make it "not valid". If you're writing a library that still needs to support Python 2, then don't use this code. But many people have the luxury of being able to use software released more recently than a decade ago. If you try to read on a closed file you'll get that exception regardless of whether you use `TextIOWrapper` or not. You can simply handle the exception. – jbg Dec 26 '19 at 17:10
  • you are maybe late to the party but you answer is up to date with current version of Python, ty – Dusan Gligoric Jan 16 '20 at 12:59
  • This logic works fine but i am getting extra '\n' at every line. Is there a way to suppress that? – Ammad Aug 11 '20 at 23:02
  • @Ammad `\n` is the newline character. it's conventional in Python for the newline to not be removed when splitting by lines - you'll see the same behaviour if you iterate over a file's lines or use a `readlines()` method. You can get the line without it with just `line[:-1]` (TextIOWrapper operates in "universal newlines" mode by default, so even if you're on Windows and the line ends with `\r\n`, you'll only have `\n` at the end, so `-1` works). You can also use `line.rstrip()` if you don't mind any other whitespace-like characters at the end of the line also being removed. – jbg Aug 13 '20 at 03:43
  • I got `AttributeError: 'file' object has no attribute 'readable'` on python 3.7, but it was because I was using `subprocess.run` instead of `subprocess.Popen`. – cowlinator Mar 17 '21 at 07:30
27

Indeed, if you sorted out the iterator then buffering could now be your problem. You can tell the Python interpreter in the sub-process not to buffer its output.

proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE)

becomes

proc = subprocess.Popen(['python','-u', 'fake_utility.py'],stdout=subprocess.PIPE)

I have needed this when calling python from within python.
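An equivalent way to disable the child interpreter's buffering, if you cannot change the command line, is the PYTHONUNBUFFERED environment variable. A minimal sketch (the inline child command is a stand-in, not from the answer):

```python
import os
import subprocess
import sys

# Same effect as `python -u`: set PYTHONUNBUFFERED=1 in the child's environment.
env = dict(os.environ, PYTHONUNBUFFERED="1")
proc = subprocess.Popen(
    [sys.executable, "-c", "print('unbuffered')"],
    stdout=subprocess.PIPE,
    env=env,
    text=True,
)
output = proc.stdout.read().strip()
proc.wait()
print(output)  # -> unbuffered
```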

Steve Carter
20

A function that allows iterating over both stdout and stderr concurrently, in realtime, line by line

In case you need to get the output stream for both stdout and stderr at the same time, you can use the following function.

The function uses Queues to merge both Popen pipes into a single iterator.

Here we create the function read_popen_pipes():

from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor


def enqueue_output(file, queue):
    for line in iter(file.readline, ''):
        queue.put(line)
    file.close()


def read_popen_pipes(p):

    with ThreadPoolExecutor(2) as pool:
        q_stdout, q_stderr = Queue(), Queue()

        pool.submit(enqueue_output, p.stdout, q_stdout)
        pool.submit(enqueue_output, p.stderr, q_stderr)

        while True:

            if p.poll() is not None and q_stdout.empty() and q_stderr.empty():
                break

            out_line = err_line = ''

            try:
                out_line = q_stdout.get_nowait()
            except Empty:
                pass
            try:
                err_line = q_stderr.get_nowait()
            except Empty:
                pass

            yield (out_line, err_line)

read_popen_pipes() in use:

import subprocess as sp


with sp.Popen(my_cmd, stdout=sp.PIPE, stderr=sp.PIPE, text=True) as p:

    for out_line, err_line in read_popen_pipes(p):

        # Do stuff with each line, e.g.:
        print(out_line, end='')
        print(err_line, end='')

    rc = p.poll()  # exit status; `return rc` if this block lives inside a function
Rotareti
19

You want to pass these extra parameters to subprocess.Popen:

bufsize=1, universal_newlines=True

Then you can iterate as in your example. (Tested with Python 3.5)
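Sketched out with a stand-in inline child command (not part of the original answer):

```python
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, "-c", "print('x'); print('y')"],
    stdout=subprocess.PIPE,
    bufsize=1,                 # line-buffered; only meaningful in text mode
    universal_newlines=True,   # text mode; spelled text=True on Python 3.7+
)
got = [line.rstrip() for line in proc.stdout]
proc.wait()
print(got)  # -> ['x', 'y']
```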

user1747134
7

You can also read all lines at once, without an explicit loop. Works in Python 3.6. (Note: readlines() blocks until the child closes its stdout, so the output is not streamed.)

import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE)
list_of_byte_strings = process.stdout.readlines()
aiven
  • Or to convert into strings: `list_of_strings = [x.decode('utf-8').rstrip('\n') for x in iter(process.stdout.readlines())]` – ndtreviv Nov 28 '19 at 11:10
  • @ndtreviv, you can pass text=True to Popen or use its "encoding" kwarg if you want the output as strings, no need to convert it yourself – Bobby Impollonia Jan 28 '21 at 17:51
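As that comment suggests, text mode avoids manual decoding. A minimal sketch (text=True requires Python 3.7+; the inline child command is a stand-in for self-containment):

```python
import subprocess
import sys

process = subprocess.Popen(
    [sys.executable, "-c", "print(1); print(2)"],
    stdout=subprocess.PIPE,
    text=True,  # readlines() now returns str, not bytes
)
list_of_strings = [s.rstrip("\n") for s in process.stdout.readlines()]
process.wait()
print(list_of_strings)  # -> ['1', '2']
```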
4

Python 3.5 added subprocess.run(), which returns a CompletedProcess object. With this you are fine using proc.stdout.splitlines():

proc = subprocess.run(command, shell=True, capture_output=True, text=True, check=True)
for line in proc.stdout.splitlines():
   print("stdout:", line)

(Note: capture_output and text were added in Python 3.7; on 3.5/3.6 pass stdout=subprocess.PIPE, stderr=subprocess.PIPE and universal_newlines=True instead.)

See also How to Execute Shell Commands in Python Using the Subprocess Run Method

StefanQ

  • This solution is short and effective. One problem, compared to the original question: it does not print each line "as it is received," which I think means printing the messages in realtime just as if running the process directly in the command line. Instead it only prints the output _after_ the process finishes running. – sfuqua Jun 02 '21 at 16:22
  • Thanks @sfuqua for mentioning that. I use pipelines extensively and rely on streaming data and would have wrongly chosen this for its brevity. – Sridhar Sarnobat Mar 13 '22 at 19:49
2

I tried this with Python 3 and it worked (source).

When you use Popen to spawn the new process, you tell the operating system to PIPE the stdout of the child process so the parent process can read it; here stderr is merged into the same pipe via stderr=subprocess.STDOUT.

In output_reader we read each line of the child's stdout by wrapping it in an iterator that yields output from the child process line by line, whenever a new line is ready.

import subprocess
import threading
import time


def output_reader(proc):
    for line in iter(proc.stdout.readline, b''):
        print('got line: {0}'.format(line.decode('utf-8')), end='')


def main():
    proc = subprocess.Popen(['python', 'fake_utility.py'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)

    t = threading.Thread(target=output_reader, args=(proc,))
    t.start()

    try:
        # the parent does its own work here while the reader thread
        # prints the child's output; runs until interrupted
        time.sleep(0.2)
        i = 0
        while True:
            print(hex(i) * 512)
            i += 1
            time.sleep(0.5)
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=0.2)
            print('== subprocess exited with rc =', proc.returncode)
        except subprocess.TimeoutExpired:
            print('subprocess did not terminate in time')
    t.join()
shakram02
1

The following modification of Rômulo's answer works for me on Python 2 and 3 (2.7.12 and 3.6.1):

import os
import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE)
while True:
  line = process.stdout.readline()
  if not line:  # b'' (Python 3) or '' (Python 2) at EOF
    break
  os.write(1, line)
mdh
0

I was having a problem with the arg list of Popen when updating servers; the following code resolves this a bit.

import getpass
from subprocess import Popen, PIPE

username = 'user1'
ip = '127.0.0.1'

print ('What is the password?')
password = getpass.getpass()
cmd1 = f"""sshpass -p {password} ssh {username}@{ip}"""
cmd2 = f"""echo {password} | sudo -S apt update"""
cmd3 = " && "
cmd4 = f"""echo {password} | sudo -S apt upgrade -y"""
cmd5 = " && "
cmd6 = "exit"
commands = [cmd1, cmd2, cmd3, cmd4, cmd5, cmd6]

command = " ".join(commands)

cmd = command.split()

with Popen(cmd, stdout=PIPE, bufsize=1, universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')

And to run the update on a local computer, the following code example does this.

import getpass
from subprocess import Popen, PIPE

print ('What is the password?')
password = getpass.getpass()

cmd1_local = f"""apt update"""
cmd2_local = f"""apt upgrade -y"""
commands = [cmd1_local, cmd2_local]

with Popen(['echo', password], stdout=PIPE) as auth:
    for cmd in commands:
        cmd = cmd.split()
        with Popen(['sudo','-S'] + cmd, stdin=auth.stdout, stdout=PIPE, bufsize=1, universal_newlines=True) as p:
            for line in p.stdout:
                print(line, end='')
Stan S.
0

An improved version of https://stackoverflow.com/a/57093927/2580077, suitable for Python 3.10.

A function to iterate over both stdout and stderr of the process in parallel.

Improvements:

  • Unified queue to maintain the order of entries in stdout and stderr.
  • Yield all available lines in stdout and stderr - this is useful when the calling process is slower.
  • Use blocking in the loop to prevent the process from utilizing 100% of the CPU.

import time
from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor

def enqueue_output(file, queue, level):
    for line in file:
        queue.put((level, line))
    file.close()


def read_popen_pipes(p, blocking_delay=0.5):

    with ThreadPoolExecutor(2) as pool:
        q = Queue()

        pool.submit(enqueue_output, p.stdout, q, 'stdout')
        pool.submit(enqueue_output, p.stderr, q, 'stderr')

        while True:
            if p.poll() is not None and q.empty():
                break

            lines = []
            while not q.empty():
                lines.append(q.get_nowait())

            if lines:
                yield lines

            # otherwise, loop will run as fast as possible and utilizes 100% of the CPU
            time.sleep(blocking_delay)

Usage:

with subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=1, universal_newlines=True) as p:
    for lines in read_popen_pipes(p):
        # lines - all the log entries since the last loop run.
        print('ext cmd', lines)
        # process lines
duggi