Low-overhead method of reading from a popen handle

Question

I have inherited code that enters a busy loop reading the output of a subprocess looking for a keyword, but I would like it to work with lower overhead. The code is as follows:

def stdout_search(self, file, keyword):
    s = ''
    while True:
        c = file.read(1)
        if not c:
            return None
        if c != '\r' and c != '\n':
            s += c
            continue
        s = s.strip()
        if keyword in s:
            break
        s = ''
    i = s.find(keyword) + len(keyword)
    return s[i:]

def scan_output(self, file, ev):
    while not ev.wait(0):
        s = self.stdout_search(file, 'Keyword:')
        if not s:
            break
        # Do something useful with s
        offset = ...  # calculate offset
        wx.CallAfter(self.offset_label.SetLabel, offset)
        #time.sleep(0.03)

The output from the Popened process is something like:

Keyword: 1 of 100
Keyword: 2 of 100
...etc...

Uncommenting the time.sleep(0.03) at the end of scan_output takes the load on a single core down from 100% to an acceptable 25% or so, but unfortunately the offset label redraw stutters, and although I am reading a frame count from a 30 fps playback, the label often updates less than once a second. How can I implement this code with a more correct wait for input?

BTW, the full code may be found here.

score 1 · Accepted Answer · edited May 23 '17 at 11:53

Reading one byte at a time is inefficient. See Reading binary file in Python and looping over each byte.

If you don't need immediate feedback, use Popen.communicate() to get all of the output at once.
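A minimal sketch of the communicate() approach, assuming the child writes text lines and that waiting for it to exit is acceptable. The helper name find_keyword_values and the sample child command are made up for illustration:

```python
import subprocess
import sys

def find_keyword_values(cmd, keyword='Keyword:'):
    """Run cmd to completion; return the text after keyword on each matching line."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()  # blocks until the child exits
    values = []
    for line in out.splitlines():
        _, found, rest = line.partition(keyword)
        if found:
            values.append(rest.strip())
    return values

if __name__ == '__main__':
    # Hypothetical child process standing in for the real program.
    child = [sys.executable, '-c', "print('Keyword: 1 of 100')"]
    print(find_keyword_values(child))
```

This gives no incremental updates, so it does not suit a live frame counter; it only fits the case where the result is needed once the child has finished.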

To avoid freezing your GUI, you could put the I/O into a background thread. It is a simple, portable option for blocking I/O that supports incremental reading.
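A rough sketch of the background-thread option, assuming line-oriented text output. The names watch_output and on_match are hypothetical; the point is that readline() blocks inside the OS until data arrives, so the thread does not spin:

```python
import subprocess
import sys
import threading

def watch_output(cmd, keyword, on_match):
    """Start cmd; call on_match(text) from a worker thread for each matching line."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True, bufsize=1)

    def reader():
        # readline() sleeps in the kernel until a full line is available,
        # so no busy loop and no time.sleep() is needed.
        for line in iter(proc.stdout.readline, ''):
            _, found, rest = line.partition(keyword)
            if found:
                on_match(rest.strip())
        proc.stdout.close()
        proc.wait()

    thread = threading.Thread(target=reader)
    thread.daemon = True  # do not keep the application alive for the reader
    thread.start()
    return proc, thread
```

In the wx code from the question, on_match could be something like lambda s: wx.CallAfter(self.offset_label.SetLabel, s), so that the label is only touched from the GUI thread.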

To handle the output as soon as it is flushed by the child process, you could use asynchronous I/O such as Tkinter's createfilehandler(), Gtk's io_add_watch(), etc -- you provide a callback and the GUI calls it when the next chunk of data is ready.

If the child flushes the data too often, the callback may just read the chunk and put it in a buffer. You could then process the buffer every X seconds using Tkinter's widget.after() or Gtk's GObject.timeout_add(), or whenever it reaches a certain size, a certain number of lines, etc.
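One way to sketch the buffering idea, assuming the reader side puts each parsed value on a queue.Queue (Python 3 module name) and a GUI timer polls it. The helper drain_latest is a made-up name:

```python
import queue

def drain_latest(q):
    """Return the newest item waiting in q, discarding older ones; None if q is empty."""
    latest = None
    while True:
        try:
            latest = q.get_nowait()
        except queue.Empty:
            return latest

if __name__ == '__main__':
    q = queue.Queue()
    for frame in ('1 of 100', '2 of 100', '3 of 100'):
        q.put(frame)          # what the reader callback or thread would do
    print(drain_latest(q))    # what the periodic GUI callback would do
```

For a counter such as the frame number in the question, only the newest value matters, so dropping the older entries keeps the label current without redrawing for every line.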

To read until 'Keyword:', you could use code similar to asyncio's readuntil(). See also How to read records terminated by custom separator from file in Python?
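As an illustration of the separator idea, here is a synchronous sketch (not asyncio's implementation) that reads in chunks and splits on an arbitrary separator. The name read_records and the chunk size are assumptions; the stream is assumed to be a binary file object such as a Popen stdout pipe in Python 3:

```python
import io

def read_records(stream, separator, chunk_size=4096):
    """Yield the pieces of a binary stream delimited by separator (separator excluded)."""
    # read1() returns as soon as some data is available, where supported;
    # fall back to read() for file objects that lack it.
    read = getattr(stream, 'read1', stream.read)
    buf = b''
    while True:
        chunk = read(chunk_size)
        if not chunk:          # EOF
            break
        buf += chunk
        while True:
            head, found, tail = buf.partition(separator)
            if not found:
                break          # separator not complete yet; read more
            yield head
            buf = tail
    if buf:
        yield buf              # trailing data after the last separator

if __name__ == '__main__':
    data = io.BytesIO(b'Keyword: 1 of 100\nKeyword: 2 of 100\n')
    for record in read_records(data, b'Keyword:'):
        print(record)
```

Because the buffer accumulates across reads, a separator that straddles two chunks is still found once its last byte arrives.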
