
I have set up a subprocess command within a context manager that pipes the stdout and stderr to separate files via my own logger. This is a variation on the answer given here: https://stackoverflow.com/a/4838875/4844311

My code is as follows:

import logging
import subprocess

with StreamLogger(logging.DEBUG, my_out_logger) as out:
    with StreamLogger(logging.ERROR, my_err_logger) as err:
        p = subprocess.Popen(cmd, shell=False, stdout=out, stderr=err)
        p.communicate()
        p.wait()

where my_out_logger and my_err_logger are `logging.Logger` objects with handlers (e.g. a `FileHandler`) that log to files etc.

The StreamLogger code is similar to the code given at the link above:

import io
import os
import threading
import select
import time

class StreamLogger(io.IOBase):
    def __init__(self, level, logger):
        self.logger = logger
        self.level = level
        self.pipe = os.pipe()
        self.thread = threading.Thread(target=self._flusher)
        self.thread.start()

    def _flusher(self):
        self._run = True
        buf = b''
        while self._run:
            for fh in select.select([self.pipe[0]], [], [], 0)[0]:
                buf += os.read(fh, 1024)
                while b'\n' in buf:
                    data, buf = buf.split(b'\n', 1)
                    self.write(data.decode())
            time.sleep(0.01)
        self._run = None

    def write(self, data):
        return self.logger.log(self.level, data)

    def fileno(self):
        return self.pipe[1]

    def close(self):
        if self._run:
            self._run = False
            while self._run is not None:
                time.sleep(0.01)
            os.close(self.pipe[0])
            os.close(self.pipe[1])

The only significant difference between my code and the code at the link above is that mine sends the logging message to a logger, which redirects it according to its handlers, rather than logging directly as the linked code does.

This code all works fine most of the time.

But I have noticed that every so often there is a truncated output logging file. It looks as if the output file being written by a `FileHandler` in my_out_logger is being closed before all of the stdout content has been written.

I am not sure why this is happening or where to fix the code. For now I have just added a `time.sleep(0.3)` between `p.communicate()` and `p.wait()`, which reduces the frequency of truncated files, but it seems like an ugly workaround.

I would rather understand what is going wrong and fix it properly. I welcome any suggestions or insight.

  • Remove `p.communicate()`, it is not required there – Sajeer Noohukannu Sep 27 '19 at 17:42
  • Okay I think I understand why. I tried without that line and I got a whole load more truncated files, but I think that's because `communicate` was performing a delaying role, like the `time.sleep()` hack I've been using. Removing the line doesn't solve the problem. – berkelem Sep 28 '19 at 11:10
  • No idea. Have you tried flushing before you close? – Lennart Regebro Sep 28 '19 at 12:30
  • I tried adding a `sys.stdout.flush()` but it didn't help. Could you explain a little about how the StreamLogger code works? Where is `close()` called? And why is the `while self._run:` loop not an infinite loop? – berkelem Sep 30 '19 at 13:04

1 Answer


I think I've finally figured it out. The problem is that the StreamLogger code has no way to check that stdout has been fully written. Once the main thread running the subprocess receives a returncode, it exits the context manager, calling the __exit__() method of the StreamLogger, which is inherited from IOBase (source code here). That in turn calls close(), which sets the self._run attribute to False. This makes the thread polling the pipe stop looping, regardless of what is still coming through the pipe.
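The `__exit__`-calls-`close()` chain is easy to confirm in isolation. A minimal sketch (`Demo` is a made-up stand-in for StreamLogger, not part of the code above):

```python
import io

class Demo(io.IOBase):
    # Demo is a hypothetical subclass, used only to show when close() fires
    def close(self):
        if not self.closed:          # close() may run again at GC; print once
            print("close() called")
        super().close()

with Demo():
    print("inside the with block")
# Leaving the block triggers IOBase.__exit__, which calls close(),
# so "close() called" is printed as soon as the block is exited.
```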

This works fine for most commands with small to moderate output to stdout, where there is no lag between the returncode arriving and the output being written. But in my case I'm running a program through subprocess that writes a substantial amount of text to stdout, so there is a race to clear the pipe before the main thread tells the child thread to stop polling it.

The two critical variables here are the size of the buffer being read from the pipe and the frequency at which the pipe is polled. I fixed my problem by increasing the buffer size in os.read() to 4096, and also removing the time.sleep() in the while loop in the _flusher() method. This maximizes the amount of data that can be read from the pipe and in my case ensures the output is fully logged before the logging loop stops.
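For reference, here is a self-contained sketch of the class with those two changes applied. Two extra safeguards here are my own additions, not part of the fix described above: `_run` is set before the thread starts (so `close()` can never observe it unset), and the pipe is drained one final time after the stop signal, so bytes already buffered in the pipe are not dropped:

```python
import io
import os
import select
import threading
import time

class PatchedStreamLogger(io.IOBase):
    """StreamLogger with a 4096-byte read size and no sleep in the poll
    loop. The name, the pre-set _run flag, and the final drain step are
    illustrative additions, not part of the original answer."""

    def __init__(self, level, logger):
        self.logger = logger
        self.level = level
        self.pipe = os.pipe()
        self._run = True                 # set before the thread starts
        self.thread = threading.Thread(target=self._flusher)
        self.thread.start()

    def _emit_lines(self, buf):
        # log every complete line in the buffer, return the remainder
        while b'\n' in buf:
            data, buf = buf.split(b'\n', 1)
            self.write(data.decode())
        return buf

    def _flusher(self):
        buf = b''
        while self._run:
            for fh in select.select([self.pipe[0]], [], [], 0)[0]:
                buf += os.read(fh, 4096)          # was 1024
                buf = self._emit_lines(buf)
            # no time.sleep() here any more
        # final drain: empty whatever was still in the pipe when
        # close() set _run to False
        while select.select([self.pipe[0]], [], [], 0)[0]:
            chunk = os.read(self.pipe[0], 4096)
            if not chunk:
                break
            buf = self._emit_lines(buf + chunk)
        if buf:                                   # trailing partial line
            self.write(buf.decode())
        self._run = None

    def write(self, data):
        return self.logger.log(self.level, data)

    def fileno(self):
        return self.pipe[1]

    def close(self):
        if self._run:
            self._run = False
            while self._run is not None:          # wait for the drain
                time.sleep(0.01)
            os.close(self.pipe[0])
            os.close(self.pipe[1])
```

Note that with a zero select() timeout and no sleep, the flusher thread busy-polls while the subprocess runs; if CPU use matters, a small select() timeout (e.g. 0.01) would be a gentler alternative that still reads as soon as data arrives.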
