
I want something with the same effect as `cmd > >(tee -a {{ out.log }}) 2> >(tee -a {{ err.log }} >&2)` in Python's subprocess, without calling tee. Basically: write stdout to both stdout and the out.log file, and write stderr to both stderr and err.log. I know I could handle it with a loop (see the sketch below), but since I already have lots of Popen and subprocess.run calls in my code and I don't want to rewrite the entire thing, I wonder if any package provides an easier interface that would let me do something like:

subprocess.run(["ls", "-l"], stdout=some_magic_file_object(sys.stdout, 'out.log'), stderr=some_magic_file_object(sys.stderr, 'out.log') )
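
For reference, the loop-based version I'd like to avoid repeating everywhere looks roughly like this (a minimal sketch, POSIX-only like the shell version; the selectors-based multiplexing and the 4096 read size are just one way to do it):

import selectors
import subprocess
import sys

with open('out.log', 'ab') as out_log, open('err.log', 'ab') as err_log:
    proc = subprocess.Popen(["ls", "-l"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    sel = selectors.DefaultSelector()
    # Map each pipe to the two places its data should be copied to.
    sel.register(proc.stdout, selectors.EVENT_READ, (sys.stdout.buffer, out_log))
    sel.register(proc.stderr, selectors.EVENT_READ, (sys.stderr.buffer, err_log))
    while sel.get_map():
        for key, _ in sel.select():
            chunk = key.fileobj.read1(4096)
            if not chunk:  # EOF on this pipe
                sel.unregister(key.fileobj)
                continue
            for target in key.data:
                target.write(chunk)
    proc.wait()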
Wang
  • Possible duplicate: https://stackoverflow.com/questions/19425736/how-to-redirect-stdout-and-stderr-to-logger-in-python – Tzane Sep 20 '21 at 14:08
  • @Tzane that is a completely different question. We are talking about a subprocess's stdout/stderr; that question is about the Python logger. – Wang Sep 20 '21 at 17:25
  • Sure, but the point was to pipe the subprocess output to stdout/stderr and use the logger to log that. I see the solution here was more elegant than that though. – Tzane Sep 21 '21 at 11:44

1 Answer


No simple way as far as I can tell, but here is a way:

import os


class Tee:
    def __init__(self, *files, bufsize=1):
        # Accept either raw file descriptors or objects with a fileno() method.
        files = [x.fileno() if hasattr(x, 'fileno') else x for x in files]
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid:
            # Parent: keep the write end; this is what subprocess will write to.
            os.close(read_fd)
            self._fileno = write_fd
            self.child_pid = pid
            return
        # Child: copy everything from the pipe to each of the target files.
        os.close(write_fd)
        while buf := os.read(read_fd, bufsize):
            for f in files:
                os.write(f, buf)
        os._exit(0)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

    def fileno(self):
        return self._fileno

    def close(self):
        # Closing the write end sends EOF to the child; then reap it.
        os.close(self._fileno)
        os.waitpid(self.child_pid, 0)

This Tee object takes any number of files (either integer file descriptors, or objects with a fileno() method). It forks a child process that reads from the pipe's read end and copies everything it receives to each of those files; the parent keeps the pipe's write end and exposes it through fileno(), which is what subprocess.run writes to.

There's some lifecycle management needed: the Tee's file descriptor must be closed, and the child process must be waited on afterwards. You can either do that manually by calling the Tee object's close method, or use it as a context manager as shown below.

Usage:

import subprocess
import sys


logfile = open('out.log', 'w')
stdout_magic_file_object = Tee(sys.stdout, logfile)
stderr_magic_file_object = Tee(sys.stderr, logfile)

# Use the file objects with as many subprocess calls as you'd like here
subprocess.run(["ls", "-l"], stdout=stdout_magic_file_object, stderr=stderr_magic_file_object)

# Close the files after you're done with them.
stdout_magic_file_object.close()
stderr_magic_file_object.close()
logfile.close()

A cleaner way would be to use context managers, shown below. It would require more refactoring though, so you may prefer manually closing the files instead.

import subprocess
import sys


with open('out.log', 'w') as logfile:
    with Tee(sys.stdout, logfile) as stdout, Tee(sys.stderr, logfile) as stderr:
        subprocess.run(["ls", "-l"], stdout=stdout, stderr=stderr)

One issue with this approach is that the child process writes to stdout immediately, so Python's own output will often get interleaved with it. You can work around this by pointing Tee at a temp file and the log file, and then printing the content of the temp file (and deleting it) once the Tee context block is exited. Making a subclass of Tee that does this automatically would be straightforward, but using it would be a bit cumbersome, since you need to exit the context block (or otherwise have it run some code) before the subprocess output is printed.
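A rough sketch of what such a subclass could look like (the ReplayTee name and the details are illustrative, not tested):

import sys
import tempfile


class ReplayTee(Tee):
    def __init__(self, target, *files, bufsize=1):
        # Tee into an anonymous temp file instead of the live stream;
        # `target` is the stream to replay the output into once we're done.
        self._target = target
        self._tmp = tempfile.TemporaryFile()
        super().__init__(self._tmp, *files, bufsize=bufsize)

    def close(self):
        super().close()  # sends EOF to the child and reaps it
        # Replay the captured output in one go, so it can't interleave
        # with anything the parent printed in the meantime.
        self._tmp.seek(0)
        self._target.write(self._tmp.read().decode())  # assumes UTF-8 output
        self._tmp.close()

It would be used the same way as Tee (e.g. ReplayTee(sys.stdout, logfile)), except the subprocess output only appears once close is called or the context block exits.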

Will Da Silva
  • Thanks a lot! I actually would like to see the output ASAP, so writing it out immediately won't really be a problem. However, this approach seems to have a performance problem when the process outputs lots of data; it can even lock up in some extreme situations. It uses a Python loop to read 1 byte per iteration. I guess we have to implement something like subprocess.PIPE with a configurable buffer size (maybe just use io.open()) and read out all bytes from the buffer every time we have the chance. – Wang Sep 20 '21 at 18:03
  • @Wang I originally processed a single byte at a time to provide immediate feedback, rather than waiting until the stream ends to output data in the buffer. I added a `bufsize` keyword argument to `Tee` that can be set higher (e.g. `4096`). This will result in less immediate feedback, as the last 4095 bytes won't be output until the stream ends, but the throughput should be much higher. Not sure exactly what would cause it to lock up. If you have more details then maybe I could improve the solution. – Will Da Silva Sep 20 '21 at 21:11
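
For reference, a higher-throughput setup under that updated signature might look like this (the 4096 figure is just the example value from the comment above):

import subprocess
import sys

with open('out.log', 'w') as logfile:
    # Larger reads trade a little latency for much better throughput.
    with Tee(sys.stdout, logfile, bufsize=4096) as stdout:
        subprocess.run(["ls", "-lR", "/usr"], stdout=stdout)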