I have a CLI tool written in Python that operates on a bunch of files and uses separate processes for small file operations. The parts that run in separate processes are only subsections of a longer file flow, and their logging output is (obviously) interleaved, because the processes log concurrently.

My idea would be to buffer all logging output in memory (it is very limited output) until the processes have finished, and then sort all logging by process ID before actually outputting it to STDERR. I understand most parts of the logging tutorial, but this is a bit over my head. I attempted using the MemoryHandler, but couldn't make much sense of how to implement custom flush conditions or why it needs a target Handler itself.
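A minimal single-process sketch of how MemoryHandler seems to fit together, as far as I can tell from the docs: the MemoryHandler only holds records in a list, and the target is the handler that actually writes them once flush() is called. The helper name `make_buffered_logger`, the logger name and the capacity of 10,000 are made up for illustration; this does not yet deal with multiple processes.

```python
import logging
import sys
from logging.handlers import MemoryHandler


def make_buffered_logger(stream, name="buffered-demo"):
    """Return (logger, memory_handler); records stay in memory until flush()."""
    # The target is the handler that actually writes the records out.
    target = logging.StreamHandler(stream)
    target.setFormatter(
        logging.Formatter("%(levelname)s %(process)s: %(message)s"))
    # A flushLevel above CRITICAL means no record triggers a flush by severity.
    # The capacity is an assumed upper bound; reaching it would still flush.
    memory = MemoryHandler(capacity=10_000,
                           flushLevel=logging.CRITICAL + 1,
                           target=target)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the root logger from printing immediately
    logger.addHandler(memory)
    return logger, memory


if __name__ == "__main__":
    logger, memory = make_buffered_logger(sys.stderr)
    for i in range(3):
        logger.debug(str(i))  # nothing is printed yet
    memory.flush()            # buffered records are handed to the target now
```

A custom flush condition would presumably mean subclassing MemoryHandler and overriding its shouldFlush(record) method, but I have not gotten that far.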

This code illustrates my basic problem:

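import logging
import time
from multiprocessing import Process
from random import random

# Reset the root logger so basicConfig() installs a fresh handler.
logger = logging.getLogger()
logger.handlers = []
logging.basicConfig(format='%(levelname)s %(process)s: %(message)s',
                    level=logging.DEBUG)

def work():
    for i in range(5):
        logging.debug(str(i))
        time.sleep(random())  # simulate work of varying duration

if __name__ == "__main__":
    processes = []
    for i in range(3):
        p = Process(target=work)
        processes.append(p)
        p.start()

    # Wait for all worker processes to finish.
    for p in processes:
        p.join()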

The interleaved output is something like:

DEBUG 64571: 0
DEBUG 64572: 0
DEBUG 64573: 0
DEBUG 64572: 1
DEBUG 64572: 2
DEBUG 64572: 3
DEBUG 64572: 4
DEBUG 64573: 1
DEBUG 64571: 1
DEBUG 64573: 2
DEBUG 64573: 3
DEBUG 64573: 4
DEBUG 64571: 2
DEBUG 64571: 3
DEBUG 64571: 4

Instead of this output, I would like the console output to be held back while the processes run, and then have the logs printed sorted by process ID, ideally printing each process's log as soon as the next process in line has finished.
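To make the desired behaviour concrete, here is a rough sketch under some assumptions. It swaps MemoryHandler for a multiprocessing.Queue plus logging.handlers.QueueHandler, because each child's records have to reach the parent somehow. The helper names `collect` and `emit_sorted` and the `None` sentinel are made up for illustration. It only covers the simpler case of printing everything after all processes are done, and it would hang if a child were killed before sending its sentinel.

```python
import logging
import logging.handlers
import multiprocessing as mp
import sys
import time
from random import random

FORMAT = "%(levelname)s %(process)s: %(message)s"


def work(queue):
    # Child process: route records to the parent instead of printing them.
    logger = logging.getLogger()
    logger.handlers = [logging.handlers.QueueHandler(queue)]
    logger.setLevel(logging.DEBUG)
    try:
        for i in range(5):
            logger.debug(str(i))
            time.sleep(random() / 10)
    finally:
        queue.put(None)  # sentinel: this process has no more records


def collect(queue, n_procs):
    # Parent: block until every child has sent its sentinel.
    records, finished = [], 0
    while finished < n_procs:
        item = queue.get()
        if item is None:
            finished += 1
        else:
            records.append(item)
    return records


def emit_sorted(records, stream=sys.stderr):
    # sorted() is stable, so each process keeps its own record order.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(FORMAT))
    for record in sorted(records, key=lambda r: r.process):
        handler.emit(record)


if __name__ == "__main__":
    queue = mp.Queue()
    procs = [mp.Process(target=work, args=(queue,)) for _ in range(3)]
    for p in procs:
        p.start()
    records = collect(queue, len(procs))  # drain the queue before join()
    for p in procs:
        p.join()
    emit_sorted(records)
```

The records carry the PID of the process that created them, so grouping happens entirely in the parent. Printing each process's block as soon as it finishes would need extra bookkeeping per PID on top of this.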

I'm mostly looking for what to conceptually do and what modules and methods could be useful for this. With my code my main questions are:

a) How do I use the MemoryHandler to store the output, and how do I format it once a process is finished? Or is that not a viable idea altogether?

b) How can I monitor processes finishing, either one by one or all at once (I guess the latter is just code after the p.join() loop)?

c) Outside my isolated example, how do I turn this "process-sorted" logging on before, and off after, the section of my code that uses multiprocessing?

Thanks for any pointers ;)

kontur
  • Unrelated note: you're using _multi-processing_, not _multi-threading_ :) – rdas Jun 17 '19 at 14:46
  • @rdas Thanks, I was confusing the terminology, had a look at [this](https://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python) — updated the question to better express this ;) – kontur Jun 19 '19 at 07:04

0 Answers