
I would like to run code on n processes and have the logs from each process written to a separate file.

I naively tried something like this:

from multiprocessing import Process
import logging


class Worker(Process):
    def __init__(self, logger_name, log_file):
        super().__init__()
        self.logger = logging.getLogger(logger_name)
        self.log_file = log_file
        self.logger.addHandler(logging.FileHandler(log_file))
        print("from init", self.logger, self.logger.handlers)

    def run(self) -> None:
        print("from run", self.logger, self.logger.handlers)


if __name__ == '__main__':
    p1 = Worker("l1", "log1")
    p1.start()

(tried in Python 3.9 and 3.11) but for some reason, the handler is gone. This is the output:

from init <Logger l1 (WARNING)> [<FileHandler log1 (NOTSET)>]
from run <Logger l1 (WARNING)> []

Why is the FileHandler gone? Should I call addHandler within the run method instead -- is that the correct way?

I was trying to use this answer but couldn't really make it work.

For the moment, I solved it by defining the handlers in run, but that feels like a dirty hack to me...

UPDATE: This happens with my MacBook Python installations. On a Linux server, I couldn't reproduce it. Very confusing.

In either case, the question is probably:

"Is this the correct way to log to files, with several copies of one process?"

Peter Franek
  • I can't reproduce, nor can I see why this would happen. Can you post the complete code with imports, and exactly how you run it? – 2e0byo Feb 18 '23 at 21:37
  • Good question. I reproduced your results on Win10 with python3.11. It seems to be related to the logging module's internals. Perhaps this will help, but perhaps not (he doesn't explain why his approach is necessary, IMO). https://superfastpython.com/multiprocessing-logging-in-python/ – Paul Cornelius Feb 18 '23 at 22:07
  • What happens if you get your logger from [`multiprocessing.getLogger()`](https://docs.python.org/3/library/multiprocessing.html#logging) ? Anyhow, some state is changing between instantiating the Process() (in the main process), and forking/bootstrapping (in the child process). I couldn't see why with a quick glance. If you really want to know, go through the forking process carefully and see what happens to the logger. Weird, though. – 2e0byo Feb 18 '23 at 22:58
  • @2e0byo Using multiprocessing.get_logger has the same behaviour. Interestingly, when I tried to do self.handler = ... in `__init__` I got some error ` cannot pickle '_thread.RLock' object` Could it be the clue? None of this happens on linux, apparently. – Peter Franek Feb 18 '23 at 23:10
  • presumably it's unpicklable on all platforms, but *something* is changing. It looks like you should add the handler in `run()`, since that's *known* to be inside the child process (and thus isolated from the main process). But I'm surprised that state set up in `__init__` isn't surviving. All kinds of weird things happen, though, when you try to pass state to processes, and to my mind there's nothing hacky about doing it explicitly inside the new context inside `run`. Maybe someone who doesn't need to deep-dive into multiprocessing to do so will come along and tell us what's happening. – 2e0byo Feb 18 '23 at 23:14
  • Ok, I will do it in run.. It's first time I see a difference in basic python (only stdlib) between platforms. Thanks for your comments. – Peter Franek Feb 18 '23 at 23:15
  • @PeterFranek you may be seeing a difference between implementations of `fork()` (which doesn't exist at all on windows IIRC). But I don't actually know how multiprocessing sets up new processes, so that's only speculation... In any case, it's the [law of leaky abstractions](https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/) – 2e0byo Feb 19 '23 at 00:26

1 Answer


I found the reason for the observed behavior. It has to do with pickling of objects when they are transferred between Processes.

In the standard library's implementation of Logger, a __reduce__ method is defined. This method is used in cases where an object cannot be reliably pickled: instead of trying to pickle the object itself, the pickle protocol uses the value returned by __reduce__. In the case of Logger, __reduce__ returns a function (getLogger) and a string (the name of the Logger being pickled) to be used as its argument. During unpickling, the protocol makes the function call logging.getLogger(name); the result of that call becomes the unpickled Logger instance.
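
As a quick check, here is a minimal sketch showing what pickle is told to do with a Logger (the exact return value is an implementation detail of CPython's logging module):

import logging

logger = logging.getLogger("l1")
logger.addHandler(logging.FileHandler("log1"))

# Logger.__reduce__ instructs pickle to rebuild the object by calling
# logging.getLogger("l1"); the handler list is not part of the pickled data.
print(logger.__reduce__())   # roughly: (<function getLogger ...>, ('l1',))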

The original Logger and the unpickled Logger will have the same name, but perhaps not much else in common. The unpickled Logger will have the default configuration, whereas the original Logger will have any customization you may have performed.

In Python, Process objects do not share an address space. When a new Process is launched with the "spawn" start method (the default on Windows, and on macOS since Python 3.8), its instance variables must somehow be "transferred" from one Process to another, and this is done by pickling/unpickling. In the example code, the instance variables assigned in the Worker.__init__ function do indeed appear in the new Process, as you can verify by printing them in Worker.run. But under the hood Python has actually pickled and unpickled all of the instance variables, to make it look like they have magically migrated to the new Process. In the vast majority of cases this works just fine, but not necessarily if one of those instance variables defines a __reduce__ method that discards part of its state. With the "fork" start method (the default on Linux), the child process inherits a copy of the parent's memory and no pickling is involved, which is why the problem doesn't show up on a Linux server.
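
A quick way to see which start method applies on a given platform (a small sketch; the printed value depends on the OS and Python version):

import multiprocessing as mp

if __name__ == "__main__":
    # typically "fork" on Linux, "spawn" on Windows and on macOS since Python 3.8
    print(mp.get_start_method())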

A logging.FileHandler cannot, I suspect, be pickled directly, since it holds an operating system resource (an open file). That is probably the reason (or at least one of the reasons) why Logger objects aren't pickled field by field and define __reduce__ instead.
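
Given that, one workable pattern (a sketch of the approach already discussed in the comments, not the only possible design) is to keep only picklable configuration in __init__ and create the FileHandler inside run, which executes in the child process:

from multiprocessing import Process
import logging


class Worker(Process):
    def __init__(self, logger_name, log_file):
        super().__init__()
        # store only picklable data here; the handler is created in run()
        self.logger_name = logger_name
        self.log_file = log_file

    def run(self) -> None:
        # this code runs in the child process, so the handler never
        # needs to survive pickling
        logger = logging.getLogger(self.logger_name)
        logger.setLevel(logging.INFO)
        logger.addHandler(logging.FileHandler(self.log_file))
        logger.info("worker %s writing to %s", self.logger_name, self.log_file)


if __name__ == '__main__':
    workers = [Worker(f"l{i}", f"log{i}") for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()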

Paul Cornelius
  • I would personally prefer to forbid defining such objects in Process init completely (raise), rather than facing unexpected behaviour. But I kind of understand now. Thank you – Peter Franek Feb 19 '23 at 08:00