
Suppose I configure logging handlers in the main process. The main process then spawns some children, and because of os.fork() (on Linux) all loggers and handlers are inherited from the main process. In the example below, 'Hello World' would be printed to the console 100 times:

import multiprocessing as mp
import logging


def do_log(no):
    # root logger logs Hello World to stderr (StreamHandler)
    # BUT I DON'T WANT THAT!
    logging.getLogger().info('Hello world {}'.format(no))


def main():
    log_format = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'

    # This creates a StreamHandler on the root logger (stderr by default)
    logging.basicConfig(format=log_format, level=logging.INFO)

    n_cores = 4
    pool = mp.Pool(n_cores)
    # Log to stderr 100 times concurrently
    pool.map(do_log, range(100))
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()

This will print something like:

ForkPoolWorker-1 root INFO     Hello world 0
ForkPoolWorker-3 root INFO     Hello world 14
ForkPoolWorker-3 root INFO     Hello world 15
ForkPoolWorker-3 root INFO     Hello world 16
...

However, I don't want the child processes to inherit all the logging configuration from the parent. In the example above, do_log should not print anything to stderr, because the children should have no StreamHandler.

How do I prevent inheriting the loggers and handlers without removing or deleting them in the original parent process?


EDIT: Would it be a good idea to simply remove all handlers at the initialization of the pool?

def init_logging():
    # The root logger is not part of loggerDict, so clear it explicitly
    logging.getLogger().handlers = []
    for logger in logging.Logger.manager.loggerDict.values():
        # loggerDict may contain PlaceHolder objects without handlers
        if hasattr(logger, 'handlers'):
            logger.handlers = []

and

pool = mp.Pool(n_cores, initializer=init_logging, initargs=())

Moreover, can I also safely close() all (file) handlers during the initialization function?
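
For concreteness, here is a variant of init_logging that also closes the handlers. This is only a sketch; whether close() is actually safe in the forked children is exactly what I'm unsure about:

import logging


def init_logging():
    # Collect the root logger (not part of loggerDict) plus all named
    # loggers, skipping PlaceHolder objects that have no handlers
    loggers = [logging.getLogger()]
    loggers += [l for l in logging.Logger.manager.loggerDict.values()
                if hasattr(l, 'handlers')]
    for logger in loggers:
        for handler in logger.handlers:
            handler.close()  # safe in the child, or harmful to the parent?
        logger.handlers = []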

SmCaterpillar
  • `multiprocessing.Pool` has an initializer function called in the child but I don't know how it could turn off all the logging handlers without a dirty hack. – tdelaney Mar 12 '15 at 17:15
  • Could I simply say `logging.shutdown(); logging.Logger.manager.loggerDict={}` in the initializer? Or would this interfere with new handlers I create afterwards? Or be too hacky? – SmCaterpillar Mar 12 '15 at 17:26
  • It's defined to be used at application exit, so I wouldn't expect it to be a good solution. It also flushes any buffers, which would be a bad thing (imagine a file logger). In fact, I'm surprised that I haven't bumped into this problem myself... I'm looking forward to responses. It may be a good new feature to add to logging. – tdelaney Mar 12 '15 at 17:36

3 Answers


You don't need to prevent it, you just need to reconfigure the logging hierarchy.

I think you're on the right track with the pool initializer. But instead of hacking around the inherited state, let the logging package do what it's designed to do: reconfigure the logging hierarchy in the worker processes.

Here's an example (reusing do_log from the question):

import logging
import logging.config
import multiprocessing as mp


def main():

    def configure_logging():
        logging_config = {
            'formatters': {
                'f': {
                    'format': '%(processName)-10s %(name)s'
                              ' %(levelname)-8s %(message)s',
                },
            },
            'handlers': {
                'h': {
                    'level': 'INFO',
                    'class': 'logging.StreamHandler',
                    'formatter': 'f',
                },
            },
            'loggers': {
                '': {
                    'handlers': ['h'],
                    'level': 'INFO',
                    'propagate': True,
                },
            },
            'version': 1,
        }

        pname = mp.current_process().name
        if pname != 'MainProcess':
            logging_config['handlers'] = {
                'h': {
                    'level': 'INFO',
                    'formatter': 'f',
                    'class': 'logging.FileHandler',
                    'filename': pname + '.log',
                },
            }

        logging.config.dictConfig(logging_config)

    configure_logging() # MainProcess
    def pool_initializer():
        configure_logging()

    n_cores = 4
    pool = mp.Pool(n_cores, initializer=pool_initializer)
    pool.map(do_log, range(100))
    pool.close()
    pool.join()

Now the worker processes will each log to their own individual log files and will no longer use the main process's stderr StreamHandler. Note that this relies on the fork start method: pool_initializer is defined inside main(), so the forked children simply inherit it (under the spawn start method, a nested function could not be pickled and sent to the workers).

snapshoe

The most straightforward answer is that you should probably avoid modifying globals with multiprocessing. Note that the root logger, which you get using logging.getLogger(), is global.

The easiest way around this is to create a new logging.Logger instance for each process. You can name them after the processes, or just randomly:

import uuid
log = logging.getLogger(str(uuid.uuid4()))

You may also want to check "How should I log while using multiprocessing in Python?"
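
As a minimal sketch of that idea (the helper name and the per-process file naming are illustrative, not part of the linked answer), each worker could get its own logger wired to its own file:

import logging
import multiprocessing as mp
import uuid


def get_worker_logger():
    # A uniquely named logger per worker process
    log = logging.getLogger(str(uuid.uuid4()))
    log.setLevel(logging.INFO)
    log.propagate = False  # don't bubble up to handlers inherited via fork

    # Each process writes to its own file, e.g. 'ForkPoolWorker-1.log'
    handler = logging.FileHandler(mp.current_process().name + '.log')
    handler.setFormatter(
        logging.Formatter('%(processName)-10s %(levelname)-8s %(message)s'))
    log.addHandler(handler)
    return log

Calling get_worker_logger() once per worker (for example from a pool initializer) keeps the inherited root handlers out of the picture entirely.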

loopbackbee
  • Well, I want every process to log to a new file. This is the reason why I don't want to inherit logger configurations. Also there's a bunch of user defined loggers that already exist. So the loggers are not only up to me ;-). Moreover, I want still to be able to have a logging hierarchy so something like `log= logging.getLogger(str(uuid.uuid4()))` definitely won't work. – SmCaterpillar Mar 12 '15 at 17:50
  • @SmCaterpillar you can have each logger output to a different file by simply setting the appropriate handlers. I advise you not to have a common logging hierarchy on different processes - you'd need to implement an inter-process locking mechanism yourself (see the answer I linked to for details) – loopbackbee Mar 13 '15 at 15:15
  • Still not convinced :-). Why should I not keep a hierarchy? The point is, I have something like a simulator (more than that, basically a simulator framework) that runs different parameter settings, usually single core. However, since all the parameter runs are independent I may from time to time switch to multiprocessing. Thus, I don't want to rewrite my logging handling and logger hierarchy, but simply make it work with multiprocessing. I like the idea that each process logs to a different file. However, even if I go for that one, my old handlers still persist due to forking :-(. – SmCaterpillar Mar 13 '15 at 15:40
  • @SmCaterpillar simply because you will then have two different processes writing to the same file concurrently - which you probably don't want. Your old handlers persist (*all* global state persists, really - modules, global variables, etc), but that doesn't matter if you don't *use* them. – loopbackbee Mar 13 '15 at 18:52
  • Ok, so this should be fine: ``for logger in logging.Logger.manager.loggerDict.values(): logger.handlers = []`` at the initialization of the child process !? Can I also flush and close all handlers in the child process without affecting the parent one? – SmCaterpillar Mar 14 '15 at 09:52
  • @SmCaterpillar anything you do shouldn't affect the parent process. However, I'd be very wary while doing that kind of magic - the implementation of `logging` can change at any moment. – loopbackbee Mar 16 '15 at 19:01
  • @SmCaterpillar if your requirement is to be able to log to files in the child processes, you should indicate that in your question too for others to really understand your problem. – Cilyan Mar 22 '15 at 12:12

If you need to prevent the logging hierarchy from being inherited in the worker processes, simply do the logging configuration after creating the worker pool. From your example:

pool = mp.Pool(n_cores)
logging.basicConfig(format=log_format, level=logging.INFO)

Then, nothing will be inherited.

Otherwise, like you said, because of os.fork() everything will get inherited/duplicated. In that case, your options are to reconfigure logging in the worker processes after creating the pool (see my other answer) or to follow one of the other suggestions/answers.
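
Applied to the question's main(), the reordering would look something like this (a sketch; since the children are forked before basicConfig runs, their root loggers have no handlers and keep the default WARNING level, so the info() calls in do_log print nothing):

def main():
    n_cores = 4
    # Fork the workers first, while the root logger is still unconfigured
    pool = mp.Pool(n_cores)

    # Configure logging afterwards: only the parent gets the StreamHandler
    log_format = '%(processName)-10s %(name)s %(levelname)-8s %(message)s'
    logging.basicConfig(format=log_format, level=logging.INFO)

    pool.map(do_log, range(100))
    pool.close()
    pool.join()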

snapshoe