
Update: TimedRotatingFileHandler does not work properly when I use multiprocessing. What am I supposed to do for logging with multiprocessing?

I wrote my own Logger class, shown below, and use it as a module in all my other Python scripts.

import logging
import logging.handlers

class Logger:
    DEFAULT_LOG_OUTPUT = "/home/haifzhan/"

    def __init__(self, logger_name, log_file_name, log_dir=DEFAULT_LOG_OUTPUT, log_level=logging.DEBUG):
        self.logger = logging.getLogger(logger_name)
        self.formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')

        #self.file_handler = logging.FileHandler(log_dir + log_file_name)
        file_path = log_dir + log_file_name
        self.file_handler = logging.handlers.TimedRotatingFileHandler(file_path, when='H', backupCount=30)
        self.file_handler.setFormatter(self.formatter)
        self.logger.setLevel(log_level)
        self.logger.addHandler(self.file_handler)

        self.console_handler = logging.StreamHandler()
        self.console_handler.setFormatter(self.formatter)
        self.console_handler.setLevel(logging.DEBUG)
        self.logger.addHandler(self.console_handler)

    def get_logger(self):
        return self.logger

At the top of my Python script, I create an instance of Logger.

    logger = Logger("logger name", "logfile.log", log_dir=LOG_DIR, log_level=logging.INFO).get_logger()  # always at the top of the script

It worked perfectly when I was using FileHandler; unfortunately, it omits logging lines after I switched to TimedRotatingFileHandler. Log file rotation works as it is supposed to, but not all lines are logged. The console logging works fine, so how can that be?

    self.file_handler = logging.FileHandler(log_dir + log_file_name)

    self.file_handler = logging.handlers.TimedRotatingFileHandler(file_path, when='H', backupCount=30)

Can anyone help solve this?

Haifeng Zhang
  • I am not sure whether that's the reason, but the console_handler always has a log level of DEBUG, while the file_handler gets the log level INFO from when you create an instance of Logger. So file_handler will not output logs with a log level below INFO. – azalea Dec 09 '15 at 19:03
  • @azalea I did that on purpose. I set the console log level to DEBUG so it shows me more information that doesn't need to be logged to the file. The file logging level can be set by the user. – Haifeng Zhang Dec 09 '15 at 19:06
  • Your code seems to be working as expected (no omitted logging lines) when I add `logger = Logger(__name__, 'out', '/tmp/').get_logger()` and `logger.info('test')`. Can you modify your code so we can reproduce the problem? – unutbu Dec 09 '15 at 19:10
  • @unutbu the `logger` instance was used by `multiprocessing` processes, could that be the reason why lines were not logged? I read the Python `logging` documentation; it says logging is thread-safe... – Haifeng Zhang Dec 09 '15 at 21:19
  • 1
    @haifzhan: That could be the problem. Multiple processes writing to the same file could produce garbled or perhaps missing text. You need a way for all the logging messages to be written to the file sequentially. One way to do that is to pass a `mp.Queue` -- the same `Queue` -- to each process, and `put` the logging messages in the queue. Spawn a new process whose sole job is to `get` messages from the Queue and then log them. Thus there would be only one instance of `Logger`. It would be instantiated in this single process and it would handle all the logging. – unutbu Dec 09 '15 at 21:37
  • @unutbu this is a good idea. Unfortunately it cannot distinguish the logging levels :( – Haifeng Zhang Dec 09 '15 at 21:50
  • You can pass a tuple through the queue. The tuple can contain both the logging level and the message; a sketch of this idea follows these comments. – unutbu Dec 09 '15 at 21:58
  • @unutbu You should convert that comment into an answer (I had just said the same in an answer because I failed to read all the comments, already deleted it). – MariusSiuram Jan 26 '16 at 09:14
  • @MariusSiuram: I like your answer; please undelete so it can be upvoted. – unutbu Jan 26 '16 at 13:01
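
A minimal sketch of the tuple idea from these comments (the function and file names are hypothetical, not from the original post): each worker puts `(level, message)` pairs on a shared `multiprocessing.Queue`, and a single listener process replays them through one `Logger`, so the levels survive the trip through the queue.

import logging
import multiprocessing as mp

def worker(log_queue, n):
    # Workers never touch the log file; they only enqueue (level, message) pairs.
    log_queue.put((logging.INFO, 'worker %d finished a task' % n))
    log_queue.put((logging.DEBUG, 'worker %d debug detail' % n))

def listener(log_queue, log_file):
    # The only process that owns a file handler and writes to disk.
    logger = logging.getLogger('listener')
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.FileHandler(log_file))
    while True:
        item = log_queue.get()
        if item is None:                 # sentinel: all workers are done
            break
        level, message = item
        logger.log(level, message)       # logger.log() dispatches by numeric level

if __name__ == '__main__':
    q = mp.Queue()
    lp = mp.Process(target=listener, args=(q, 'app.log'))
    lp.start()
    workers = [mp.Process(target=worker, args=(q, n)) for n in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    q.put(None)                          # stop the listener
    lp.join()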

2 Answers

Don't write to the file from all the processes. Instead, create a Queue of log messages and have one dedicated process (the main process or a special one) perform the logging.

This should remove the race conditions between processes and the problems they cause.

Given that you have already set up a Logger class, the implementation should be quite easy. You can have a global / singleton instance of the logging Queue (where every Logger instance puts its logs) and manage the actual logging from a single central process.


Edit: A possible approach would be to use a special handler:

from logging import Handler

class QueueLogger(Handler):
    def __init__(self, log_queue):
        """
        Initialize the handler with the logging Queue.
        """
        Handler.__init__(self)
        self.log_queue = log_queue

    def emit(self, record):
        # Forward the record to the queue instead of writing it anywhere.
        self.log_queue.put(record)

That allows you to put the record (which contains the log level and extra information) in a queue. On the other side you can have a HubLogger which would do something like:

while True:
    r = log_queue.get()
    my_handler.emit(r)

And my_handler can be a TimedRotatingFileHandler or whatever handler you want.
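
To make the wiring concrete, here is a sketch under stated assumptions: it assumes the QueueLogger class from above is defined in the same module, and the `hub`/`worker` names are hypothetical. One hub process owns the TimedRotatingFileHandler; workers only enqueue records.

import logging
import logging.handlers
import multiprocessing as mp

def hub(log_queue, log_path):
    # The only process that writes to the rotating file.
    handler = logging.handlers.TimedRotatingFileHandler(log_path, when='H', backupCount=30)
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    while True:
        record = log_queue.get()
        if record is None:            # sentinel: shut down cleanly
            break
        handler.handle(record)        # runs filters and locking, then emit()

def worker(log_queue, n):
    logger = logging.getLogger('worker-%d' % n)
    logger.setLevel(logging.INFO)
    logger.addHandler(QueueLogger(log_queue))   # the handler defined above
    for i in range(5):
        logger.info('message %d from worker %d', i, n)

if __name__ == '__main__':
    queue = mp.Queue()
    hub_proc = mp.Process(target=hub, args=(queue, 'logfile.log'))
    hub_proc.start()
    workers = [mp.Process(target=worker, args=(queue, n)) for n in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    queue.put(None)                   # tell the hub to exit
    hub_proc.join()

One caveat: the LogRecord must be picklable to cross the process boundary, which holds for ordinary %-style messages with simple arguments.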


Credit to @unutbu, as they had already commented the "Hub" approach in the comments.
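
On Python 3.2+, the standard library ships this exact pattern: `logging.handlers.QueueHandler` plays the role of the QueueLogger above, and `logging.handlers.QueueListener` runs the hub loop for you (in a background thread rather than a separate process). A minimal sketch:

import logging
import logging.handlers
import multiprocessing as mp

log_queue = mp.Queue()

# The listener drains the queue and hands every record to the real handler.
file_handler = logging.handlers.TimedRotatingFileHandler('logfile.log', when='H', backupCount=30)
file_handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
listener = logging.handlers.QueueListener(log_queue, file_handler)
listener.start()

# Loggers (in any process holding the queue) only enqueue records.
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))
logger.info('routed through the queue')

listener.stop()   # flushes remaining records and stops the listener thread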

MariusSiuram
  • Thanks for your reply, +1! What does `my_handler.emit(r)` do here? Why put the record into `log_queue` again? – Haifeng Zhang Jan 26 '16 at 17:22
  • Each process ("worker") has a Logger which uses the QueueLogger handler (associated by using the `addHandler` method, for instance). The `emit` method is the mechanism used by logging Handler classes in order to "do something" with the log message (this message will contain the log level, the message, and extra data). The idea is sending this whole record to a "master" process through a queue. I say a Queue because multiprocessing.Queue is the standard in these situations. – MariusSiuram Jan 26 '16 at 21:22
  • Thanks MariusSiuram. Why do we need `self.log_queue.put(record)` in `emit`? Are we supposed to `get` the record from the queue and write it to a log? – Haifeng Zhang Jan 26 '16 at 22:04
  • @haifzhan Yes, the `while True` loop does that, in a special standalone process (only one per application) – MariusSiuram Jan 26 '16 at 22:08
  • What I am asking is how you output the record to a file when you put it into a queue. – Haifeng Zhang Jan 26 '16 at 22:16
  • @haifzhan All logging handlers have an emit method. Use whatever handler you desire (I used `my_handler`, you seemed to want to use an instance of `TimedRotatingFileHandler`) and it will work. – MariusSiuram Jan 26 '16 at 22:25

You will need to have a separate log file per process.
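
A minimal sketch of that approach (the helper name and log directory are hypothetical): embed the pid in the file name so each process rotates its own file and no two writers ever share one.

import logging
import logging.handlers
import os

def get_process_logger(log_dir='/tmp/logs'):
    # One rotating log file per process, keyed by pid, so writers never collide.
    # Assumes log_dir already exists.
    path = os.path.join(log_dir, 'app-%d.log' % os.getpid())
    handler = logging.handlers.TimedRotatingFileHandler(path, when='H', backupCount=30)
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger = logging.getLogger('app-%d' % os.getpid())
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger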

boxed
  • In my situation, it would be a nightmare... I have 4 similar programs, each of them uses multiprocessing, and it would generate more than 40 log files per day – Haifeng Zhang Jan 21 '16 at 16:04
  • "need" is a strong word. However, I must say that having a separate log file per process would have been my first answer. – MariusSiuram Jan 26 '16 at 09:12