10

With Python's logging module, is there a way to collect multiple events into one log entry? An ideal solution would be an extension of Python's logging module, or a custom formatter/filter for it, so that collecting logging events of the same kind happens in the background and nothing extra needs to be added in the code body (e.g. at every call of a logging function).

Here is an example that generates a large number of the same or very similar logging events:

import logging

for i in range(99999): 
    try:
        asdf[i]   # not defined!
    except NameError:
        logging.exception('foo') # generates large number of logging events
    else: pass

# ... more code with more logging ...

for i in range(88888): logging.info('more of the same %d' % i)

# ... and so on ...

So we have the same exception 99999 times and log it each time. It would be nice if the log just said something like:

ERROR:root:foo (occurred 99999 times)
Traceback (most recent call last):
  File "./exceptionlogging.py", line 10, in <module>
    asdf[i]   # not defined!
NameError: name 'asdf' is not defined

INFO:root:more of the same (occurred 88888 times with various values)
con-f-use
  • I guess you would need to add some hook (possibly a custom logging.Formatter sub-class) to catch all records, postpone them, and merge the next received one if it "looks" the same. – DainDwarf Dec 04 '15 at 15:02
  • Exactly, and I was wondering if someone wrote such a thing already. – con-f-use Dec 04 '15 at 16:34
  • Well, did you try, then? What does it look like so far? I have not tried something like this myself, but I did a Formatter subclass to add another timer on logs (time since launch of program) – DainDwarf Dec 07 '15 at 07:04
  • I did not; at the moment I have very little time on my hands. I just thought it might be useful for many people, so I put a bounty on it. – con-f-use Dec 07 '15 at 09:56
  • I actually think this would be harmful. How would you know when each of the 99999 exceptions occurred over the course of the execution? Also, would you write the first one to the logfile and update the counter each time? Would you write the last one? But what if there is no last one? What if we write to stdout and not a file? Too many open issues here... – Andreas Grapentin Dec 08 '15 at 08:09
  • Just wanted to say, I'm having a real hard time figuring out who to award the bounty to. You all were very helpful – con-f-use Dec 13 '15 at 00:27
  • Please also select a chosen answer for your question :) – redlus Dec 14 '15 at 00:24

4 Answers

8

You should probably be writing a message aggregation/statistics class rather than trying to hook onto the logging system's singletons, but I guess you may have an existing code base that uses logging.

I'd also suggest that you should instantiate your loggers rather than always using the default root. The Python Logging Cookbook has extensive explanation and examples.
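
For example, a minimal sketch of a named logger (the names here are illustrative):

import logging

# A named logger instead of the root logger; handlers and levels stay local to it.
logger = logging.getLogger(__name__)
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.INFO)

logger.info("using a named logger: %s", __name__)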

The following class should do what you are asking.

import logging
import atexit
import pprint

class Aggregator(object):
    """Swallows log records via a logging filter, counts identical ones,
    and re-emits each record once (annotated with its count) at exit."""
    logs = {}

    @classmethod
    def _aggregate(cls, record):
        id = '{0[levelname]}:{0[name]}:{0[msg]}'.format(record.__dict__)
        if id not in cls.logs: # first occurrence
            cls.logs[id] = [1, record]
        else: # subsequent occurrence
            cls.logs[id][0] += 1

    @classmethod
    def _output(cls):
        for count, record in cls.logs.values():
            record.__dict__['msg'] += ' (occurred {} times)'.format(count)
            logging.getLogger(record.__dict__['name']).handle(record)

    @staticmethod
    def filter(record):
        # pprint.pprint(record)
        Aggregator._aggregate(record)
        return False

    @staticmethod
    def exit():
        Aggregator._output()



# Install the aggregator on the root logger and flush it when the program exits.
logging.getLogger().addFilter(Aggregator)
atexit.register(Aggregator.exit)

for i in range(99999): 
    try:
        asdf[i]   # not defined!
    except NameError:
        logging.exception('foo') # generates large number of logging events
    else: pass

# ... more code with more logging ...
for i in range(88888): logging.error('more of the same')

# ... and so on ...    

Note that you don't get any logs until the program exits.

The result of running it is:

    ERROR:root:foo (occurred 99999 times)
    Traceback (most recent call last):
      File "C:\work\VEMS\python\logcount.py", line 38, in <module>
        asdf[i]   # not defined!
    NameError: name 'asdf' is not defined
    ERROR:root:more of the same (occurred 88888 times)
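
If waiting until the program exits is a problem, one possible extension (my sketch, not part of the original class) is to flush the aggregation on demand, or periodically from a daemon timer:

import threading

def flush_aggregator():
    """Emit everything collected so far (with counts) and reset the cache."""
    Aggregator._output()     # uses the Aggregator internals defined above
    Aggregator.logs.clear()

def periodic_flush(interval=60):
    """Flush every `interval` seconds via a self-rearming daemon timer."""
    flush_aggregator()
    timer = threading.Timer(interval, periodic_flush, args=(interval,))
    timer.daemon = True
    timer.start()

periodic_flush()  # call once; after that it re-arms itself
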
Mike Robins
  • I like the solution very much Mike, thank you. The only killer is "that you don't get any logs until the program exits." For my current use case that is just fine but a general solution should log the first message right away. – con-f-use Dec 08 '15 at 10:02
  • Very nice. Depending on the need, there should be a way around waiting for the exit, e.g. flush after every X logs, or aggregate only as long as the same log is repeated (flush when the log message changes), or aggregate DEBUG logs but flush on anything higher than that, or all of the above. – zvone Dec 08 '15 at 23:28
  • @con-f-use, I see that you have added the 'with various values' to the log matching. One could extend the Aggregator's id matching using something like [difflib.SequenceMatcher](https://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher) and ignoring numbers as junk. – Mike Robins Dec 09 '15 at 06:32
  • @MikeRobins this implementation is not thread-safe, and collisions may cause counters to zero when occurred. I'd suggest adding a threading.RLock to any action which changes data on cls.logs dictionary. – redlus Dec 14 '15 at 08:47
  • I have had a go at matching logs that differ only in the numbers using difflib. It seems that SequenceMatcher does not generate a close match for small strings that only differ in so-called junk, e.g. difflib.SequenceMatcher(lambda x: x in ".0123456789", log1, log2).ratio(). Perhaps do something more hacky like replacing numbers with "" and then comparing. – Mike Robins Dec 14 '15 at 09:28
3

Your question hides a subliminal assumption of how "very similar" is defined. Log records can either be const-only (whose instances are strictly identical), or a mix of consts and variables (no consts at all is also considered a mix).

An aggregator for const-only log records is a piece of cake; you just need to decide whether your aggregation should be split per process/thread or not. For log records which include both consts and variables, you'll need to decide whether to split your aggregation based on the variables you have in your record.

A dictionary-style counter (from collections import Counter) can serve as a cache, which will count your instances in O(1), but you may need some higher-level structure in order to write the variables down if you wish. Additionally, you'll have to manually handle the writing of the cache into a file - every X seconds (binning) or once the program has exited (risky - you may lose all in-memory data if something gets stuck).
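
A bare-bones illustration of that Counter idea (my sketch; the key format is arbitrary):

from collections import Counter

counts = Counter()

def count_event(key):
    """O(1) increment; `key` is whatever defines 'the same' log record."""
    counts[key] += 1

count_event("ERROR:root:foo")
count_event("ERROR:root:foo")
print(counts.most_common())   # [('ERROR:root:foo', 2)]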

A framework for aggregation would look something like this (tested on Python v3.4):

import logging
from logging import Handler
from threading import RLock, Timer
from collections import defaultdict


class LogAggregatorHandler(Handler):

    _default_flush_timer = 300  # Number of seconds between flushes
    _default_separator = "\t"  # Separator char between metadata strings
    _default_metadata = ["filename", "name", "funcName", "lineno", "levelname"]  # metadata defining unique log records

    class LogAggregatorCache(object):
        """ Keeps whatever is interesting in log records aggregation. """
        def __init__(self, record=None):
            self.message = None
            self.counter = 0
            self.timestamp = list()
            self.args = list()
            if record is not None:
                self.cache(record)

        def cache(self, record):
            if self.message is None:  # Only the first message is kept
                self.message = record.msg
            assert self.message == record.msg, "Non-matching log record"  # note: will not work with string formatting for log records; e.g. "blah {}".format(i)
            self.timestamp.append(record.created)
            self.args.append(record.args)
            self.counter += 1

        def __str__(self):
            """ The string of this object is used as the default output of log records aggregation. For example: record message with occurrences. """
            return self.message + "\t (occurred {} times)".format(self.counter)

    def __init__(self, flush_timer=None, separator=None, add_process_thread=False):
        """
        Log record metadata will be concatenated to a unique string, separated by self._separator.
        Process and thread IDs will be added to the metadata if set to True; otherwise log records across processes/threads will be aggregated together.
        :param flush_timer: int - seconds between automatic flushes of the cache
        :param separator: str
        :param add_process_thread: bool
        """
        super().__init__()
        self._flush_timer = flush_timer or self._default_flush_timer
        self._cache = self.cache_factory()
        self._separator = separator or self._default_separator
        self._metadata = list(self._default_metadata)  # copy, so the class default is not mutated
        if add_process_thread is True:
            self._metadata += ["process", "thread"]
        self._aggregation_lock = RLock()
        self._store_aggregation_timer = self.flush_timer_factory()
        self._store_aggregation_timer.start()

        # Demo logger which outputs aggregations through a StreamHandler:
        self.agg_log = logging.getLogger("aggregation_logger")
        self.agg_log.addHandler(logging.StreamHandler())
        self.agg_log.setLevel(logging.DEBUG)
        self.agg_log.propagate = False

    def cache_factory(self):
        """ Returns an instance of a new caching object. """
        return defaultdict(self.LogAggregatorCache)

    def flush_timer_factory(self):
        """ Returns a threading.Timer daemon object which flushes the Handler aggregations. """
        timer = Timer(self._flush_timer, self.flush)
        timer.daemon = True
        return timer

    def find_unique(self, record):
        """ Extracts a unique metadata string from log records. """
        metadata = ""
        for single_metadata in self._metadata:
            value = getattr(record, single_metadata, "missing " + str(single_metadata))
            metadata += str(value) + self._separator
        return metadata[:-len(self._separator)]

    def emit(self, record):
        try:
            with self._aggregation_lock:
                metadata = self.find_unique(record)
                self._cache[metadata].cache(record)
        except Exception:
            self.handleError(record)

    def flush(self):
        self.store_aggregation()

    def store_aggregation(self):
        """ Write the aggregation data to file. """
        self._store_aggregation_timer.cancel()
        del self._store_aggregation_timer
        with self._aggregation_lock:
            temp_aggregation = self._cache
            self._cache = self.cache_factory()

        # ---> handle temp_aggregation and write to file <--- #
        for key, value in sorted(temp_aggregation.items()):
            self.agg_log.info("{}\t{}".format(key, value))

        # ---> re-create the store_aggregation Timer object <--- #
        self._store_aggregation_timer = self.flush_timer_factory()
        self._store_aggregation_timer.start()

Testing this Handler class with random log severity in a for-loop:

if __name__ == "__main__":
    import random
    import logging

    logger = logging.getLogger()
    handler = LogAggregatorHandler()
    logger.addHandler(handler)
    logger.addHandler(logging.StreamHandler())
    logger.setLevel(logging.DEBUG)

    logger.info("entering logging loop")

    for i in range(25):
        # Randomly choose log severity:
        severity = random.choice([logging.DEBUG, logging.INFO, logging.WARN, logging.ERROR, logging.CRITICAL])
        logger.log(severity, "test message number %s", i)

    logger.info("end of test code")

If you want to add more stuff, this is what a Python log record looks like:

{'args': (),
 'created': 1413747902.18,
 'exc_info': None,
 'exc_text': None,
 'filename': 'push_socket_log.py',
 'funcName': '<module>',
 'levelname': 'DEBUG',
 'levelno': 10,
 'lineno': 17,
 'module': 'push_socket_log',
 'msecs': 181.387901306,
 'msg': 'Test message.',
 'name': '__main__',
 'pathname': './push_socket_log.py',
 'process': 65486,
 'processName': 'MainProcess',
 'relativeCreated': 12.6709938049,
 'thread': 140735262810896,
 'threadName': 'MainThread'}

One more thing to think about: Most features you run depend on a flow of several consecutive commands (which will ideally report log records accordingly); e.g. a client-server communication will typically depend on receiving a request, processing it, reading some data from the DB (which requires a connection and some read commands), some kind of parsing/processing, constructing the response packet and reporting the response code.

This highlights one of the main disadvantages of using an aggregation approach: by aggregating log records you lose track of the time and order of the actions that took place. It will be extremely difficult to figure out what request was incorrectly structured if you only have the aggregation at hand. My advice in this case is that you keep both the raw data and the aggregation (using two file handlers or something similar), so that you can investigate a macro-level (aggregation) and a micro-level (normal logging).
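
For instance, a rough sketch of that dual setup, reusing the LogAggregatorHandler defined above (the logger name and file name are illustrative):

import logging

logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)

# Micro level: every record, in order, straight to a file.
raw_handler = logging.FileHandler("app_raw.log")
raw_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
logger.addHandler(raw_handler)

# Macro level: periodic aggregated summaries from the handler class above.
logger.addHandler(LogAggregatorHandler())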

However, you are still left with the responsibility of finding out that things have gone wrong, and then manually investigating what caused them. When developing on your PC this is an easy enough task, but deploying your code on several production servers makes these tasks cumbersome, wasting a lot of your time. Accordingly, there are several companies developing products specifically for log management. Most aggregate similar log records together, but others incorporate machine learning algorithms for automatic aggregation and for learning your software's behavior. Outsourcing your log handling can then enable you to focus on your product instead of on your bugs.

Disclaimer: I work for Coralogix, one such solution.

redlus
  • Thank you for sharing your insight. As to the >>subliminal assumption of how "very similar" is defined<<: I'm aware of that, and was fully prepared to give up that feature. I just thought someone more capable and motivated might provide a clever solution to that. Turns out I was not wrong. – con-f-use Dec 09 '15 at 22:02
  • @con-f-use I've tested the code, fixed it for Python v3.4 and added some mini-features. i.e. configurable flush_timer, flush() method to avoid data-loss upon exit, print the aggregation data through logging.StreamHandler on the screen. – redlus Dec 14 '15 at 00:24
2

You can subclass the logger class and override the exception method to put your error types in a cache until they reach a certain count, before they are emitted to the log.

import logging
from collections import defaultdict

MAX_COUNT = 99999


class MyLogger(logging.getLoggerClass()):
    def __init__(self, name):
        super(MyLogger, self).__init__(name)
        self.cache = defaultdict(int)

    def exception(self, msg, *args, **kwargs):
        err = msg.__class__.__name__
        self.cache[err] += 1
        if self.cache[err] > MAX_COUNT:
            new_msg = "{err} occurred {count} times.\n{msg}"
            new_msg = new_msg.format(err=err, count=MAX_COUNT, msg=msg)
            self.log(logging.ERROR, new_msg, *args, **kwargs)
            self.cache[err] = 0  # reset the count (not None, so the next += 1 does not fail)


log = MyLogger('main')
try:
    raise TypeError("Useful error message")
except TypeError as err:
    log.exception(err)

Please note this isn't copy-paste code.
You need to add your handlers (I recommend a formatter, too) yourself.
https://docs.python.org/2/howto/logging.html#handlers
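
For example, a minimal wiring might look like this (a sketch continuing from the snippet above; the format string is arbitrary):

import logging

console = logging.StreamHandler()
console.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(name)s: %(message)s'))
log.addHandler(console)   # `log` is the MyLogger instance created above
log.setLevel(logging.DEBUG)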

Have fun.

edd
0

Create a counter, log the exception only when the count is 1, increment it thereafter, and write the total out in a finally block (to ensure it gets logged no matter how badly the application crashes and burns). This could of course pose an issue if you have the same exception for different reasons, but you could always check the line number to verify it's the same issue, or something similar. A minimal example:

import logging

name_error_exception_count = 0
try:
    for i in range(99999): 
        try:
            asdf[i]   # not defined!
        except NameError:
            name_error_exception_count += 1
            if name_error_exception_count == 1:
                logging.exception('foo') 
        else: pass
except Exception:
    pass  # this is just to get the finally block, handle exceptions here too, maybe
finally:
    if name_error_exception_count > 0:
        # Use error() here: there is no active exception to attach at this point.
        logging.error('NameError exception occurred {} times.'.format(name_error_exception_count))
Bob Dylan
  • That would be the easiest solution, and good if I only had that type of repeated exception in one place in my source. A solution that has a custom filter/formatter for the `logging` module would be preferable though. – con-f-use Dec 04 '15 at 15:18
  • @con-f-use then you need to modify your example to account for a more realistic representation of your code so answers can address what you actually need. For instance, maybe a class or function decorator that handles logging and tracks multiple occurrences is more appropriate, but your example shows neither a function nor a class with methods. – Bob Dylan Dec 04 '15 at 15:22
  • @con-f-use despite the edited post, it really gives no further insight into the problem. You could of course write your own logger that uses the principles of this answer to avoid logging duplicate exceptions from the same functions/methods, but your example doesn't even make it clear if your code is running from inside a function or class method, which can impact how someone might write the logger or alternate solutions such as a logging decorator, context manager, or other option. – Bob Dylan Dec 07 '15 at 18:41
  • Ideally it should run from the most general context possible. I was under the impression that the logging module is a very general module that does run from within classes. – con-f-use Dec 07 '15 at 22:38
  • @con-f-use yes, but how it resolves calling functions is another matter. If it's in a class that has subclasses calling it, should the exception [propagate](https://docs.python.org/2/library/logging.html#logging.Logger.propagate)? If it does, how will it know whether it's truly a duplicate without knowing the calling class? How this gets resolved (via `inspect`, for instance) matters based on how you intend to use it. A context manager is best for all purposes, but you may not need something that complex if you don't even have class methods or functions. – Bob Dylan Dec 07 '15 at 22:53
  • My answer works fine given the complexity shown in your question. [Bad MCV](http://stackoverflow.com/help/mcve), bad answers. – Bob Dylan Dec 07 '15 at 22:54