
I am currently trying to get away from print() calls and start with centralized log collection using the ELK stack and the structlog module to generate structured JSON log lines. This works perfectly fine for modules that I wrote myself, using a loggingHelper module that I can import and use with

logger = Logger()

in other modules and scripts. This is the loggingHelper module class:

import getpass
import inspect
import logging
import os
import sys

import structlog
from structlog.stdlib import LoggerFactory


class Logger:
    """
    Wrapper class to import within other modules and scripts.
    All the config and log binding (script name, calling module,
    executing user) is done here.
    """
    def __init__(self):
        logging.basicConfig(level=logging.DEBUG, format='%(message)s')
        structlog.configure(logger_factory=LoggerFactory(),
                            processors=[structlog.stdlib.add_log_level,
                                        structlog.processors.TimeStamper(fmt="iso"),
                                        structlog.processors.JSONRenderer()])
        logger = structlog.get_logger()
        main_script = os.path.basename(sys.argv[0]) if sys.argv[0] else None
        frame = inspect.stack()[1]
        log_invocation = os.path.basename(frame[0].f_code.co_filename)
        user = getpass.getuser()

        # Who executed the __main__, what was the executed __main__ file,
        # where did the log event happen?
        self.__log = logger.bind(executedScript=main_script,
                                 logBirth=log_invocation,
                                 executingUser=user)

    def info(self, msg, **kwargs):
        self.__log.info(msg, **kwargs)

    def debug(self, msg, **kwargs):
        self.__log.debug(msg, **kwargs)

    def error(self, msg, **kwargs):
        self.__log.error(msg, **kwargs)

    def warn(self, msg, **kwargs):
        self.__log.warning(msg, **kwargs)
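
Used from another module, it looks roughly like this (this kind of call is what produced the first JSON line shown further down):

    logger = Logger()
    logger.debug("querying local",
                 context="SELECT displayname FROM point_of_sale WHERE name = '123'")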

This produces nicely formatted output (one JSON object per line) that filebeat is able to read and forward to Elasticsearch. However, third-party libraries completely break up the well-formatted logs.

{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "context": "SELECT displayname FROM point_of_sale WHERE name = '123'", "level": "debug", "timestamp": "2019-03-15T12:52:42.792398Z", "message": "querying local"}
{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "level": "debug", "timestamp": "2019-03-15T12:52:42.807922Z", "message": "query successful: got 0 rows"}
building service object
auth version used is: v4
Traceback (most recent call last):
  File "logAlot.py", line 26, in <module>
    ef.EfoodDataControllerMerchantCenter().get_displayname(123)
  File "/home/xyz/src/toolkit/commons/connectors/efood.py", line 1126, in get_displayname
    return efc.select_from_local(q)['displayname'].values[0]
IndexError: index 0 is out of bounds for axis 0 with size 0

As you can see, both info-level and error-level messages from the third-party library (googleapiclient) are printed without going through the logging processors.

What would be the best (and most pythonic) way of capturing and formatting everything that happens within the execution of one script using the loggingHelper module I wrote? Is this even best practice?

Edit: Currently the logger indeed writes to stdout itself, which is then redirected to a file in crontab using >> and 2>&1. This looks like bad practice to me if I want to redirect everything that third-party library logging writes to stdout/stderr, because it would lead to a loop, correct? Thus my goal is not redirecting, but rather capturing everything in my logging processors. I changed the title accordingly.

Also, here is a rough overview of what I am trying to achieve. I am very open to general criticism and suggestions that deviate from this. [diagram of the intended logging setup]

dasjanik

2 Answers


First things first: you should NOT do any logger config (logging.basicConfig, logging.dictConfig etc.) in your class initializer - the logging configuration should be done once and only once at process startup. The whole point of the logging module is to completely decouple logging calls (in libraries and modules) from logging configuration (in the process entry point).
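
In practice that means moving your config into a function that the entry point calls exactly once, while all other modules only ever fetch loggers. A minimal sketch, reusing the config from your snippet (configure_logging is just an illustrative name):

    import logging
    import structlog
    from structlog.stdlib import LoggerFactory

    def configure_logging():
        # All logging/structlog configuration lives here and runs exactly once.
        logging.basicConfig(level=logging.DEBUG, format='%(message)s')
        structlog.configure(logger_factory=LoggerFactory(),
                            processors=[structlog.stdlib.add_log_level,
                                        structlog.processors.TimeStamper(fmt="iso"),
                                        structlog.processors.JSONRenderer()])

    if __name__ == "__main__":
        configure_logging()                   # once, at process startup
        structlog.get_logger().info("application started")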

Second point: I'm no structlog expert (and that's an understatement - it's actually the very first time I hear about this package), but the result you get is what was to be expected from your code snippet: only your own code uses structlog; all other libs (stdlib or 3rd-party) will still use the stdlib loggers and emit plain-text logs.

From what I've seen in the structlog docs, it seems to provide a way to wrap the stdlib's loggers using structlog.stdlib.LoggerFactory and to add specific formatters for more consistent output. I have not tested this (yet) and the official doc is a bit sparse and lacks usable practical examples (at least I couldn't find any), but this article seems to have a more explicit example (to be adapted to your own context and use case, of course).
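
Untested sketch, based on my reading of the standard-library chapter of the structlog docs: the idea would be to install a structlog.stdlib.ProcessorFormatter on the stdlib root logger, so that records coming from third-party libraries get rendered by the same JSONRenderer as your own events:

    import logging
    import structlog

    timestamper = structlog.processors.TimeStamper(fmt="iso")

    # Processors applied to records from plain stdlib loggers
    # (i.e. third-party libraries) before rendering.
    pre_chain = [structlog.stdlib.add_log_level, timestamper]

    handler = logging.StreamHandler()
    handler.setFormatter(structlog.stdlib.ProcessorFormatter(
        processor=structlog.processors.JSONRenderer(),
        foreign_pre_chain=pre_chain,
    ))

    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)

    # structlog's own loggers hand their event dict over to the formatter
    # above, so both worlds end up rendered as JSON by the same handler.
    structlog.configure(
        processors=[
            structlog.stdlib.add_log_level,
            timestamper,
            structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
        ],
        logger_factory=structlog.stdlib.LoggerFactory(),
    )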

CAVEAT: as I said, I never used structlog (first time I hear of this lib), so I might have misunderstood some things, and you will of course have to experiment to find out how to properly configure the whole thing to get it to work as expected.

As a side note: on unix-like systems, stdout is supposed to be for the program's output (I mean "expected output" => the program's actual results), while all error / reporting / debugging messages belong on stderr. Unless you have compelling reasons to do otherwise, you should try to stick to this convention (at least for command-line tools, so you can chain / pipeline them the unix way).
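
For instance (a sketch - note that logging.StreamHandler defaults to stderr anyway):

    import logging
    import sys

    # Diagnostics go to stderr; the program's actual result goes to stdout.
    logging.basicConfig(level=logging.INFO, stream=sys.stderr)

    logging.getLogger(__name__).info("starting up")   # -> stderr
    print("actual result")                            # -> stdout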

bruno desthuilliers

Configuring the logging module

As you already figured out, structlog requires the logging functionality that already exists in Python to be configured.

http://www.structlog.org/en/stable/standard-library.html

logging.basicConfig supports options for stream and filename, documented here:

https://docs.python.org/3/library/logging.html#logging.basicConfig

Either you specify a filename, to which the logger will open a handle and direct all its output. Depending on how you are set up, this might be the file you normally redirect to:

import logging

logging.basicConfig(level=logging.DEBUG, format='%(message)s', filename='output.txt')

Or you can pass a StringIO object to basicConfig, which you can later read from and then redirect to your desired output destination:

import logging
import io

stream = io.StringIO()

logging.basicConfig(level=logging.DEBUG, format='%(message)s', stream=stream)
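
Later on you can read back everything that was captured and forward it wherever you want, for example (a sketch):

    # After the code above has logged through the configured stream:
    captured = stream.getvalue()        # all log lines as a single string
    with open('output.txt', 'a') as f:
        f.write(captured)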

More about StringIO can be read here:

https://docs.python.org/3/library/io.html#io.TextIOBase

As @bruno pointed out in his answer, do not do this in an __init__ as you may end up calling this piece of code several times in the same process.

Pax Vobiscum