I am currently trying to get away from print()'s and start with centralized log collection using the ELK stack and the structlog module to generate structured json log lines. This is working perfectly fine for modules that I wrote myself using a loggingHelper module that I can import and use with
logger = Logger()
in other modules and scripts. This is the loggingHelper module class:
class Logger:
"""
Wrapper Class to import within other modules and scripts
All the config and log binding (script
"""
def __init__(self):
self.__log = None
logging.basicConfig(level=logging.DEBUG, format='%(message)s')
structlog.configure(logger_factory=LoggerFactory(),
processors=[structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.JSONRenderer()])
logger = structlog.get_logger()
main_script = os.path.basename(sys.argv[0]) if sys.argv[0] else None
frame = inspect.stack()[1]
log_invocation = os.path.basename(frame[0].f_code.co_filename)
user = getpass.getuser()
"""
Who executed the __main__, what was the executed __main__ file,
where did the log event happen?
"""
self.__log = logger.bind(executedScript = main_script,
logBirth = log_invocation,
executingUser = user)
def info(self, msg, **kwargs):
self.__log.info(msg, **kwargs)
def debug(self, msg, **kwargs):
self.__log.debug(msg, **kwargs)
def error(self, msg, **kwargs):
self.__log.error(msg, **kwargs)
def warn(self, msg, **kwargs):
self.__log.warning(msg, **kwargs)
This produces nicely formatted output (one JSON per line) that filebeat is able to read and forward to Elasticsearch. However, third-party librariers completely crush the well-formatted logs.
{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "context": "SELECT displayname FROM point_of_sale WHERE name = '123'", "level": "debug", "timestamp": "2019-03-15T12:52:42.792398Z", "message": "querying local"}
{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "level": "debug", "timestamp": "2019-03-15T12:52:42.807922Z", "message": "query successful: got 0 rows"}
building service object
auth version used is: v4
Traceback (most recent call last):
File "logAlot.py", line 26, in <module>
ef.EfoodDataControllerMerchantCenter().get_displayname(123)
File "/home/xyz/src/toolkit/commons/connectors/efood.py", line 1126, in get_displayname
return efc.select_from_local(q)['displayname'].values[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
As you can see both info level and error level messages from the third party librara (googleapiclient) are printed without going through the logging processors.
What would be the best way (and most pythonic) of capturing and formatting everything that happens within execution of one script using the loggingHelper module I wrote? Is this even best practice?
Edit: Currently the logger indeed writes to stdout itself, which is then redirected to a file in crontab using >> and 2>&1. This looks like bad practice to me if I want to redirect everything that is written to stdout/stderr by third-party library logging, because this would lead to a loop, correct? Thus my goal is not redirecting, but rather capturing everything in my logging processor. Changed the title accordingly.
Also, here is a rough overview of what I am trying to achieve. I am very open to general criticism and suggestions that diviate from this.