I would like to apply a Filter to the urllib3 Loggers used by the requests module, so that sensitive information is redacted from all log strings. For some reason, my filter is not applied to the urllib3.connectionpool Logger when it is called by requests.get().
Reproducible example
import logging
import re
import requests
class Redactor(logging.Filter):
    """Filter subclass to redact patterns from logs."""

    redact_replacement_string = "<REDACTED_INFO>"

    def __init__(self, patterns: list[re.Pattern] = None):
        super().__init__()
        self.patterns = patterns or list()

    def filter(self, record: logging.LogRecord) -> bool:
        """
        Overriding the original filter method to redact, rather than filter.
        :return: Always true - i.e. always apply filter
        """
        for pattern in self.patterns:
            record.msg = pattern.sub(self.redact_replacement_string, record.msg)
        return True
# Set log level
urllib_logger = logging.getLogger("urllib3.connectionpool")
urllib_logger.setLevel("DEBUG")
# Add handler
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("logger name: {name} | message: {message}", style="{"))
urllib_logger.addHandler(handler)
# Add filter
urllib_logger.info("Sensitive string before applying filter: www.google.com")
sensitive_patterns = [re.compile(r"google")]
redact_filter = Redactor(sensitive_patterns)
urllib_logger.addFilter(redact_filter)
urllib_logger.info("Sensitive string after applying filter: www.google.com")
# Perform a request that's supposed to use the filtered logger
requests.get("https://www.google.com")
# Check if the logger has been reconfigured
urllib_logger.info("Sensitive string after request: www.google.com")
The result of this code is that the Handler is applied to all log strings, but the Filter is not applied to the log strings emitted during the requests.get() call:
logger name: urllib3.connectionpool | message: Sensitive string before applying filter: www.google.com
logger name: urllib3.connectionpool | message: Sensitive string after applying filter: www.<REDACTED_INFO>.com
logger name: urllib3.connectionpool | message: Starting new HTTPS connection (1): www.google.com:443
logger name: urllib3.connectionpool | message: https://www.google.com:443 "GET / HTTP/1.1" 200 None
logger name: urllib3.connectionpool | message: Sensitive string after request: www.<REDACTED_INFO>.com
What I'm expecting
I would like the sensitive pattern ("google") to be redacted everywhere:
logger name: urllib3.connectionpool | message: Sensitive string before applying filter: www.google.com
logger name: urllib3.connectionpool | message: Sensitive string after applying filter: www.<REDACTED_INFO>.com
logger name: urllib3.connectionpool | message: Starting new HTTPS connection (1): www.<REDACTED_INFO>.com:443
logger name: urllib3.connectionpool | message: https://www.<REDACTED_INFO>.com:443 "GET / HTTP/1.1" 200 None
logger name: urllib3.connectionpool | message: Sensitive string after request: www.<REDACTED_INFO>.com
What I tried
- I tried applying the same Filter to the "root" Logger, to the "urllib3" Logger, and to all existing Loggers, and got the same result (like this):
all_loggers = [logger for logger in logging.root.manager.loggerDict.values()
               if not isinstance(logger, logging.PlaceHolder)]
for logger in all_loggers:
    logger.addFilter(redact_filter)
- I tried applying the Filter to the Handler instead of the Logger, since the Handler seems to be applied to all log strings. Still no luck.
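To narrow down the Handler attempt without any network traffic, here is a minimal sketch that attaches the same kind of filter to a Handler and logs the same text in two ways: once as a pre-built string, and once with the value passed as a separate %-style argument. The logger names, the `emit_both_styles` helper and the `RenderedRedactor` variant are hypothetical and only for illustration; I am assuming that library code may log in the second style, which I have not verified against the urllib3 source here.

```python
import io
import logging
import re


class Redactor(logging.Filter):
    """Same behaviour as the filter in the example: only record.msg is rewritten."""

    redact_replacement_string = "<REDACTED_INFO>"

    def __init__(self, patterns=None):
        super().__init__()
        self.patterns = patterns or []

    def filter(self, record):
        for pattern in self.patterns:
            record.msg = pattern.sub(self.redact_replacement_string, record.msg)
        return True


class RenderedRedactor(Redactor):
    """Hypothetical variant: merge msg and args first, then redact the result."""

    def filter(self, record):
        record.msg = record.getMessage()  # applies msg % args
        record.args = ()  # arguments are already merged into msg
        return super().filter(record)


def emit_both_styles(log_filter):
    """Log one pre-built string and one lazily formatted call; return the output lines."""
    stream = io.StringIO()  # capture output so it can be inspected
    handler = logging.StreamHandler(stream)
    handler.addFilter(log_filter)  # filter attached to the Handler, not the Logger
    logger = logging.getLogger(f"redaction_demo.{type(log_filter).__name__}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)
    logger.debug("Connecting to www.google.com")  # value inside the message
    logger.debug("Connecting to %s", "www.google.com")  # value passed as an argument
    logger.removeHandler(handler)
    return stream.getvalue().splitlines()


patterns = [re.compile(r"google")]
msg_only_lines = emit_both_styles(Redactor(patterns))
rendered_lines = emit_both_styles(RenderedRedactor(patterns))
print(msg_only_lines)
print(rendered_lines)
```

In this small test the original filter redacts only the pre-built string, because in the second call `record.msg` is just `"Connecting to %s"` and the host lives in `record.args`. The variant that renders the message first redacts both lines. Whether this is the whole explanation for the urllib3 output is something I would still like confirmed.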
I know that I could subclass a Formatter and do the redactions there, but I think formatting and redacting are two different functions, and I would like to keep them separate. It would also be nice to understand the logic in the logging module that produces the results I'm getting.
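For reference, this is roughly the Formatter-based alternative I have in mind, as a minimal sketch rather than a finished implementation. The `RedactingFormatter` class, its `patterns` and `replacement` parameters, and the `formatter_demo` logger name are all hypothetical; the log call is modelled on the output shown above, not taken from the urllib3 source.

```python
import io
import logging
import re


class RedactingFormatter(logging.Formatter):
    """Hypothetical Formatter that redacts patterns from the fully formatted line."""

    def __init__(self, fmt=None, patterns=None, replacement="<REDACTED_INFO>", **kwargs):
        super().__init__(fmt, **kwargs)
        self.patterns = patterns or []
        self.replacement = replacement

    def format(self, record):
        # Let the base class merge msg and args and apply the format string first
        formatted = super().format(record)
        for pattern in self.patterns:
            formatted = pattern.sub(self.replacement, formatted)
        return formatted


stream = io.StringIO()  # capture output instead of writing to stderr
handler = logging.StreamHandler(stream)
handler.setFormatter(
    RedactingFormatter(
        "logger name: {name} | message: {message}",
        patterns=[re.compile(r"google")],
        style="{",
    )
)

demo_logger = logging.getLogger("formatter_demo")  # hypothetical logger name
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False
demo_logger.addHandler(handler)

# Message with separate arguments, modelled on the output shown above
demo_logger.debug("Starting new HTTPS connection (%d): %s:%s", 1, "www.google.com", 443)
formatted_line = stream.getvalue().strip()
print(formatted_line)
```

Note that this redacts the whole formatted line, so a pattern would also match inside the logger name or any other field in the format string, which is one more reason I would prefer to keep redaction in a Filter.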