2

I would like to apply a Filter to the urllib3 Loggers used in the requests module, so that the sensible info from all log strings would be redacted. For some reason, my filter is not applied to urllib3.connectionpool Logger when it's called by requests.get().

Reproducible example

import logging
import re
import requests


class Redactor(logging.Filter):
    """Filter subclass to redact patterns from logs."""
    redact_replacement_string = "<REDACTED_INFO>"

    def __init__(self, patterns: list[re.Pattern] = None):
        super().__init__()
        self.patterns = patterns or list()

    def filter(self, record: logging.LogRecord) -> bool:
        """
        Overriding the original filter method to redact, rather than filter.
        :return: Always true - i.e. always apply filter
        """
        for pattern in self.patterns:
            record.msg = pattern.sub(self.redact_replacement_string, record.msg)
        return True

# Set log level
urllib_logger = logging.getLogger("urllib3.connectionpool")
urllib_logger.setLevel("DEBUG")

# Add handler
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("logger name: {name} | message: {message}", style="{"))
urllib_logger.addHandler(handler)

# Add filter
urllib_logger.info("Sensitive string before applying filter: www.google.com")
sensitive_patterns = [re.compile(r"google")]
redact_filter = Redactor(sensitive_patterns)
urllib_logger.addFilter(redact_filter)
urllib_logger.info("Sensitive string after applying filter: www.google.com")

# Perform a request that's supposed to use the filtered logger
requests.get("https://www.google.com")

# Check if the logger has been reconfigured
urllib_logger.info("Sensitive string after request: www.google.com")

The result of this code is that the Handler is applied to all log strings, but the Filter is not applied to log strings emitted by the requests.get() function:

logger name: urllib3.connectionpool | message: Sensitive string before applying filter: www.google.com
logger name: urllib3.connectionpool | message: Sensitive string after applying filter: www.<REDACTED_INFO>.com
logger name: urllib3.connectionpool | message: Starting new HTTPS connection (1): www.google.com:443
logger name: urllib3.connectionpool | message: https://www.google.com:443 "GET / HTTP/1.1" 200 None
logger name: urllib3.connectionpool | message: Sensitive string after request: www.<REDACTED_INFO>.com

What I'm expecting

I would like the sensitive pattern ("google") to be redacted everywhere:

logger name: urllib3.connectionpool | message: Sensitive string before applying filter: www.google.com
logger name: urllib3.connectionpool | message: Sensitive string after applying filter: www.<REDACTED_INFO>.com
logger name: urllib3.connectionpool | message: Starting new HTTPS connection (1): www.<REDACTED_INFO>.com:443
logger name: urllib3.connectionpool | message: https://www.<REDACTED_INFO>.com:443 "GET / HTTP/1.1" 200 None
logger name: urllib3.connectionpool | message: Sensitive string after request: www.<REDACTED_INFO>.com

What I tried

  1. I tried applying the same Filter to "root" Logger, to "urllib3" Logger and to all existing Loggers and get the same result (like this):
all_loggers = [logger for logger in logging.root.manager.loggerDict.values()
               if not isinstance(logger, logging.PlaceHolder)]

for logger in all_loggers:
    logger.addFilter(redact_filter)
  1. I tried applying the Filter to the Handler, not to the Logger, since it seems that the Handler is applied to all log strings. Still no luck.

  2. I know that I could subclass a Formatter and do the redactions in there, but I think formatting and redacting are two different functions and I would like to keep them separately. Also, it would be nice to understand the logic in the logging module that produces the results that I'm getting.

Mart Roben
  • 21
  • 2

2 Answers2

0

That's because the record passed to your filter function is not formatted already. The url you want to redact is in the record.args. You need to apply the filter after the ending message is constructed.

HenriTel
  • 184
  • 10
0

Answer to my own question

As @HenriTel pointed out, Filter is applied before the LogRecord is formatted. Therefore parts of the final log string are in record.args. A possible solution is to apply the redaction action separately to record.msg and record.args.

Here's the filter class I ended up using:

class Redactor(logging.Filter):
    """logging.Filter subclass to redact patterns from logs"""
    redact_replacement_string = "<REDACTED_INFO>"

    def __init__(self, patterns: list[re.Pattern] = None):
        super().__init__()
        self.patterns = patterns or list()

    def filter(self, record: logging.LogRecord) -> bool:
        """
        Overrides the original filter method to redact LogRecords.
        :return: always True - i.e. always apply filter
        """
        record.msg = self.redact(record.msg)
        record.args = self.redact(record.args)
        return True

    def redact_string(self, log_string: str):
        for pattern in self.patterns:
            log_string = pattern.sub(self.redact_replacement_string, log_string)
        return log_string

    def redact_number(self, log_number: (int | float)):
        """Only redact numbers that match a redaction pattern exactly, not part of the number."""
        for pattern in self.patterns:
            if pattern.fullmatch(str(log_number)):
                return 1234567890
        return log_number

    def redact(self, log_object):
        if log_object is None:
            return
        if isinstance(log_object, str):
            return self.redact_string(log_object)
        if isinstance(log_object, (int, float)):
            return self.redact_number(log_object)
        if isinstance(log_object, tuple):
            return tuple(self.redact(value) for value in log_object)
        if isinstance(log_object, dict):
            return {key: self.redact(value) for key, value in log_object.items()}
        # For custom types try to typecast to str -> redact -> typecast back to original type
        # Return original object if typecasting results in an error
        try:
            log_object_type = type(log_object)
            redacted_object = self.redact_string(str(log_object))
            return log_object_type(redacted_object)
        except ValueError:
            return log_object

It's quite messy, because of all the different types that record.msg and record.args can have.

Another option would be to subclass logging.Handler and do the redaction after msg % args is computed.

Mart Roben
  • 21
  • 2