11

I'm planning to run flask through gunicorn at kubernetes. To get the logging right, I want to output all my logs in json.

Currently I'm testing with minikube and https://github.com/inovex/kubernetes-logging to have fluentd collecting the logs.

I managed to get the errorlogs (tracebacks) properly formated thanks to: JSON formatted logging with Flask and gunicorn

I'm still struggling with the access log format. I specified the following gunicorn access log format:

access_log_format = '{"remote_ip":"%(h)s","request_id":"%({X-Request-Id}i)s","response_code":"%(s)s","request_method":"%(m)s","request_path":"%(U)s","request_querystring":"%(q)s","request_timetaken":"%(D)s","response_length":"%(B)s"}'

And the resulting log is json formated. But the message part (format based on access_log_format) now contains escaped double quotes, and will not be parsed by fluentd / ELK into it's individual fields

{"tags": [], "timestamp": "2017-12-07T11:50:20.362559Z", "level": "INFO", "host": "ubuntu", "path": "/usr/local/lib/python2.7/dist-packages/gunicorn/glogging.py", "message": "{\"remote_ip\":\"127.0.0.1\",\"request_id\":\"-\",\"response_code\":\"200\",\"request_method\":\"GET\",\"request_path\":\"/v1/records\",\"request_querystring\":\"\",\"request_timetaken\":\"19040\",\"response_length\":\"20\"}", "logger": "gunicorn.access"}

Thanks Jpw

Jens-Peter Wand
  • 111
  • 1
  • 1
  • 4

4 Answers4

4

I was searching for something that would be useful in a logging config file. Also, i am not a fan of manually constructing json formats by hand.
Solution: All of of gunicorn's logging arguments are available in the record's args. So let's grab the fields from there and let pythonjsonlogger do the rest of the work for us.

Formatter Class

from pythonjsonlogger.jsonlogger import JsonFormatter, merge_record_extra


class GunicornLogFormatter(JsonFormatter): 
    def add_fields(self, log_record, record, message_dict):
        """
        This method allows us to inject gunicorn's args as fields for the formatter
        """
        super(GunicornLogFormatter, self).add_fields(log_record, record, message_dict)
        for field in self._required_fields:
            if field in self.rename_fields:
                log_record[self.rename_fields[field]] = record.args.get(field)
            else:
                log_record[field] = record.args.get(field)

Sample Logging Config File

version: 1
disable_existing_loggers: False
formatters:
  gunicorn_json:
    (): GunicornLogFormatter
    format: '%(h)s %(r)s %({x-request-id}i)s'
    datefmt: '%Y-%m-%dT%H:%M:%S%z'
    rename_fields:
      levelname: level
      '{x-request-id}i': request_id
      r: request

handlers:
  json-gunicorn-console:
    class: logging.StreamHandler
    level: INFO
    formatter: gunicorn_json
    stream: ext://sys.stdout

loggers:
  gunicorn.access:
    level: INFO
    handlers:
      - json-gunicorn-console

Sample Logs

{"h": "127.0.0.1", "request": "GET /login?next=/ HTTP/1.1", "request_id": null}
{"h": "127.0.0.1", "request": "GET /static/css/style.css HTTP/1.1", "request_id": null}
{"h": "127.0.0.1", "request": "GET /some/random/path HTTP/1.1", "request_id": null}
{"h": "127.0.0.1", "request": "GET /some/random/path HTTP/1.1", "request_id": "123123123123123123"}
will7200
  • 1,496
  • 1
  • 13
  • 23
3

The simplest solution is to change outer single quotes to double quotes and inner double quotes to single quotes as mentioned below.

--access-logformat  "{'remote_ip':'%(h)s','request_id':'%({X-Request-Id}i)s','response_code':'%(s)s','request_method':'%(m)s','request_path':'%(U)s','request_querystring':'%(q)s','request_timetaken':'%(D)s','response_length':'%(B)s'}"

Following is sample log

{'remote_ip':'127.0.0.1','request_id':'-','response_code':'404','request_method':'GET','request_path':'/test','request_querystring':'','request_timetaken':'6642','response_length':'233'}
{'remote_ip':'127.0.0.1','request_id':'-','response_code':'200','request_method':'GET','request_path':'/','request_querystring':'','request_timetaken':'881','response_length':'20'}
Gaurang Shah
  • 11,764
  • 9
  • 74
  • 137
  • This answer was probably not accepted because single-quotes are not valid JSON. However, I wanted to point out that I easily able to swap out the single-quotes with double quotes by using replace: `somestring.replace("'", '"')` – Cognitiaclaeves Apr 05 '19 at 17:47
2

You can escape the double quotes (\") directly in the value of --access-logformat to keep your logs as valid JSON.

So if you're running Gunicorn in a Docker container, your Dockerfile could end with something like:

CMD ["gunicorn",            \
    "-b", "0.0.0.0:5000",   \
    "--access-logfile", "-",\
    "--access-logformat", "{\"remote_ip\":\"%(h)s\",\"request_id\":\"%({X-Request-Id}i)s\",\"response_code\":\"%(s)s\",\"request_method\":\"%(m)s\",\"request_path\":\"%(U)s\",\"request_querystring\":\"%(q)s\",\"request_timetaken\":\"%(D)s\",\"response_length\":\"%(B)s\"}", \
    "app:create_app()"]

Find the rest of the Gunicorn logging options here.

Voicu
  • 16,921
  • 10
  • 60
  • 69
2

It's 2 years on and I'm assuming the fluentd python logger has changed, I get a slightly different issue now and every Google search points back to this discussion.

When using your example in a gunicorn config file

access_log_format = '{"remote_ip":"%(h)s","request_id":"%({X-Request-Id}i)s","response_code":"%(s)s","request_method":"%(m)s","request_path":"%(U)s","request_querystring":"%(q)s","request_timetaken":"%(D)s","response_length":"%(B)s"}'

I get the desired behaviour of reading this as json and merging it with the fluentd json data, however the gunicorn fields aren't populated

{"tags": [], "level": "INFO", "host": "ubuntu", "logger": "gunicorn.access", "remote_ip":"%(h)s","request_id":"%({X-Request-Id}i)s","response_code":"%(s)s","request_method":"%(m)s","request_path":"%(U)s","request_querystring":"%(q)s","request_timetaken":"%(D)s","response_length":"%(B)s"}

It looks like the reason for this is that Gunicorn passes access_log_format as the message to the logger, and all the parameters (safe_atoms) as additional params, e.g.

/gunicorn/glogging.py

        safe_atoms = self.atoms_wrapper_class(
            self.atoms(resp, req, environ, request_time)
        )

        try:
            # safe_atoms = {"s": "200", "m": "GET", ...}
            self.access_log.info(self.cfg.access_log_format, safe_atoms)

however, if FluentRecordFormatter sees the string as valid json, it will read it using json.loads but ignores any arguments that are passed

/fluent/handler.py

    def _format_msg_json(self, record, msg):
        try:
            json_msg = json.loads(str(msg))  # <------- doesn't merge params
            if isinstance(json_msg, dict):
                return json_msg
            else:
                return self._format_msg_default(record, msg)
        except ValueError:
            return self._format_msg_default(record, msg)

Compare this to the default Python formatter which calls record.message = record.getMessage() which in turn merges the arguments in

/Lib/logging/init.py

    def getMessage(self):
        """
        Return the message for this LogRecord.
        Return the message for this LogRecord after merging any user-supplied
        arguments with the message.
        """
        msg = str(self.msg)
        if self.args:
            msg = msg % self.args  # <------ args get merged in
        return msg

I've logged an issue with the fluent-logger-python project.

Workaround

Use a logging filter to perform the merge before it's passed to FluentRecordFormatter.

logger = logging.getLogger('fluent.test')

class ContextFilter(logging.Filter):
    def filter(self, record):
        record.msg = record.msg % record.args
        return True

fluent_handler = handler.FluentHandler('app.follow', host='localhost', port=24224)
formatter = handler.FluentRecordFormatter()
fluent_handler.setFormatter(formatter)
merge_filter = ContextFilter()
fluent_handler.addFilter(merge_filter)
logger.addHandler(fluent_handler)

Edit: Logging Filter Doesn't Work

After using the workaround of a logging filter for a bit, I started receiving errors like

ValueError: unsupported format character ';' (0x3b) at index 166

It turns out that FluentRecordFormatter does call the base getMessage implementation, merging arguments into the message

    def format(self, record):
        # Compute attributes handled by parent class.
        super(FluentRecordFormatter, self).format(record)  # <------ record.messge = record.msg % record.args
        # Add ours
        record.hostname = self.hostname

        # Apply format
        data = self._formatter(record)

        self._structuring(data, record)
        return data

The problem is that _format_msg_json(self, record, msg) uses the record.msg attribute which is the unmerged data, whereas record.message is the merged data. This creates a problem that my logging filter was merging / formatting the data but then the log formatter was also trying to do that and occasionally seeing invalid syntax.

Workaround 2: Don't Use Json

I've entirely given up on outputting json from gunicorn / python logging. Instead I'm using Fluentd's parser to parse the json, e.g.

<filter *.gunicorn.access>
  @type parser
  key_name message
  reserve_time true
  reserve_data true
  remove_key_name_field true
  hash_value_field access_log
  <parse>
    @type regexp
    expression /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"$/
    time_format %d/%b/%Y:%H:%M:%S %z
  </parse>
</filter>

You can read about what the options are doing here: https://docs.fluentd.org/filter/parser

phdesign
  • 2,019
  • 21
  • 18