
My regex parser doesn't seem to work. I'm guessing it has something to do with the logs coming from Docker and not being escaped. But I can't get it to work even if I include the Docker parser first.

I've checked it in rubular: https://rubular.com/r/l6LayuI7MQWIUL

fluent-bit.conf

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf

[INPUT]
    Name         forward
    Listen       0.0.0.0
    Port         24224

[FILTER]
    Name         grep
    Match        *
    Regex        log ^.*{.*}$

[FILTER]
    Name         parser
    Match        *
    Key_Name     log
    Parser       springboot

[OUTPUT]
    Name stdout
    Match *

parsers.conf

[PARSER]
    Name        springboot
    Format      regex
    Regex       (?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{1,3}) (?<level>[^ ]*) (?<number>\d*) --- (?<thread>\[[^ ]*) (?<logger>[^ ]*) *: (?<message>[^ ].*)$
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S.%L

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command      |  Decoder | Field | Optional Action
    # =============|==================|=================
    Decode_Field_As   escaped    log

stdout output

[0] docker-container: [1584997354.000000000, {"log"=>"2020-03-23 21:02:34.077 TRACE 1 --- [nio-8080-exec-1] org.zalando.logbook.Logbook              : {...}", "container_id"=>"5a1251dcf9de3f0e2b8b7b0bce1d35d9c9285726b477606b6448c7ce9e818511", "container_name"=>"/xxxx", "source"=>"stdout"}]
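For reference, the regex itself does match the unescaped log line; a quick Python check (a sketch, using Python's `(?P<name>...)` group syntax in place of Onigmo's `(?<name>...)`) confirms it:

```python
import re

# The springboot parser regex from parsers.conf, translated to Python's
# (?P<name>...) named-group syntax for a quick sanity check.
pattern = re.compile(
    r"(?P<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{1,3}) "
    r"(?P<level>[^ ]*) (?P<number>\d*) --- (?P<thread>\[[^ ]*) "
    r"(?P<logger>[^ ]*) *: (?P<message>[^ ].*)$"
)

# The "log" field as it looks once the Docker JSON layer is decoded.
line = ("2020-03-23 21:02:34.077 TRACE 1 --- [nio-8080-exec-1] "
        "org.zalando.logbook.Logbook              : {...}")

m = pattern.match(line)
print(m.group("level"))    # TRACE
print(m.group("message"))  # {...}
```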

Thanks

Quinten Scheppermans

2 Answers


In my case, I'm using the latest version of aws-for-fluent-bit, v2.15.0, because I want to save the application logs in CloudWatch, and this image comes prepared to handle that.

I didn't use the Kubernetes filter because it adds a lot of things that I can already see directly in the cluster; I just need the application logs in CloudWatch for the developers. So I use this Amazon-provided YAML as a base, using only the tail INPUT for container logs and the container_firstline parser.

As you will see, I create my own FILTER of type "parser" that takes the logs and applies a regex. My logs are peculiar because in some cases we embed JSON, so in the end I have two types of logs: one with only text, and the other with JSON inside, like these two:

2021-06-09 15:01:26: a5c35b84: block-bridge-5cdc7bc966-cq44r: clients::63 INFO: Message received from topic block.customer.get.info

2021-06-09 15:01:28: a5c35b84: block-bridge-5cdc7bc966-cq44r: block_client::455 INFO: Filters that will be applied (parsed to PascalCase): {"ClientId": 88888, "ServiceNumber": "BBBBBBFA5527", "Status": "AC"}

These two types of logs made me create two regex PARSERs and one custom FILTER of type parser. The filter matches both log types using the two parsers (parser_logs and parser_json).

The main problem was that the JSON part wasn't being parsed correctly; I always got the JSON part with backslashes (\) escaping the double quotes ("), like this:

2021-06-09 15:01:28: a5c35b84: block-bridge-5cdc7bc966-cq44r: block_client::455 INFO: Filters that will be applied (parsed to PascalCase): {\"ClientId\": 88888, \"ServiceNumber\": \"BBBBBBFA5527\", \"Status\": "AC"}

The solution was to add Decode_Field_As, which many people say is not required. In my case, I needed it to remove those backslashes (\). You will see that I use it only for the field "message_additional", where I match exactly the JSON.
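As a rough illustration of the decoder chain's effect (this Python sketch is an assumption about the net result, not Fluent Bit's actual implementation):

```python
import json

# The raw field value as it arrives: double quotes escaped with
# backslashes, so parsing the field as JSON fails as-is.
raw = '{\\"ClientId\\": 88888, \\"Status\\": \\"AC\\"}'

# The "escaped" decoder effectively removes the backslash escapes...
unescaped = raw.replace('\\"', '"')

# ...after which the "json" decoder can turn the string into a
# structured object (what the do_next chain in parser_json achieves).
obj = json.loads(unescaped)
print(obj["ClientId"])  # 88888
```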

Finally, here is my config:

.
.
    [INPUT]
        Name                tail
        Tag                 kube.*
        Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
        Path                /var/log/containers/*.log
        Docker_Mode         On
        Docker_Mode_Flush   5
        Docker_Mode_Parser  container_firstline
        Parser              docker
        DB                  /var/fluent-bit/state/flb_kube.db
        Mem_Buf_Limit       10MB
        Skip_Long_Lines     Off
        Refresh_Interval    10
    [FILTER]
        Name                parser
        Match               kube.*
        Key_Name            log
        Parser              parser_json
        Parser              parser_logs
.
.
.
    [PARSER]
        Name   parser_logs
        Format regex
        Regex  ^(?<time_stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (?<environment>.*?): (?<hostname>.*?): (?<module>.*?)::(?<line>\d+) (?<log_level>[A-Z]+): (?<message>[a-zA-Z0-9 _.,:()'"!¡]*)$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name   parser_json
        Format regex
        Regex  ^(?<time_stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (?<environment>.*?): (?<hostname>.*?): (?<module>.*?)::(?<line>\d+) (?<log_level>[A-Z]+): (?<message>[^{]*)(?<message_additional>{.*)$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
        Decode_Field_As   escaped_utf8       message_additional    do_next
        Decode_Field_As   escaped            message_additional    do_next
        Decode_Field_As   json               message_additional
    [PARSER]
        Name        container_firstline
        Format      regex
        Regex       (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%LZ
    [PARSER]
        Name        docker
        Format      json
        Time_Key    @timestamp
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   Off

One thing to keep in mind is that Decode_Field_As requires the field being decoded to be entirely JSON (starting with "{" and ending with "}"). If the field has text and then JSON, the decode will fail. That's why I had to create two regex PARSERs: to match exactly the JSON in some logs inside a single field called "message_additional".
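That split can be checked in plain Python (a sketch, using Python's `(?P<name>...)` syntax; the group layout mirrors parser_json's trailing groups):

```python
import re
import json

line = ('2021-06-09 15:01:28: a5c35b84: block-bridge-5cdc7bc966-cq44r: '
        'block_client::455 INFO: Filters that will be applied '
        '(parsed to PascalCase): {"ClientId": 88888, '
        '"ServiceNumber": "BBBBBBFA5527", "Status": "AC"}')

# Everything up to the first "{" goes into message; the rest, which is
# pure JSON, goes into message_additional and can be decoded whole.
m = re.search(r'INFO: (?P<message>[^{]*)(?P<message_additional>\{.*)$', line)
print(json.loads(m.group("message_additional")))
```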

Here are my new parsed logs in cloudwatch:

{
    "environment": "a5c35b84",
    "hostname": "block-bridge-5cdc7bc966-qfptx",
    "line": "753",
    "log_level": "INFO",
    "message": "Message received from topic block.customer.get.info",
    "module": "block_client",
    "time_stamp": "2021-06-15 10:24:38"
}

{
    "environment": "a5c35b84",
    "hostname": "block-bridge-5cdc7bc966-m5sln",
    "line": "64",
    "log_level": "INFO",
    "message": "Getting ticket(s) using params ",
    "message_additional": {
        "ClientId": 88888,
        "ServiceNumber": "BBBBBBFA5527",
        "Status": "AC"
    },
    "module": "block_client",
    "time_stamp": "2021-06-15 10:26:04"
}

  • Hi Angel, thanks for the detailed answer. Any chance you can have a look at this similar issue: https://stackoverflow.com/questions/75052291/how-to-configure-fluentbit-opensearch-so-that-json-and-non-json-logs-are-handl – MattG Jan 10 '23 at 05:23
  • Make sure you are using the latest version of Fluent Bit (v1.3.11)
  • Remove the Decode_Field_As entry from your parsers.conf; it is no longer required.
edsiper