In my case, I'm using aws-for-fluent-bit V2.15.0 (the latest version at the time of writing) because I want to send the application logs to CloudWatch, and this image comes prepared to handle that.
I didn't use the Kubernetes filter because it adds a lot of metadata that I can already see directly in the cluster; I just need the application logs in CloudWatch for the developers. So I used the YAML that Amazon provides as a base, keeping only the tail INPUT for container logs and the container_firstline parser.
As you will see, I added a FILTER of type "parser" that takes the logs and applies a regex. My logs are peculiar because in some cases they embed JSON, so I end up with 2 types of logs, one with only text and the other with JSON inside, like these 2:
2021-06-09 15:01:26: a5c35b84: block-bridge-5cdc7bc966-cq44r: clients::63 INFO: Message received from topic block.customer.get.info
2021-06-09 15:01:28: a5c35b84: block-bridge-5cdc7bc966-cq44r: block_client::455 INFO: Filters that will be applied (parsed to PascalCase): {"ClientId": 88888, "ServiceNumber": "BBBBBBFA5527", "Status": "AC"}
These 2 types of logs made me create 2 regex PARSERs and 1 FILTER of type parser. The filter matches these 2 types of logs using the parsers (parser_logs and parser_json).
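To illustrate how the two parsers split the work, here is a rough Python sketch that mirrors the two regexes from the config further down. It is only an approximation: Fluent Bit uses its own regex engine, so the named groups are rewritten as `(?P<name>...)` and the `{` is escaped for Python. The `parse` helper and the "first parser that matches wins" loop are my assumptions about how the filter behaves with two Parser entries, not Fluent Bit code.

```python
import re

# Common prefix shared by both parsers (timestamp, environment, hostname, module, line, level).
PREFIX = (
    r"^(?P<time_stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (?P<environment>.*?): "
    r"(?P<hostname>.*?): (?P<module>.*?)::(?P<line>\d+) (?P<log_level>[A-Z]+): "
)

# Same order as the Parser entries in the FILTER section: parser_json first, then parser_logs.
PARSERS = [
    ("parser_json", re.compile(PREFIX + r"(?P<message>[^{]*)(?P<message_additional>\{.*)$")),
    ("parser_logs", re.compile(PREFIX + r'''(?P<message>[a-zA-Z0-9 _.,:()'"!¡]*)$''')),
]


def parse(line):
    """Hypothetical helper: return (parser_name, fields) for the first regex that matches."""
    for name, regex in PARSERS:
        match = regex.match(line)
        if match:
            return name, match.groupdict()
    return None, {"log": line}  # nothing matched, keep the raw line


PLAIN_LINE = (
    "2021-06-09 15:01:26: a5c35b84: block-bridge-5cdc7bc966-cq44r: "
    "clients::63 INFO: Message received from topic block.customer.get.info"
)
JSON_LINE = (
    "2021-06-09 15:01:28: a5c35b84: block-bridge-5cdc7bc966-cq44r: "
    "block_client::455 INFO: Filters that will be applied (parsed to PascalCase): "
    '{"ClientId": 88888, "ServiceNumber": "BBBBBBFA5527", "Status": "AC"}'
)

for sample in (PLAIN_LINE, JSON_LINE):
    print(parse(sample))
```

The text-only line falls through parser_json (there is no "{" to capture) and is picked up by parser_logs, while the line with the embedded JSON is captured by parser_json with the JSON isolated in message_additional.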
The main problem was that the JSON part wasn't parsed correctly: I always got the JSON part with a backslash (\) escaping the double quotes ("), like this:
2021-06-09 15:01:28: a5c35b84: block-bridge-5cdc7bc966-cq44r: block_client::455 INFO: Filters that will be applied (parsed to PascalCase): {\"ClientId\": 88888, \"ServiceNumber\": \"BBBBBBFA5527\", \"Status\": \"AC\"}
The solution was to add Decode_Field_As, which many people say is not required. In my case, I needed it to remove those backslashes (\). You will see that I use it only for the field "message_additional", where I match exactly the JSON.
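A small Python analogy of why the backslashes show up, as I understand it: while the captured JSON is still a plain string, serializing the record has to escape its quotes; once the field is decoded into an object, it is emitted as nested JSON. Python's json module is only a stand-in here for what the decoder does, not Fluent Bit's implementation.

```python
import json

# The text captured by the regex: JSON, but still just a string at this point.
raw = '{"ClientId": 88888, "ServiceNumber": "BBBBBBFA5527", "Status": "AC"}'

# Without decoding: the field is a string, so its quotes are escaped on output.
as_string = json.dumps({"message_additional": raw})

# With decoding: the field becomes a nested object, so no escaping is needed.
as_object = json.dumps({"message_additional": json.loads(raw)})

print(as_string)
print(as_object)
```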
Finally, here is my config:
.
.
[INPUT]
    Name                tail
    Tag                 kube.*
    Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
    Path                /var/log/containers/*.log
    Docker_Mode         On
    Docker_Mode_Flush   5
    Docker_Mode_Parser  container_firstline
    Parser              docker
    DB                  /var/fluent-bit/state/flb_kube.db
    Mem_Buf_Limit       10MB
    Skip_Long_Lines     Off
    Refresh_Interval    10
[FILTER]
    Name                parser
    Match               kube.*
    Key_Name            log
    Parser              parser_json
    Parser              parser_logs
.
.
.
[PARSER]
    Name                parser_logs
    Format              regex
    Regex               ^(?<time_stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (?<environment>.*?): (?<hostname>.*?): (?<module>.*?)::(?<line>\d+) (?<log_level>[A-Z]+): (?<message>[a-zA-Z0-9 _.,:()'"!¡]*)$
    Time_Key            time
    Time_Format         %d/%b/%Y:%H:%M:%S %z
[PARSER]
    Name                parser_json
    Format              regex
    Regex               ^(?<time_stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (?<environment>.*?): (?<hostname>.*?): (?<module>.*?)::(?<line>\d+) (?<log_level>[A-Z]+): (?<message>[^{]*)(?<message_additional>{.*)$
    Time_Key            time
    Time_Format         %d/%b/%Y:%H:%M:%S %z
    Decode_Field_As     escaped_utf8 message_additional do_next
    Decode_Field_As     escaped message_additional do_next
    Decode_Field_As     json message_additional
[PARSER]
    Name                container_firstline
    Format              regex
    Regex               (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
    Time_Key            time
    Time_Format         %Y-%m-%dT%H:%M:%S.%LZ
[PARSER]
    Name                docker
    Format              json
    Time_Key            @timestamp
    Time_Format         %Y-%m-%dT%H:%M:%S.%L
    Time_Keep           Off
One thing to keep in mind is that Decode_Field_As needs the field being decoded to be entirely JSON (starting with "{" and ending with "}"). If the field has text followed by JSON, the decode will fail. That's the reason why I had to create 2 regex PARSERs: to capture exactly the JSON part of some logs inside a single field called "message_additional".
Here are my new parsed logs in CloudWatch:
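The "whole field must be JSON" behaviour can be sketched in Python, again using json.loads only as a stand-in for the decoder. The `try_decode` helper is hypothetical; it just returns the value untouched when decoding fails, which is what I observed in practice.

```python
import json


def try_decode(value):
    """Hypothetical helper: decode the field as JSON, or leave it as-is on failure."""
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value


mixed = (
    "Filters that will be applied (parsed to PascalCase): "
    '{"ClientId": 88888, "ServiceNumber": "BBBBBBFA5527", "Status": "AC"}'
)

# Text followed by JSON is not valid JSON, so the field stays a plain string.
print(try_decode(mixed))

# Splitting at the first "{" (what parser_json does with two capture groups)
# leaves a field that is entirely JSON and decodes cleanly.
split_at = mixed.index("{")
message, message_additional = mixed[:split_at], mixed[split_at:]
print(message)
print(try_decode(message_additional))
```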
{
    "environment": "a5c35b84",
    "hostname": "block-bridge-5cdc7bc966-qfptx",
    "line": "753",
    "log_level": "INFO",
    "message": "Message received from topic block.customer.get.info",
    "module": "block_client",
    "time_stamp": "2021-06-15 10:24:38"
}
{
    "environment": "a5c35b84",
    "hostname": "block-bridge-5cdc7bc966-m5sln",
    "line": "64",
    "log_level": "INFO",
    "message": "Getting ticket(s) using params ",
    "message_additional": {
        "ClientId": 88888,
        "ServiceNumber": "BBBBBBFA5527",
        "Status": "AC"
    },
    "module": "block_client",
    "time_stamp": "2021-06-15 10:26:04"
}