We have multiple applications running under Kubernetes, written in Python, Go, Ruby, and Elixir. We use Fluent Bit to forward all of the logs to AWS OpenSearch. All of our components write their logs to STDOUT/STDERR. Some components write in JSON format, some in non-JSON text format. In the OpenSearch UI, the full body of a JSON log entry is not parsed into individual fields; we see a few metadata fields followed by one long JSON string. Here is the full content of the log field, copied from the OpenSearch UI:
2023-01-09T23:41:56.279212506Z stdout F {"level":"WARN","ts":1673307716278.9448,"caller":"internal/internal_task_pollers.go:348","message":"Failed to process workflow task.","Namespace":"ai-platform-dev.59ee7","TaskQueue":"WORKFLOW_QUEUE","WorkerID":"1@workflow-worker-ai-workflow-worker-6c445f59f7-pgn6v@","WorkflowType":"NotesProWorkflow","WorkflowID":"workflow_1169649613530771459_1664751006481316721","RunID":"1ae58130-62d6-4f6a-a6db-8789be13d567","Attempt":12530,"Error":"lookup failed for scheduledEventID to activityID: scheduleEventID: 36, activityID: 36"}
Notice that the log field extract above has some internal "fields" before the embedded JSON string starts, namely this part:
2023-01-09T23:41:56.279212506Z stdout F
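If I am reading it right, that prefix matches the containerd/CRI log line layout rather than Docker's JSON-file format. My annotation of the pieces (the F/P meaning is my understanding of the CRI format, not something from my config):

<timestamp, RFC3339Nano> <stream: stdout|stderr> <tag: F = full line, P = partial line> <message>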
I am starting to suspect that this non-JSON prefix in the log field causes the Fluent Bit es output plugin to fail to parse/decode the JSON content, so the plugin never delivers the JSON sub-fields to OpenSearch.
I am considering using a Fluent Bit regex parser to extract only the internal JSON component of the log string, which I assume would then be parsed as JSON and forwarded to OpenSearch as individual fields.
I am going to try this PARSER config, which uses a regex to extract just the JSON part of the log string into a new field called capturedJson and then decodes that field as JSON (idea from https://stackoverflow.com/a/66852383/833960):
[PARSER]
    Name         logging-parser
    Format       regex
    Regex        ^(?<timestamp>.*) (?<stream>.*) .* (?<capturedJson>{.*})$
    Decode_Field json capturedJson
    Time_Key     time
    Time_Format  %FT%H:%M:%S,%L
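For comparison, here is a CRI parser along the lines of the one in the Fluent Bit documentation, which strips exactly this kind of prefix. I renamed the final capture from message to log (my change, untested) so that the kubernetes filter's Merge_Log, which looks at the log key, can still find the JSON payload:

[PARSER]
    Name        cri
    Format      regex
    Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z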
The components that log in a non-JSON format look fine in OpenSearch.
How can I configure Fluent Bit and OpenSearch so that both my JSON and non-JSON components render correctly in OpenSearch?
Here is the current Fluent Bit config file, which is shared by all components:
{
  "fluent-bit.conf": "[SERVICE]
    Parsers_File  /fluent-bit/parsers/parsers.conf

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    DB                /var/log/flb_kube.db
    Parser            docker
    Docker_Mode       On
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    Refresh_Interval  10

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc.cluster.local:443
    Merge_Log           On
    Merge_Log_Key       data
    Keep_Log            On
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On
    Buffer_Size         32k

[OUTPUT]
    Name              es
    Match             *
    AWS_Region        us-west-2
    AWS_Auth          On
    Host              opensearch.my-domain.com
    Port              443
    TLS               On
    Retry_Limit       6
    Replace_Dots      On
    Index             my-index-name
    AWS_STS_Endpoint  https://sts.us-west-2.amazonaws.com
"
}
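If our nodes run containerd rather than Docker (which the stdout F prefix above suggests), I am thinking the INPUT section should switch from the docker parser to the cri parser sketched earlier and drop Docker_Mode. A rough, untested sketch of what I have in mind, along the lines of the containerd example I link at the end:

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    DB                /var/log/flb_kube.db
    Parser            cri
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
    Refresh_Interval  10

The kubernetes FILTER and es OUTPUT would stay as they are; with the prefix stripped, I assume Merge_Log could then recognize the remaining JSON and merge its sub-fields under the data key.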
Here is an extract from the parsers.conf
bash-4.2# cat parsers.conf
...
[PARSER]
    Name        json
    Format      json
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
# --
# Since Fluent Bit v1.2, if you are parsing Docker logs and using
# the Kubernetes filter, it's no longer required to decode the
# 'log' key.
#
# Command         | Decoder | Field | Optional Action
# ================|=========|=======|================
#Decode_Field_As    json      log
...
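For completeness: if I did want the tail parser itself to decode the embedded JSON (rather than relying on the kubernetes filter), my understanding from the comment above is that re-enabling the decoder on the docker parser would look like this (untested):

[PARSER]
    Name            docker
    Format          json
    Time_Key        time
    Time_Format     %Y-%m-%dT%H:%M:%S.%L
    Time_Keep       On
    Decode_Field_As json log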
In OpenSearch I see the full log payload in a field called log, which is defined as a text field. If I do a GET on the index (via the Elasticsearch-compatible API) and look for the log field in the mapping, I see:
GET my-index-name
{
}
...
"log" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
...
Should I modify the type of the log field to be dynamic? Do I also need to change anything in the Fluent Bit config?
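As a sanity check before changing any mappings, I plan to query for the merged sub-fields directly. Here data is the Merge_Log_Key from my FILTER config and level is a field from my sample log line, so this should return hits only if the JSON is actually being exploded into individual fields:

GET my-index-name/_search
{
  "size": 1,
  "query": {
    "exists": { "field": "data.level" }
  }
}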
Even the components that normally log as JSON will sometimes emit non-JSON output to STDERR, for example when an error condition bypasses the application's log handling. Can this case also be handled?
We are using:
- Fluent Bit 1.8.x
- OpenSearch 1.3
I think this is relevant to my issue: https://github.com/microsoft/fluentbit-containerd-cri-o-json-log/blob/main/config.yaml