
I've got log lines in the following format and want to extract fields:

[field1: content1] [field2: content2] [field3: content3] ...

I neither know the field names, nor the number of fields.

I tried it with backreferences and the sprintf format but got no results:

match => [ "message", "(?:\[(\w+): %{DATA:\k<-1>}\])+" ] # not working
match => [ "message", "(?:\[%{WORD:fieldname}: %{DATA:%{fieldname}}\])+" ] # not working

This seems to work for only one field but not more:

match => [ "message", "(?:\[%{WORD:field}: %{DATA:content}\] ?)+" ]
add_field => { "%{field}" => "%{content}" }

The kv filter is also not appropriate because the content of the fields may contain whitespace.

Is there any plugin / strategy to fix this problem?

redevined

3 Answers

The Logstash ruby filter plugin can help you. :)

Here is the configuration:

input {
    stdin {}
}

filter {
    ruby {
        code => "
            # Split the message into '] ['-delimited chunks, then strip the
            # remaining brackets and split each chunk into name and value.
            # (Note: on Logstash 5+ use event.get('message') / event.set(name, value)
            # instead of these legacy event[...] accessors.)
            fieldArray = event['message'].split('] [')
            for field in fieldArray
                field = field.delete '['
                field = field.delete ']'
                result = field.split(': ', 2)
                event[result[0]] = result[1]
            end
        "
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

With your logs:

[field1: content1] [field2: content2] [field3: content3]

This is the output:

{
   "message" => "[field1: content1] [field2: content2] [field3: content3]",
  "@version" => "1",
"@timestamp" => "2014-07-07T08:49:28.543Z",
      "host" => "abc",
    "field1" => "content1",
    "field2" => "content2",
    "field3" => "content3"
}

I have tried it with 4 fields, and it also works.

Please note that the event in the ruby code is the Logstash event. You can use it to access all your event fields, such as message, @timestamp, etc.
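The parsing logic itself is plain Ruby, so you can sanity-check it outside Logstash before dropping it into the filter (a minimal sketch; the sample line is made up):

```ruby
# Hypothetical sample line in the question's format.
line = '[field1: content1] [field2: content2] [field3: content3]'

fields = {}
line.split('] [').each do |chunk|
  # Strip the leftover brackets, then split into name and value.
  name, value = chunk.delete('[').delete(']').split(': ', 2)
  fields[name] = value
end

puts fields.inspect
# => {"field1"=>"content1", "field2"=>"content2", "field3"=>"content3"}
```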

Enjoy it!!!

Ban-Chuan Lim

I found another way using regex:

ruby {
    code => "
        # Each match is a 'name: value' chunk; the lookahead requires the
        # closing bracket to be followed by a space or end of line, so the
        # values themselves may contain spaces.
        fields = event['message'].scan(/(?<=\[)\w+: .*?(?=\](?: |$))/)
        for field in fields
            field = field.split(': ', 2)
            event[field[0]] = field[1]
        end
    "
}
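The regex can also be checked in plain Ruby first. Because the lookahead only accepts a closing bracket followed by a space or end of line, values may themselves contain spaces (sketch with made-up data):

```ruby
# Hypothetical sample with a space inside a value.
line = '[field1: content one] [field2: content2]'

result = {}
line.scan(/(?<=\[)\w+: .*?(?=\](?: |$))/).each do |pair|
  name, value = pair.split(': ', 2)
  result[name] = value
end

puts result.inspect
# => {"field1"=>"content one", "field2"=>"content2"}
```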
Termininja
redevined

I know that this is an old post, but I just came across it today, so I thought I'd offer an alternate method. Please note that, as a rule, I would almost always use a ruby filter, as suggested in either of the two previous answers. However, I thought I would offer this as an alternative.

If there is a fixed number of fields or a maximum number of fields (i.e., there may be fewer than three fields, but there will never be more than three fields), this can be done with a combination of grok and mutate filters, as well.

# Test message is: `[fieldname: value]`
# Store values in [@metadata] so we don't have to explicitly delete them.
grok {
    match => {
        "[message]" => [
            "\[%{DATA:[@metadata][_field_name_01]}:\s+%{DATA:[@metadata][_field_value_01]}\]( \[%{DATA:[@metadata][_field_name_02]}:\s+%{DATA:[@metadata][_field_value_02]}\])?( \[%{DATA:[@metadata][_field_name_03]}:\s+%{DATA:[@metadata][_field_value_03]}\])?"
        ]
    }
}

# Rename the fieldname, value combinations. I.e., if the following data is in the message:
#
#     [foo: bar]
#
# It will be saved in the elasticsearch output as:
#
#    {"foo":"bar"}
#
mutate {
    rename => {
        "[@metadata][_field_value_01]" => "[%{[@metadata][_field_name_01]}]"
        "[@metadata][_field_value_02]" => "[%{[@metadata][_field_name_02]}]"
        "[@metadata][_field_value_03]" => "[%{[@metadata][_field_name_03]}]"
    }
    tag_on_failure => []
}

For those who may not be as familiar with regex, the groups wrapped in ()? are optional: if a group does not match, the overall expression still succeeds. The tag_on_failure => [] option in the mutate filter ensures that no error is appended to tags if one of the renames fails because there was no data to capture and, as a result, no field to rename.
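The effect of the optional groups can be illustrated with an ordinary Ruby named-capture regex (a rough, two-field stand-in for the grok pattern above; grok's DATA is a lazy .*?, whereas this sketch uses character classes):

```ruby
# Simplified two-field version of the grok pattern (illustrative only).
pattern = /\[(?<name1>[^:]+):\s+(?<value1>[^\]]+)\]( \[(?<name2>[^:]+):\s+(?<value2>[^\]]+)\])?/

m = '[foo: bar]'.match(pattern)
puts m[:name1]          # => foo
puts m[:value1]         # => bar
puts m[:value2].inspect # => nil -- the optional group simply did not match
```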

Deacon