
If I have a patterns file with a bunch of regex patterns such as the following

A .*foo.*
B .*bar.*
C .*baz.*

and my grok filter looks like the following:

grok {
  patterns_dir => ["/location/of/patterns"]
  match => { "request" => [ "%{A}", "%{B}", "%{C}",] 
 }
}

Is there any way to know which one matched, i.e. the name of the SYNTAX? I would like to annotate the document with the name of the one that matched.

Arpan Shah

2 Answers


What you would usually do is name the matched variables. The syntax for that (taking your example) would be:

grok {
    patterns_dir => ["/location/of/patterns"]
    match => 
    { 
        "request" => [ "%{A:A}", "%{B:NameOfB}", "%{C:SomeOtherName}",] 
    }
}

Accordingly, the matches of your grok would now be named:

A: A

B: NameOfB

C: SomeOtherName

So in your case you could just name them after the patterns. That should work just fine.
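
For example, a sketch of that idea, reusing the pattern names themselves as capture names:

grok {
    patterns_dir => ["/location/of/patterns"]
    match => { "request" => [ "%{A:A}", "%{B:B}", "%{C:C}" ] }
}

Whichever of the fields A, B or C then appears on the event tells you which pattern matched.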

Alternatively (I just tested this with the grok debugger), it appears that if you do not name your matched patterns, they will default to the name of the pattern (which I think is what you want). The downside of this is that if you reuse a pattern, the result will be an array of values.

This is the test I ran:

Input:

 Caused by: com.my.application.IOException: null Caused by: com.my.application.IOException: null asd asd

grok:

(.*?)Caused by:%{GREEDYDATA}:%{GREEDYDATA}

Output:

{
  "GREEDYDATA": [
    [
      " com.my.application.IOException: null Caused by: com.my.application.IOException",
      " null asd asd"
    ]
  ]
}

Hope that solves your problems,

Artur

EDIT:

Based on OP's other question, here is my approach to solving that issue dynamically.

You will still have to name your matches. Decide on a common prefix for naming them. I will base my example on 2 JSON strings to make this easier:

{"a" : "b", "prefix_patterna" : "", "prefix_patternb" : "bla"}
{"a" : "b", "prefix_patterna" : "sd", "prefix_patternb" : ""}

Note how there are 2 artificial matches, prefix_patterna and prefix_patternb. So I decided on the prefix "prefix_" and I use that to identify which event fields to inspect. (You could also drop the empty fields if that is something you want.)
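
In a real pipeline, the prefixed fields would come from naming your grok captures with the same prefix. A sketch of that, using the patterns from the question (the capture names are just illustrative):

grok {
    patterns_dir => ["/location/of/patterns"]
    match => { "request" => [ "%{A:prefix_A}", "%{B:prefix_B}", "%{C:prefix_C}" ] }
}

Only the pattern that matches will produce its prefixed field on the event, which is what the ruby filter below looks for.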

Then in my filter, I use ruby to iterate through all of the event's fields to find the one that matched my pattern:

ruby {
    code => "
         # remember the name of the prefixed field (if any) that has a non-empty value
         toAdd = nil;
         event.to_hash.each { |k,v|
              if k.start_with?('prefix_') && v.to_s != ''
                  toAdd = k
              end
         }
         # only annotate the event if such a field was found
         if toAdd.to_s != ''
             event['test'] = toAdd
         end
    "
}

All this code does is check the event's keys for the prefix and see whether the value of that field is empty or nil. If it finds a field that has a value, it writes that field's name into a new event field called "test".
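
If you want only the pattern name without the prefix (for example in a field like the "req_type" mentioned in the comments), one option would be to strip the prefix afterwards. This is an untested sketch using mutate's gsub, with "test" being the field name from the example above:

mutate {
    # remove the naming-convention prefix from the annotation the ruby filter wrote
    gsub => [ "test", "^prefix_", "" ]
}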

Here are my tests:

Settings: Default pipeline workers: 8
Pipeline main started
{"a" : "b", "prefix_patterna" : "sd", "prefix_patternb" : ""}
{
            "message" => "{\"a\" : \"b\", \"prefix_patterna\" : \"sd\", \"prefix_patternb\" : \"\"}",
           "@version" => "1",
         "@timestamp" => "2016-09-15T09:48:29.418Z",
               "host" => "pandaadb",
                  "a" => "b",
    "prefix_patterna" => "sd",
    "prefix_patternb" => "",
               "test" => "prefix_patterna"
}
{"a" : "b", "prefix_patterna" : "", "prefix_patternb" : "bla"}
{
            "message" => "{\"a\" : \"b\", \"prefix_patterna\" : \"\", \"prefix_patternb\" : \"bla\"}",
           "@version" => "1",
         "@timestamp" => "2016-09-15T09:48:36.359Z",
               "host" => "pandaadb",
                  "a" => "b",
    "prefix_patterna" => "",
    "prefix_patternb" => "bla",
               "test" => "prefix_patternb"
}

Note how the first test writes "prefix_patterna" while the second test writes "prefix_patternb".

I hope this solves your issue,

Artur

pandaadb
  • I want a field that captures that result. E.g., looking at the example above, if we had two different inputs: 1) foo_too 2) boo_too, and the patterns FOO foo.* and BOO boo.*, I want a field called "too_type" in the output to be either "foo" or "boo" based on which matched. The example you gave would produce FOO = foo_too; instead of that, I want a common field name that holds the pattern that matched, e.g. "too_type" = FOO – Arpan Shah Sep 15 '16 at 09:11
  • if one of your matches is the field name, and the other match is the field value, you can use the mutate filter to create a field on your event and reference both (see the sketch after these comments). – pandaadb Sep 15 '16 at 09:14
  • The specific case I am concerned with is a variety of URL requests with various patterns, and I want a req_type field annotated with which regex defines the type of the request – Arpan Shah Sep 15 '16 at 09:17
  • Right - This can be solved with a loop in a ruby filter, however you must have a convention on your names. Essentially every matched pattern will be prefixed like "mypattern_{name}". After you run your grok, you use a ruby filter to iterate through all events and set the type on whichever one is not empty – pandaadb Sep 15 '16 at 09:29
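
A sketch of the mutate idea from the comment above (the field names pattern_name and pattern_value are hypothetical stand-ins for whatever your grok captures produce):

mutate {
    # build a field whose name comes from one capture and whose value from another
    add_field => { "%{pattern_name}" => "%{pattern_value}" }
}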

You can tag the match (or add fields) by having multiple grok filters, as follows.

It doesn't feel elegant and is not very scalable, as it is prone to a lot of repetition (not DRY), but it seems to be the only way to "flag" matches of complex patterns, especially predefined library patterns.

Note you have to add conditionals to the subsequent filters to avoid them being run too when previous filters have already matched. Otherwise you'll still get _grokparsefailure tags for the later filters. Source

You also need to remove the failure tags of all but the final "else" filter. Otherwise you will get spurious _grokparsefailures e.g. from A when B or C matches. Source

grok {
    patterns_dir => ["/location/of/patterns"]
    match => { "request" => "%{A}" }
    add_tag => [ "pattern_A" ]
    add_field => { "pattern" => "A" } # another option
    tag_on_failure => [ ] # prevent false failure tags
}
if ("pattern_A" not in [tags]) {
    grok {
        patterns_dir => ["/location/of/patterns"]
        match => { "request" => "%{B}" }
        add_tag => [ "pattern_B" ]
        tag_on_failure => [ ] # prevent false failure tags
    }
}
if ("pattern_A" not in [tags] and "pattern_B" not in [tags]) {
    grok {
        patterns_dir => ["/location/of/patterns"]
        match => { "request" => "%{C}" }
        add_tag => [ "pattern_C" ]
    }
}

There may be ways to simplify / tune this, but I'm not an expert (yet!).

scipilot
  • You can make grok matches optional; that way you don't get parse failures when the pattern does not exist – pandaadb Sep 15 '16 at 09:22