
I have a remote machine that combines multiline events and sends them across the lumberjack protocol.

What comes in is something that looks like this:

{
     "message" => "2014-10-20T20:52:56.133+0000 host 2014-10-20 15:52:56,036 [ERROR   ][app.logic     ] Failed to turn message into JSON\nTraceback (most recent call last):\n  File \"somefile.py\", line 249, in _get_values\n    return r.json()\n  File \"/path/to/env/lib/python3.4/site-packages/requests/models.py\", line 793, in json\n    return json.loads(self.text, **kwargs)\n  File \"/usr/local/lib/python3.4/json/__init__.py\", line 318, in loads\n    return _default_decoder.decode(s)\n  File \"/usr/local/lib/python3.4/json/decoder.py\", line 343, in decode\n    obj, end = self.raw_decode(s, idx=_w(s, 0).end())\n  File \"/usr/local/lib/python3.4/json/decoder.py\", line 361, in raw_decode\n    raise ValueError(errmsg(\"Expecting value\", s, err.value)) from None\nValueError: Expecting value: line 1 column 1 (char 0), Failed to turn message into JSON"
}

When I try to match the message with

grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:loglevel}%{SPACE}\]\[%{NOTSPACE:module}%{SPACE}\]%{GREEDYDATA:message}" ]
}

GREEDYDATA is not nearly as greedy as I would like: the captured message stops at the first newline.

So then I tried to use gsub:

mutate {
    gsub => ["message", "\n", "LINE_BREAK"]
}
# Grok goes here
mutate {
    gsub => ["message", "LINE_BREAK", "\n"]
}

but that didn't work either. Rather than

The Quick brown fox
jumps over the lazy
groks

I got

The Quick brown fox\njumps over the lazy\ngroks
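The output above is what you would get if the replacement in the second gsub were applied as the two characters `\` and `n` rather than as a real newline. A minimal Python sketch of the round trip, assuming that is what happens (the `LINE_BREAK` placeholder comes from the question; the rest is illustrative):

```python
import re

original = "The Quick brown fox\njumps over the lazy\ngroks"

# Step 1: protect the real newlines with a placeholder (mirrors the first gsub).
protected = re.sub("\n", "LINE_BREAK", original)

# Step 2a: restore with a two-character backslash-n string. This is the
# assumed behaviour of the second gsub, and it reproduces the output above.
# str.replace is used because re.sub would turn "\\n" back into a newline.
literal = protected.replace("LINE_BREAK", "\\n")

# Step 2b: restore with an actual newline character, which is what was wanted.
restored = protected.replace("LINE_BREAK", "\n")

print(literal)               # The Quick brown fox\njumps over the lazy\ngroks
print(restored == original)  # True
```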

So...

How do I either add the newline back to my data, make the GREEDYDATA match my newlines, or in some other way grab the relevant portion of my message?

Wayne Werner
  • Looks like a duplicate of http://stackoverflow.com/questions/24307965/logstash-grok-multiline-message. – Magnus Bäck Oct 21 '14 at 05:40
  • @MagnusBäck basically yes, though that question doesn't care about newlines, and I *do* require the newlines to exist in the resulting message. – Wayne Werner Oct 21 '14 at 12:53

3 Answers


GREEDYDATA is just `.*`, but `.` doesn't match a newline, so you can replace `%{GREEDYDATA:message}` with `(?<message>(.|\r|\n)*)` and get it to be truly greedy.
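A quick way to see the difference is Python's `re` module, where `.` also skips newlines by default. This is only an analogy: grok runs on Oniguruma, not Python, and Python spells named groups `(?P<name>...)` rather than `(?<name>...)`. The sample text is a shortened stand-in for the question's message.

```python
import re

# Shortened stand-in for the question's multiline event.
text = "Failed to turn message into JSON\nTraceback (most recent call last):\n  File \"somefile.py\", line 249"

# GREEDYDATA expands to .*, and . does not match "\n" by default.
greedy = re.match(r"(?P<message>.*)", text)
print(repr(greedy.group("message")))  # 'Failed to turn message into JSON'

# The alternation from the answer lets every character, including \r and \n, match.
truly_greedy = re.match(r"(?P<message>(.|\r|\n)*)", text)
print(truly_greedy.group("message") == text)  # True
```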

Alcanzar
  • `(?<message>(.|\r|\n)*)` did it! Had 20 tabs open and here I find it in a not-so-highly upvoted answer. Thank you very much. – bad_keypoints Apr 14 '15 at 04:50
  • `(.|\r|\n)*` is one of the most unfortunate patterns, an absolute evil, because it is a performance killer. To match any character with `.`, just use the appropriate modifier: in Oniguruma it is `(?m)`; in PCRE and PCRE-related flavors, use `(?s)`. In JS, use `[^]` or `[\s\S]` instead of a dot. – Wiktor Stribiżew Oct 19 '16 at 13:02
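The comment's alternatives can be checked in Python too, with the caveat that flag letters differ between engines: Python's `(?s)` (DOTALL) plays the role that `(?m)` plays in Oniguruma, while Python's own `(?m)` means something else (it changes how `^` and `$` behave). This sketch only confirms that the flag and a `[\s\S]` class capture the same text as the alternation.

```python
import re

text = "line one\r\nline two\nline three"

# Alternation: one group entry per character.
alternation = re.match(r"(?:.|\r|\n)*", text).group(0)

# DOTALL flag: a plain .* that also matches newlines.
dotall = re.match(r"(?s).*", text).group(0)

# Character class that matches any character, with no flag needed.
char_class = re.match(r"[\s\S]*", text).group(0)

print(alternation == dotall == char_class == text)  # True
```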

Adding the `(?m)` flag to the beginning of the pattern lets `.`, and therefore GREEDYDATA, match newlines:

match => [ "message", "(?m)%{TIMESTA...
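As a rough illustration of what the flag buys you, here is the question's pattern hand-translated into Python with `(?s)` (Python's counterpart of Oniguruma's `(?m)`) and run on a shortened copy of the sample event. The expansions used for `TIMESTAMP_ISO8601`, `LOGLEVEL`, `SPACE`, and `NOTSPACE` are simplified assumptions, not grok's real definitions.

```python
import re

# Simplified, hypothetical stand-ins for the grok patterns used in the question.
TIMESTAMP = r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?"
LOGLEVEL = r"[A-Za-z]+"

pattern = re.compile(
    r"(?s)"  # DOTALL: let . match newlines (Oniguruma spells this (?m))
    rf"(?P<timestamp>{TIMESTAMP}) "
    rf"\[(?P<loglevel>{LOGLEVEL})\s*\]"
    r"\[(?P<module>\S+)\s*\]"
    r"(?P<message>.*)"
)

event = (
    "2014-10-20T20:52:56.133+0000 host 2014-10-20 15:52:56,036 "
    "[ERROR   ][app.logic     ] Failed to turn message into JSON\n"
    "Traceback (most recent call last):\n"
    "ValueError: Expecting value: line 1 column 1 (char 0)"
)

# search() rather than match(): like grok, the pattern is not anchored,
# so it lands on the second timestamp, the one followed by " [LEVEL".
m = pattern.search(event)
print(m.group("timestamp"))  # 2014-10-20 15:52:56,036
print(m.group("loglevel"))   # ERROR
print(m.group("module"))     # app.logic
print(m.group("message").splitlines()[-1])  # ValueError: Expecting value: line 1 column 1 (char 0)
```

Without the `(?s)` prefix, the same pattern would still match, but `message` would end at "into JSON".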
Wayne Werner
  • Thanks. This also works for gsub, not just grok. E.g., to extract the first line from a Message field (sent from Active Directory): Input: `"Message" => "The computer attempted to validate the credentials for an account.\r\n\r\nAuthentication Package:\tMICROSOFT_AUTHENTICATION_PACKAGE_V1_0\r\n` Code: `gsub => [ "Message", "^(?m)([^\r]*).*", "\1" ]` Output: `"Message" => "The computer attempted to validate the credentials for an account."` – Cameron Kerr Jan 26 '16 at 00:40
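The same first-line extraction can be sketched with Python's `re.sub`, using `(?s)` as the stand-in for Oniguruma's `(?m)` and moving it to the front, since recent Python versions reject global flags anywhere but the start of the pattern. `[^\r]*` stops at the first carriage return, so the group holds the first line, and the dot-all `.*` consumes the rest so it is discarded. The sample string is the input from the comment above.

```python
import re

message = (
    "The computer attempted to validate the credentials for an account.\r\n\r\n"
    "Authentication Package:\tMICROSOFT_AUTHENTICATION_PACKAGE_V1_0\r\n"
)

# Keep everything up to the first \r, and let the DOTALL .* swallow the rest.
first_line = re.sub(r"(?s)^([^\r]*).*", r"\1", message)
print(first_line)  # The computer attempted to validate the credentials for an account.
```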

My final grok for Vertica logs, using `(?m)` and `[^\n]+`:

match => ["message","(?m)%{TIMESTAMP_ISO8601:ClientTimestamp}%{SPACE}(%{DATA:Action}:)?(%{DATA:ThreadID} )?(\[%{DATA:Module}\] )?(\<%{DATA:Level}\> )?(\[%{DATA:SubAction}\] )?(@%{DATA:Nodename}:)?( (?<Session>(\{.*?\} )?.*?/.*?): )?(?<message>[^\n]+)((\n)?(\t)?(?<StackTrace>[^\n]+))?"]
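The tail of that pattern, `(?<message>[^\n]+)((\n)?(\t)?(?<StackTrace>[^\n]+))?`, is what separates the first line from the next one: `[^\n]+` cannot cross a newline regardless of the flag, so `message` ends with the first line. A reduced Python sketch of just that tail; the sample event is made up for illustration, and Python's `(?P<name>...)` replaces `(?<name>...)`.

```python
import re

# Reduced version of the tail of the Vertica pattern above.
tail = re.compile(r"(?P<message>[^\n]+)(?:\n?\t?(?P<StackTrace>[^\n]+))?")

# Hypothetical event: a message followed by tab-indented detail lines.
event = "Query failed\n\tat some.module.function\n\tat another.module"

m = tail.match(event)
print(m.group("message"))     # Query failed
print(m.group("StackTrace"))  # at some.module.function
```

Note that `StackTrace` picks up only the single line after the message; with an unanchored match, anything after that line is simply left out.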

Thanks to asperla

https://github.com/elastic/logstash/issues/2282

Sweemyn