1

I am trying to parse log files in which some entries are single-line and some are multiline. My regex works for the single-line entries but not for the multiline ones.

^(?<timestamp>\d+-\d+-\d+T\d+:\d+:\d+\.\d+(\+|-)\d+:\d+)\s+\[(?<severity>\w+)\](?<message>.*)$

The match fails on entries like the one below: the pattern stops at the line break, so the text on the following line is not captured.

2022-06-27T15:22:35.508+00:00 [Info] New settings received:
{"indexer.settings.compaction.days_of_week":"Sunday,Monday"}

The text after the line break should be included in the "message" group, up to the point where a new timestamp starts.

I have tried several ways to make the pattern match across the newline but have not found a solution yet. Both log formats are in this demo: https://regex101.com/r/ftJ3UZ/1.

ZygD
Xhens

2 Answers

2

If your regex engine supports lookaheads, you can add an optional repeating group inside the message group that consumes each following line, as long as that line does not start with a date-like pattern (or, to be stricter, the full timestamp).

^(?<timestamp>\d+-\d+-\d+T\d+:\d+:\d+\.\d+([+-])\d+:\d+)\s+\[(?<severity>\w+)\](?<message>.*(?:\n(?!\d+-\d+-\d+T).*)*)$

Regex demo
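To see the lookahead in action outside regex101, here is a minimal sketch in Python. Python is an assumption (the question does not name a language); it spells named groups as `(?P<name>...)`, and the sign group is folded into a plain character class so that no numbered group remains. The second sample line is made up for illustration.

```python
import re

# Same idea as the pattern above, adapted to Python's syntax:
# (?P<name>...) instead of (?<name>...), and [+-] without a capture group.
LOG_ENTRY = re.compile(
    r"^(?P<timestamp>\d+-\d+-\d+T\d+:\d+:\d+\.\d+[+-]\d+:\d+)\s+"
    r"\[(?P<severity>\w+)\]"
    # Take the rest of the line, then any following lines that do NOT
    # start with a date-like prefix (checked by the negative lookahead).
    r"(?P<message>.*(?:\n(?!\d+-\d+-\d+T).*)*)$",
    re.MULTILINE,
)

# First entry is from the question; the second is a hypothetical single-line entry.
sample = (
    "2022-06-27T15:22:35.508+00:00 [Info] New settings received:\n"
    '{"indexer.settings.compaction.days_of_week":"Sunday,Monday"}\n'
    "2022-06-27T15:22:36.001+00:00 [Warn] a single-line entry"
)

entries = [m.groupdict() for m in LOG_ENTRY.finditer(sample)]
for entry in entries:
    print(entry["timestamp"], entry["severity"], repr(entry["message"]))
```

The multiline flag is needed so that `^` and `$` anchor at line boundaries. Note that the message keeps the space that follows `]`; strip it afterwards if it is unwanted.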

The fourth bird
  • I keep staring ("stare-ing", not "star-ing") at your regex and fail to see a lookahead. Should I add blindness to my other deficiencies? A small point: not all languages (Ruby being one exception) permit the mixing of named and numbered capture groups. – Cary Swoveland Jul 01 '22 at 00:40
  • @CarySwoveland I did not know that Ruby does not support mixing named and numbered capture groups :-) but in that case this part `([+-])` can be changed to a named group, or the group can be removed altogether. The lookahead `(?!` is near the end of the pattern, but it may be out of sight because of the pattern's length. I very much doubt that you are suffering from blindness, or else you would not have been able to type this [short paragraph](https://stackoverflow.com/questions/72822876/password-regex-identifying-how-a-given-regex-works) – The fourth bird Jul 01 '22 at 08:14
  • Hi @Thefourthbird! I have another question. I'm facing some issues when adapting it for Ungreedy. Do you maybe know how to modify/adapt it? Thanks in advance! – Xhens Jul 04 '22 at 10:56
  • @Xhens That depends on what you want to match – The fourth bird Jul 04 '22 at 10:59
  • @Thefourthbird it's the solution you already posted, but I checked the "ungreedy" mode to match the requirements of the environment where it's being run: https://regex101.com/r/XnaB28/1 Or another very similar example for which I've been trying to adapt can be found here: https://regex101.com/r/dzHe6U/1 – Xhens Jul 04 '22 at 14:55
1

It seems this would match:

^(?<timestamp>\d+-\d+-\d+T\d+:\d+:\d+\.\d+(\+|-)\d+:\d+)\s+\[(?<severity>\w+)\](?<message>.*)\n(?:{.*})?

I've removed $ and added \n(?:{.*})? to the end so that the optional part inside the {} braces is matched as well.
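A small Python sketch (Python is an assumption here, and the braces are escaped to be safe across engines) shows what this pattern captures. The overall match covers the {...} line, but the "message" group stops at the first line. The second pattern, `inside_group`, is a hypothetical variant that moves the optional part inside the group; it is not part of the answer above.

```python
import re

# Shared prefix: timestamp and severity, written with Python's (?P<name>...) syntax.
HEAD = (
    r"^(?P<timestamp>\d+-\d+-\d+T\d+:\d+:\d+\.\d+[+-]\d+:\d+)\s+"
    r"\[(?P<severity>\w+)\]"
)

# As posted (braces escaped): the {...} line is matched, but outside "message".
as_posted = re.compile(HEAD + r"(?P<message>.*)\n(?:\{.*\})?", re.MULTILINE)

# Hypothetical variant: the optional {...} line sits inside "message".
inside_group = re.compile(HEAD + r"(?P<message>.*(?:\n\{.*\})?)$", re.MULTILINE)

sample = (
    "2022-06-27T15:22:35.508+00:00 [Info] New settings received:\n"
    '{"indexer.settings.compaction.days_of_week":"Sunday,Monday"}'
)

posted = as_posted.search(sample)
variant = inside_group.search(sample)
print(repr(posted.group("message")))   # first line only
print(repr(variant.group("message")))  # first line plus the {...} line
```

Both patterns only handle a single continuation line that starts with `{`, and the posted form also requires a newline after every entry, so a final single-line entry with no trailing newline would not match.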

ZygD