
Here is my regex: https://regex101.com/r/g56UzY/1

I have this input:

pdlvkw6v INFO  18:25:03.994 pdlvkw6v WARN  18:25:03.994 pdlvkw6v INFO  
18:25:03.994 rg9n9bz7 INFO  18:23:52.987 rg9n9bz7 ERROR  19:23:52.987 
rg9n9bz7 INFO  21:23:52.987 5y6n9bz7 WARN  18:23:52.987

My current regex is: [\w]{8}\s+(INFO|WARN|ERROR)\s+\d\d:\d\d:\d\d\.\d\d\d

I want the regex to match only the first occurrence of each unique string, i.e. it should match pdlvkw6v, then rg9n9bz7, then 5y6n9bz7, and it should not match the repeated strings.

I am trying to break multiline logs into events based on this fixed string. Since one event can contain the same string several times, I want to break at the first matching string and leave the rest inside the event.

Dhinesh

1 Answer


You need to capture the word you are interested in and add a negative lookahead check:

(?s)\b(\w{8})\b(?!.*\b\1\b)\s+(?:INFO|WARN|ERROR)\s+\d\d(?::\d\d){2}\.\d{3}
    ^^^^^^^^^^^^^^^^^^^^^^^ 

Or, if the (?s) modifier is not supported:

\b(\w{8})\b(?![\s\S]*\b\1\b)\s+(?:INFO|WARN|ERROR)\s+\d\d(?::\d\d){2}\.\d{3}

See the regex demo

Explanation:

  • (?s) - a DOTALL modifier making . match any char, including line break chars
  • \b - a word boundary
  • (\w{8}) - Group 1: 8 word chars
  • \b - a word boundary
  • (?!.*\b\1\b) - the negative lookahead that fails the match if, anywhere to the right of the current location (after 0+ chars), there is a whole word equal to the one stored in the Group 1 buffer. As a result, each value is matched at its last occurrence in the input
  • \s+ - 1+ whitespaces
  • (?:INFO|WARN|ERROR) - one of the three substrings
  • \s+ - 1+ whitespaces
  • \d\d - 2 digits
  • (?::\d\d){2} - 2 sequences of :, digit, digit
  • \. - a dot
  • \d{3} - three digits
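
To see what the pattern does in practice, here is a minimal Python sketch. Python is only an assumption made for illustration (the asker's environment is Splunk), and the names LOG, PATTERN and ids are hypothetical. The sample text is the question's input joined onto one line.

```python
import re

# Sample input from the question, joined into a single string.
LOG = (
    "pdlvkw6v INFO  18:25:03.994 pdlvkw6v WARN  18:25:03.994 "
    "pdlvkw6v INFO  18:25:03.994 rg9n9bz7 INFO  18:23:52.987 "
    "rg9n9bz7 ERROR  19:23:52.987 rg9n9bz7 INFO  21:23:52.987 "
    "5y6n9bz7 WARN  18:23:52.987"
)

# The first regex from the answer, unchanged.
PATTERN = re.compile(
    r"(?s)\b(\w{8})\b(?!.*\b\1\b)\s+(?:INFO|WARN|ERROR)\s+\d\d(?::\d\d){2}\.\d{3}"
)

# With a single capturing group, findall returns the Group 1 values.
ids = PATTERN.findall(LOG)
print(ids)  # → ['pdlvkw6v', 'rg9n9bz7', '5y6n9bz7']

# Each match sits at the LAST occurrence of its ID, because the
# lookahead rejects any position that has the same ID later on.
for m in PATTERN.finditer(LOG):
    print(m.group(1), m.start() == LOG.rindex(m.group(1)))
```

Group 1 holds each unique value exactly once, but the match position is the last occurrence of that value, which matters if the pattern is used to decide where to split.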
Wiktor Stribiżew
  • Thanks for the info, but I want to capture the first string of the repeated strings, and the second match should be the first of the second string, and so on. Hope it makes sense – Dhinesh Jun 30 '17 at 09:59
  • @Dhinesh: Look at your question. The values you need are `pdlvkw6v`, `rg9n9bz7` and `5y6n9bz7`. Exactly what my regex [*captures into Group 1*](https://regex101.com/r/g56UzY/3). If that is not what you need, edit the question. – Wiktor Stribiżew Jun 30 '17 at 10:03
  • Sorry, I forgot to mention FIRST STRING; I have corrected the question. Thanks – Dhinesh Jun 30 '17 at 10:12
  • Not sure I understand. If you do not want *multiple matches*, remove the global modifier: https://regex101.com/r/g56UzY/4. See a lookahead based version: https://regex101.com/r/HzTaum/1. – Wiktor Stribiżew Jun 30 '17 at 10:20
  • @Dhinesh: Does it work now? Please provide the environment you are working in, and the exact output you need (and maybe what exactly you are trying to do in order to get what). – Wiktor Stribiżew Jun 30 '17 at 12:05
  • I have a request from an application team that has multiline event logs and wants them broken into events. The team asked for a line break before the fixed string (the fixed string is pdlvkw6v). Initially I broke the events before every fixed string, as mentioned above; however, they want the break only before the first fixed string. Say my logs contain xyz xyz xyz abc abc wer wer: the first match should be the first xyz, the second match the first abc, and the third match the first wer. In Splunk I will say break_before=Myregex, so line 1 is xyz xyz xyz, line 2 is abc abc – Dhinesh Jun 30 '17 at 12:57
  • Is it .NET? You need a variable-width lookbehind to do this with a regex. – Wiktor Stribiżew Jun 30 '17 at 12:58
  • Can a regex capture a string, skip all the consecutive occurrences of the same string value, and then have the next match be a new string? – Dhinesh Jun 30 '17 at 13:03
  • @Dhinesh: It depends on *which regex flavor*. Go regex can't do anything. JS can't do a lot. Python is weak. PCRE is cool, and .NET is cool. What is yours? – Wiktor Stribiżew Jun 30 '17 at 13:10
  • I am not sure. I am trying to parse events in Splunk and using a regex to break the events; I am not sure about the flavor – Dhinesh Jun 30 '17 at 13:19
  • It is written it uses PCRE. Check [this regex demo](https://regex101.com/r/sf8BI3/2) – Wiktor Stribiżew Jun 30 '17 at 13:37
  • This one seems to fulfill my requirement; let me quickly apply it in my environment and get back to you. Thank you for your effort on this – Dhinesh Jun 30 '17 at 14:25
  • Yeah, maybe `(?:.*?\b\1\b)*` will be more efficient (with a `?` after `.*`). – Wiktor Stribiżew Jun 30 '17 at 14:27
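
The requirement clarified in the comments (break before the first entry of each run of identical IDs) can also be sketched outside the regex engine. The Python below is an assumption-laden illustration, not the PCRE pattern from the linked demo and not a Splunk configuration; ENTRY, split_events and LOG are hypothetical names. It matches every log entry, then starts a new event only when the ID differs from the previous entry's ID.

```python
import re

# Sample input from the question, joined into a single string.
LOG = (
    "pdlvkw6v INFO  18:25:03.994 pdlvkw6v WARN  18:25:03.994 "
    "pdlvkw6v INFO  18:25:03.994 rg9n9bz7 INFO  18:23:52.987 "
    "rg9n9bz7 ERROR  19:23:52.987 rg9n9bz7 INFO  21:23:52.987 "
    "5y6n9bz7 WARN  18:23:52.987"
)

# The question's entry pattern, with the 8-char ID captured in Group 1.
ENTRY = re.compile(r"\b(\w{8})\s+(?:INFO|WARN|ERROR)\s+\d\d(?::\d\d){2}\.\d{3}")

def split_events(text):
    """Split text before the first entry of each run of identical IDs."""
    starts = []
    previous_id = None
    for m in ENTRY.finditer(text):
        if m.group(1) != previous_id:
            # A new ID begins here, so a new event starts at this offset.
            starts.append(m.start())
            previous_id = m.group(1)
    # Pair each start with the next start (or the end of the text).
    # Any text before the first entry is dropped.
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]

events = split_events(LOG)
for event in events:
    print(event)
```

On the sample input this yields three events, beginning with pdlvkw6v, rg9n9bz7 and 5y6n9bz7 respectively. An ID that reappears after a different ID would start a new event, which is an interpretation of "consecutive" in the comments rather than something the thread confirms.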