
Given an input like @1=A1@2=A2@3=A3>>@1=B1@2=B2@3=B3>>@1=C1@2=C2@3=C3>>@1=B1@2=B2@3=B3, I want to capture the value after @2= in a segment where @3=B3. I also want to verify that every other segment with @3=B3 has the same @2= value as the one captured.

The inputs that should match are:

@1=A1@2=A2@3=A3>>@1=B1@2=B2@3=B3>>@1=C1@2=C2@3=C3>>@1=B1@2=B2@3=B3
@1=A1@2=A2@3=A3>>@1=B1@2=B2@3=B3>>@1=C1@2=C2@3=C3

The inputs that should not match are:

@1=A1@2=A2@3=A3>>@1=B1@2=B2@3=B3>>@1=C1@2=C2@3=C3>>@1=B1@2=B10@3=B3
@1=A1@2=A2@3=A3>>@1=B1@2=B2@3=B3>>@1=C1@2=C2@3=C3>>@1=B1@2=B10@3=B3>>@1=B1@2=B2@3=B3

Currently I do this in two passes. First I find all the invalid inputs with the regex @2=((?:\w|-|'|""|,|\.)+?)@3=B3.+@2=(?!\1@)((?:\w|-|'|""|,|\.)+?)@3=B3, and then I remove those inputs from the full list.
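The two-pass approach described above can be sketched in Python. This is an illustration under assumptions, not the asker's actual code: the inputs are assumed to be a list of strings, `keep_valid` is a hypothetical helper, and the regex is copied verbatim from the question (including the `""` alternative, which in Python's `re` matches two consecutive double quotes).

```python
import re

# Regex copied from the question: it finds an input in which two @3=B3
# segments carry different @2= values. A raw triple-quoted string avoids
# escaping the ' and "" alternatives.
INVALID = re.compile(
    r'''@2=((?:\w|-|'|""|,|\.)+?)@3=B3.+@2=(?!\1@)((?:\w|-|'|""|,|\.)+?)@3=B3'''
)


def keep_valid(inputs):
    """Hypothetical helper mirroring the two-pass approach."""
    # Pass 1: collect every input in which the "invalid" regex finds a match.
    invalid = {s for s in inputs if INVALID.search(s)}
    # Pass 2: remove those inputs from the full list, preserving order.
    return [s for s in inputs if s not in invalid]


inputs = [
    "@1=A1@2=A2@3=A3>>@1=B1@2=B2@3=B3>>@1=C1@2=C2@3=C3>>@1=B1@2=B2@3=B3",
    "@1=A1@2=A2@3=A3>>@1=B1@2=B2@3=B3>>@1=C1@2=C2@3=C3",
    "@1=A1@2=A2@3=A3>>@1=B1@2=B2@3=B3>>@1=C1@2=C2@3=C3>>@1=B1@2=B10@3=B3",
    "@1=A1@2=A2@3=A3>>@1=B1@2=B2@3=B3>>@1=C1@2=C2@3=C3>>@1=B1@2=B10@3=B3>>@1=B1@2=B2@3=B3",
]

# Prints the two inputs that should match.
for s in keep_valid(inputs):
    print(s)
```

The drawback of this design is that it needs a second pass over the data; a single regex that accepts only valid inputs avoids that.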

mohit

1 Answer


You can use the following regex:

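^(?:(?!@2=[^@]*@3=B3(?:[@>]|$)).)*@2=([^@]*)@3=B3(?:[@>]|$)(?!.*@2=(?!\1@)[^@]*@3=B3(?:[@>]|$))

Online demo.

How does it work?

First, it skips all the text up to the first @2= that is followed by @3=B3, using a tempered greedy token:

^(?:(?!@2=[^@]*@3=B3(?:[@>]|$)).)*

Then it captures the value of that @2=:

@2=([^@]*)@3=B3(?:[@>]|$)

The (?:[@>]|$) after B3 requires the @3= value to end right there (at @, at >, or at the end of the string), so a value like B30 does not count as B3.

Finally, it uses a negative lookahead assertion to make sure that no later @2= followed by @3=B3 has a value different from the captured one:

(?!.*@2=(?!\1@)[^@]*@3=B3(?:[@>]|$))

The @ after \1 makes the comparison exact: without it, a later value that merely starts with the captured one (for example B10 after B1) would be treated as equal.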
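A minimal Python sketch of how the regex could be used, assuming Python's `re` module (which supports the lookaheads and backreference involved) and single-line inputs like those in the question. The inner lookahead is written as (?!\1@) so that B10 is not mistaken for a captured B1; `shared_b3_value` is a hypothetical helper name.

```python
import re

# The answer's regex, split into its three parts. The inner lookahead is
# (?!\1@) so that a later value which merely starts with the captured one
# (B10 vs. B1) still counts as different.
PATTERN = re.compile(
    r"^(?:(?!@2=[^@]*@3=B3(?:[@>]|$)).)*"    # skip to the first @2=...@3=B3
    r"@2=([^@]*)@3=B3(?:[@>]|$)"             # capture that @2= value
    r"(?!.*@2=(?!\1@)[^@]*@3=B3(?:[@>]|$))"  # no later @3=B3 segment disagrees
)


def shared_b3_value(s):
    """Hypothetical helper: the common @2= value of all @3=B3 segments, or None."""
    m = PATTERN.match(s)
    return m.group(1) if m else None


print(shared_b3_value(
    "@1=A1@2=A2@3=A3>>@1=B1@2=B2@3=B3>>@1=C1@2=C2@3=C3>>@1=B1@2=B2@3=B3"))  # B2
print(shared_b3_value(
    "@1=A1@2=A2@3=A3>>@1=B1@2=B2@3=B3>>@1=C1@2=C2@3=C3>>@1=B1@2=B10@3=B3"))  # None
print(shared_b3_value("@1=A1@2=B1@3=B3>>@1=C1@2=B10@3=B3"))  # None
```

Because the first part is anchored with ^ and cannot step past the first qualifying segment, only that segment's value is ever captured, and the final lookahead compares every later @3=B3 segment against it.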
Aran-Fey
  • Can we avoid the ">" after B3? The real input has more dimensions, like @4=..@5=......@100=.. – mohit Oct 28 '18 at 19:17
  • @mohit Well, then replace the `>` with `@` or `[>@]` or whatever. – Aran-Fey Oct 28 '18 at 19:18
  • @Aran-Fey - After changing it to @, it does not work when @2=B2 is at the end of the string: https://regex101.com/r/FGgk33/2. Can you please help? – mohit Oct 28 '18 at 19:46
  • @mohit My bad, I overlooked that that could happen. Answer updated. Unfortunately fixing that bug increased the complexity of the regex quite a bit... – Aran-Fey Oct 28 '18 at 20:09
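The end-of-string problem from the comments can be reproduced with a short Python sketch. `build_pattern`, the sample string, and the assumption that the intermediate regex simply used @ in place of each (?:[@>]|$) are all hypothetical; the regex101 version linked above may differ in detail.

```python
import re


def build_pattern(end):
    """Hypothetical helper: the answer's regex with a configurable terminator
    (`end`) after B3, used in all three places it appears."""
    return re.compile(
        r"^(?:(?!@2=[^@]*@3=B3" + end + r").)*"
        + r"@2=([^@]*)@3=B3" + end
        + r"(?!.*@2=(?!\1@)[^@]*@3=B3" + end + r")"
    )


AT_ONLY = build_pattern("@")
ROBUST = build_pattern(r"(?:[@>]|$)")

# More dimensions (@4=...), and the conflicting @2=B10 segment comes last,
# so nothing follows its B3.
s = "@1=A1@2=A2@3=A3@4=X>>@1=B1@2=B2@3=B3@4=Y>>@1=B1@2=B10@3=B3"

# True: the lookahead requires an @ after the last B3, so B10 goes unnoticed.
print(bool(AT_ONLY.match(s)))
# False: $ lets the final segment be checked, and B10 differs from B2.
print(bool(ROBUST.match(s)))
```

Allowing $ as a terminator is what lets a B3 segment at the very end of the input take part in the check.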