
I want to create a match only if an additional quantity condition is true.

Example (which is fine):

Regex: -(START.*?)_\d+(?=-END)

Input: test-START_one_two_three_4-END

Match Group1: START_one_two_three

In addition, I want to check that the matched text contains 3 to 4 _ characters (_{3,4}), but not directly following each other.

So I assume I have to add a non-capturing group with (?:...). What I tried: looking 4 times for an underscore followed by any non-underscore characters, up to the -END:

(?:(?:_[^_]*){4}-END)

But once I add this to the regex, it no longer produces a match. Why?

https://regex101.com/r/MHzWBr/2

membersound

2 Answers


You may use a lookahead here:

-(START(?=(?:_[^_]*){3,4}-END).*?)_+\d+(?=-END) 
         ^

See the regex demo

Now, (?=(?:_[^_]*){3,4}-END) is a positive lookahead that makes sure that, immediately to the right of the current location, there is

  • (?:_[^_]*){3,4} - three or four repetitions of _ followed with any 0+ chars other than _
  • -END - a literal -END string.
  • .*?
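The lookahead can be tried out with a minimal Python sketch. Python's re module is an assumption here (the thread names no language), and the helper group1, the extra sample inputs, and the CONSUMING pattern are made up for illustration. CONSUMING is only one hypothetical way the group from the question might have been inserted: a plain (?:...) group consumes the text up to and including -END, so nothing is left for _\d+ to match, whereas the lookahead only checks and consumes nothing.

```python
import re

# Pattern from the answer: the lookahead requires 3-4 underscore blocks
# before -END, without consuming any text.
LOOKAHEAD = re.compile(r"-(START(?=(?:_[^_]*){3,4}-END).*?)_+\d+(?=-END)")

# Hypothetical placement of the question's group (an assumption, not taken
# from the linked demo): it consumes everything through -END.
CONSUMING = re.compile(r"-(START(?:(?:_[^_]*){4}-END).*?)_\d+(?=-END)")


def group1(pattern, text):
    """Return capture group 1 of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(group1(LOOKAHEAD, "test-START_one_two_three_4-END"))  # 4 underscores
print(group1(LOOKAHEAD, "test-START_one_two_4-END"))        # 3 underscores
print(group1(LOOKAHEAD, "test-START_one_4-END"))            # 2 underscores
print(group1(LOOKAHEAD, "test-START_a_b_c_d_4-END"))        # 5 underscores
print(group1(CONSUMING, "test-START_one_two_three_4-END"))  # consuming group
```

With these inputs, only the strings with three or four underscores should produce a group; the consuming variant should find no match at all.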

Note that if you want to match the closest window between -START and -END, you need to prevent both . and [^_] from matching at the start of a -START or -END sequence:

-(START(?=(?:_(?:(?!-(?:END|START))[^_])*){3,4}-END)(?:(?!-(?:END|START)).)*)_+\d+(?=-END)

See this regex demo

The (?:(?!-(?:END|START)).)* pattern is a tempered greedy token.
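The difference between the two patterns can be seen on an input that contains two -END markers. The following is a rough Python sketch (Python's re module and the names SIMPLE, TEMPERED and group1 are assumptions for illustration); the test string is the one quoted in the comments under this answer.

```python
import re

# First pattern from the answer: . and [^_] are free to run past an -END.
SIMPLE = re.compile(r"-(START(?=(?:_[^_]*){3,4}-END).*?)_+\d+(?=-END)")

# Tempered version from the answer: neither the lookahead nor the body
# may step over the start of -END or -START.
TEMPERED = re.compile(
    r"-(START(?=(?:_(?:(?!-(?:END|START))[^_])*){3,4}-END)"
    r"(?:(?!-(?:END|START)).)*)_+\d+(?=-END)"
)


def group1(pattern, text):
    """Return capture group 1 of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(1) if m else None


two_ends = "test-START_4-END_one_two_three-END"
print(group1(SIMPLE, two_ends))    # lookahead counts underscores past the first -END
print(group1(TEMPERED, two_ends))  # only one underscore before the first -END

# On the input from the question both patterns capture the same text.
print(group1(TEMPERED, "test-START_one_two_three_4-END"))
```

The simple pattern accepts the two-marker input because its lookahead is satisfied by the second -END, while the tempered pattern rejects it.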

Wiktor Stribiżew
  • TY! That indeed was my intention, but was probably confused due to the nested non-capturing group (which I don't need indeed). – membersound Feb 04 '19 at 15:51
  • @membersound You need the non-capturing group to quantify a *sequence* of patterns. – Wiktor Stribiżew Feb 04 '19 at 15:52
  • That regex doesn't ensure that the two `-END` are the same, e.g. it'll match `test-START_4-END_one_two_three-END`, but that's not right. --- It would probably be better to *replace* the `.*?` with a pattern that matches the 3-4 underscore blocks, instead of mixing in a zero-width lookahead. – Andreas Feb 04 '19 at 15:54
  • @Andreas could you give an improvement? Of course I'd only want to match until the first `-END`, and thus get no match at all. – membersound Feb 04 '19 at 15:56
  • @Andreas Given current conditions, it does not make any difference. `.*?` just matches any 0+ chars other than line break chars and `[^_]` matches any char other than `_`, and that can be restricted easily with `(?:(?!-END).)*`. See https://regex101.com/r/MHzWBr/5 – Wiktor Stribiżew Feb 04 '19 at 15:56
  • @WiktorStribiżew and if I'm sure a `-` could only occur right before the `-END`, could I as well replace `.*?` with `[^-]*?` = match anything but `-`? – membersound Feb 04 '19 at 16:06
  • @membersound You may even use `[^-]*`, the greedy quantifier will make the engine get to the match quicker. Negating sequences of chars is more difficult and less efficient, so if you can use the negated character class, use it. – Wiktor Stribiżew Feb 04 '19 at 16:07

Another option might be to do this without an extra positive lookahead and instead repeat, 2 to 3 times, an underscore followed by 1+ chars other than an underscore.

You could also turn the positive lookahead at the end into a match.

-(START(?:_[^_]+){2,3})_\d+(?=-END)

Regex demo

That will match:

  • - Match -
  • ( Capturing group
    • START(?:_[^_]+){2,3} Match START and repeat 2-3 times an underscore followed by 1+ chars other than an underscore
  • )_\d+ Close group, match _ and 1+ digits
  • (?=-END) Assert what is on the right is -END (Or match -END without the lookahead)
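A small Python sketch of this variant follows (Python's re module, the names NO_LOOKAHEAD, WITH_LOOKAHEAD and group1, and the extra sample inputs are assumptions for illustration). Because [^_]+ needs at least one character between underscores, two adjacent underscores make the match fail here, while the lookahead pattern from the other answer, which uses [^_]*, still accepts them.

```python
import re

# Pattern from this answer: 2-3 underscore blocks inside the group,
# plus the final _\d+ gives 3-4 underscores in total.
NO_LOOKAHEAD = re.compile(r"-(START(?:_[^_]+){2,3})_\d+(?=-END)")

# Pattern from the other answer, included only for comparison.
WITH_LOOKAHEAD = re.compile(r"-(START(?=(?:_[^_]*){3,4}-END).*?)_+\d+(?=-END)")


def group1(pattern, text):
    """Return capture group 1 of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(group1(NO_LOOKAHEAD, "test-START_one_two_three_4-END"))  # 4 underscores
print(group1(NO_LOOKAHEAD, "test-START_one_4-END"))            # too few blocks

# Adjacent underscores: rejected by [^_]+, accepted by [^_]*.
print(group1(NO_LOOKAHEAD, "test-START__one_4-END"))
print(group1(WITH_LOOKAHEAD, "test-START__one_4-END"))
```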
The fourth bird