I am trying to find the strings strictly between one "START" and one "END" string in the sentence below:

END START START bad result START good result 1 END START good result 2 END START START

The requirement is that the regex should return "good result 1" and "good result 2".

I tried the following regex with a negative look-ahead with no success:

START(?!.*START?)(.*)END

I also tried a look-behind on the END string, but it looks like look-behinds containing wildcards are not supported.

denniscmpe

2 Answers

To rephrase: by "strictly between one START and one END", do you mean that no other START or END may appear inside the match? If so, and if your regex engine supports lookaheads, you can consume characters one at a time as long as neither START nor END begins at the current position. Your pattern START(?!.*START?)(.*)END is on the right track, but the lookahead and the dot need to be repeated together:

START((?:(?!START|END).)*)END

For instance, using Python:

>>> import re
>>> s = 'END START START bad result START good result 1 END START good result 2 END START START'
>>> re.findall(r'START((?:(?!START|END).)*)END', s)
[' good result 1 ', ' good result 2 ']

The * cannot go directly on the . (i.e., .*), because then the lookahead would be checked only once, at the position right after START; the .* would consume to the end of the string and then backtrack to the final END. Nor can it go on the capturing group itself (i.e., ((?!START|END).)*), because then only the last character matched would be captured. Therefore the repetition is applied to a non-capturing group, and the whole thing is wrapped in a capturing group.
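To make the difference concrete, here is a small sketch that runs the three placements of * side by side on the sentence from the question. The variable names are only illustrative.

```python
import re

s = 'END START START bad result START good result 1 END START good result 2 END START START'

# * directly on the dot: the lookahead is checked once, right after START,
# then the greedy .* runs to the end and backtracks to the last END.
star_on_dot = re.findall(r'START(?!START|END)(.*)END', s)

# * on the capturing group: each iteration overwrites the group,
# so only the last character matched is kept.
star_on_capture = re.findall(r'START((?!START|END).)*END', s)

# * on a non-capturing group, wrapped in a capturing group.
tempered = re.findall(r'START((?:(?!START|END).)*)END', s)

print(star_on_dot)
# [' START bad result START good result 1 END START good result 2 ']
print(star_on_capture)
# [' ', ' ']
print(tempered)
# [' good result 1 ', ' good result 2 ']
```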

If you want to drop the space after START and the space before END, match them outside the group: START ((?:(?!START|END).)*) END
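As a quick check, the sketch below applies the space-trimming pattern to the same sentence. It also shows an alternative that is not in the original answer: keeping the looser pattern and calling str.strip() on each result. That may be preferable if the amount of whitespace around the delimiters varies, since the pattern with literal spaces requires exactly one space on each side.

```python
import re

s = 'END START START bad result START good result 1 END START good result 2 END START START'

# Literal spaces outside the capturing group are matched but not captured.
trimmed_by_pattern = re.findall(r'START ((?:(?!START|END).)*) END', s)

# Alternative (an assumption, not from the answer): strip after matching.
trimmed_by_strip = [m.strip() for m in re.findall(r'START((?:(?!START|END).)*)END', s)]

print(trimmed_by_pattern)  # ['good result 1', 'good result 2']
print(trimmed_by_strip)    # ['good result 1', 'good result 2']
```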

goodmami

You may use the following pattern:

(?<=START)(?:(?!START).)+(?=END)

Breakdown:

  • (?<=START) - A positive Lookbehind to ensure the match is preceded by "START".
  • (?:(?!START).)+ - A Tempered Greedy Token which matches one or more characters ensuring that "START" is not amongst them.
  • (?=END) - A positive Lookahead to ensure that the match is followed by "END".
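The pattern can be tried in Python, whose re module supports fixed-width lookbehinds such as (?<=START). The sketch below runs it on the sentence from the question. The second input, nested, is a made-up string (not from the question) showing one thing to watch for: because the tempered token only excludes "START", a match can still contain an "END" if two ENDs follow a single START.

```python
import re

s = 'END START START bad result START good result 1 END START good result 2 END START START'
pattern = r'(?<=START)(?:(?!START).)+(?=END)'

# On the question's input, the matches are the two desired substrings.
results = re.findall(pattern, s)
print(results)  # [' good result 1 ', ' good result 2 ']

# Hypothetical input with two ENDs after one START: the greedy token
# runs past the first END and backtracks only to the last one.
nested = 'START a END b END'
print(re.findall(pattern, nested))  # [' a END b ']

# For comparison, excluding END inside the token as well stops at the first END.
print(re.findall(r'START((?:(?!START|END).)*)END', nested))  # [' a ']
```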

If "START" and "END" must be whole words (i.e., to prevent matching "STARTED", for example), we can add some word boundaries (i.e., \b) as follows:

(?<=\bSTART)\b(?:(?!\bSTART\b).)*\b(?=END\b)
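A short sketch of the effect of the word boundaries, using a made-up input that contains "STARTED" (the sample string and variable names are assumptions for illustration). Without \b, the text after the "START" inside "STARTED" is treated as a match; with \b it is skipped.

```python
import re

# Hypothetical input: "STARTED" should not count as a START delimiter.
text = 'STARTED x END START y END'

without_boundaries = re.findall(r'(?<=START)(?:(?!START).)+(?=END)', text)
with_boundaries = re.findall(r'(?<=\bSTART)\b(?:(?!\bSTART\b).)*\b(?=END\b)', text)

print(without_boundaries)  # ['ED x ', ' y ']
print(with_boundaries)     # [' y ']
```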
