
I would like to understand how the regex engine processes the technique described below, which works around the fact that variable-length lookbehind is not allowed. The technique is commonly used to create patterns that match a specific word only when another word does not precede it.

For example, in the post "RegEx that matches a word that NOT succeeds another one", the idea is to match the word cube only if the word small does not appear in the previous 20 characters.

anubhava's answer to that question was:

.*?small.{0,20}cube|(.*?cube)

And his comment was:

Actually this is simple technique to circumvent the regex engine's capabilities to disallow variable length lookbehind. In this regex we match whatever we don't need using pipe (OR) construct on left hand side and finally leave the right most match in the pipe using a captured group.
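In Java, this technique is typically used by calling `find()` in a loop and keeping only the matches where group 1 participated. When the left ("unwanted") alternative matched instead, `group(1)` returns `null`. The following is a minimal sketch, assuming that is how you consume the regex; the class and method names are hypothetical. Note that group 1 also contains whatever text precedes cube, because the lazy `.*?` sits inside the group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CubeNotAfterSmall {
    // Left alternative: consumes a "small ... cube" span (at most 20 chars between)
    // so that it is discarded. Right alternative: captures a cube the left
    // alternative did not claim.
    private static final Pattern PATTERN =
            Pattern.compile(".*?small.{0,20}cube|(.*?cube)");

    // Hypothetical helper: returns group 1 of every match where it participated.
    // A null group 1 means the left (unwanted) alternative matched, so it is skipped.
    static List<String> wantedMatches(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wantedMatches("a big cube"));       // [a big cube]
        System.out.println(wantedMatches("a small red cube")); // []
    }
}
```

On "a small red cube" the left alternative matches first at offset 0, consumes the whole string, and leaves group 1 unset, so nothing is reported.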

I think this technique is very useful, but I don't know how to apply it. I would like to understand how the regex engine handles this kind of regex so that I can write my own. Can anybody help by explaining it?

BTW, I don't know whether this technique works in all regex engines, so I've tagged the question with java, since that is where I'll mainly use it.

Federico Piazza
  • this is explained in depth in the top answer to [Regex Pattern to Match, Excluding when… / Except between](http://stackoverflow.com/questions/23589174/regex-pattern-to-match-excluding-when-except-between). In fact I'm going to mark this as a duplicate, since the explanation there is the pinnacle of what SO will offer on the subject. – OGHaza Jul 21 '14 at 22:26
  • Thanks for the link. But have you downvoted because of that?? – Federico Piazza Jul 21 '14 at 23:26

1 Answer


The idea is to make the regex prefer a match of the negative (unwanted) alternative over the positive one: if the negative alternative can match at a given position, it does, and the positive match is not captured in the first group.

It isn't actually equivalent to a variable-length lookbehind, though; if it were, regular expression engines would simply implement lookbehind this way. For example:

put the cube on top of the small cube
        ^ should match this cube, but doesn't, because the small.{0,20}cube alternative is tried first and consumes it
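The failure can be reproduced in Java. The sketch below compares the alternation trick with a direct negative lookbehind. Java's `Pattern` accepts a lookbehind whose maximum length is bounded, so `(?<!small.{0,20})cube` compiles here only because of the `{0,20}` limit. The class and method names are hypothetical, and the start offset of cube is derived from the end of group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationVsLookbehind {
    // The alternation trick from the question: group 1 holds the wanted text.
    static final Pattern ALTERNATION =
            Pattern.compile(".*?small.{0,20}cube|(.*?cube)");

    // Bounded negative lookbehind: compiles in Java because its maximum
    // length is finite (5 chars of "small" + at most 20 chars).
    static final Pattern LOOKBEHIND = Pattern.compile("(?<!small.{0,20})cube");

    // Start offsets of each cube reported by the alternation trick.
    static List<Integer> viaAlternation(String s) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = ALTERNATION.matcher(s);
        while (m.find()) {
            if (m.group(1) != null) {
                // "cube" is the last four characters of group 1
                starts.add(m.end(1) - "cube".length());
            }
        }
        return starts;
    }

    // Start offsets of each cube accepted by the lookbehind.
    static List<Integer> viaLookbehind(String s) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = LOOKBEHIND.matcher(s);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String s = "put the cube on top of the small cube";
        System.out.println(viaAlternation(s)); // []
        System.out.println(viaLookbehind(s));  // [8]
    }
}
```

The left alternative starts at offset 0, lazily skips past the first cube to reach small, and then matches through the final cube. The whole string is consumed without group 1 ever being set, while the lookbehind checks each cube independently and accepts the one at offset 8.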
Ry-