
This is the original regex pattern, which uses negated character classes:

0[^1]*1[^2]*2[^3]*3

It matches 0 1 2 3 efficiently with any delimiting characters between the digits, because each negated class stops at the next required digit and so behaves like a lazy quantifier. By comparison, the greedy 0.*1.*2.*3 becomes inefficient for long test strings.
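As a minimal sketch of the two single-character patterns (the variable names and the sample string are only for illustration), both find the same match on a short input. The difference lies in how much backtracking the engine needs on long inputs:

```javascript
// Negated character classes: each [^x]* cannot run past the next x.
const negated = /0[^1]*1[^2]*2[^3]*3/;
// Greedy dot: each .* first consumes the rest of the line, then backtracks.
const greedy = /0.*1.*2.*3/;

// Illustrative sample string (an assumption, not real test data).
const sample = "0-1, 2 xxx 3";

const negatedMatch = negated.exec(sample);
const greedyMatch = greedy.exec(sample);

console.log(negatedMatch[0]);
console.log(greedyMatch[0]);
```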

How can the same efficiency be achieved when the pattern should match groups of characters (whole words) instead of single characters, for example for this test string:

zero one two three

I was thinking to use a negative lookahead, but couldn’t get it to work in an efficient way.

In other words, I'm looking for an efficient regex that matches zero one two three where the delimiting characters in between can be any combination of characters of any length, so it should also match for example zero-one, two xxx three.
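To make the expected behavior concrete, here is a sketch of how such a pattern might be composed programmatically. The helper name buildSequencePattern is hypothetical, the lazy [\s\S]*? separator is only a placeholder candidate (not a claim that it is the most efficient option), and the last sample is my assumption of an input that should not match:

```javascript
// Hypothetical helper: joins literal words with a separator sub-pattern.
function buildSequencePattern(words, separator = "[\\s\\S]*?") {
  // Escape regex metacharacters so each word is matched literally.
  const escaped = words.map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(escaped.join(separator));
}

const pattern = buildSequencePattern(["zero", "one", "two", "three"]);

const samples = [
  "zero one two three",                    // should match
  "zero-one, two xxx three",               // should match
  "zero zero one one two two three three", // should match
  "zero two one three",                    // assumed non-match: words out of order
];

// test() is stateless here because the pattern has no g or y flag.
const results = samples.map((s) => pattern.test(s));
console.log(results);
```

Swapping the separator argument for another sub-pattern makes it easy to compare candidates against the same samples.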

What's the solution?

(Regex engine: ECMAScript / JavaScript)

Manuel
  • There is no straightforward way to replace a negated character set with a set of negated strings. Some of that functionality can be mimicked by using lookaround, but that might be unnecessarily complicated. From a higher level perspective, what exactly are you trying to accomplish here? – CAustin May 30 '23 at 02:55
  • The higher level is explained in “In other words,…”. The regex pattern will be composed programmatically, so complexity (readability) is not an issue, only performance is. – Manuel May 30 '23 at 06:39
  • Sorry if I don't understand what you want, but why not [`zero[\s\S]*?one[\s\S]*?two[\s\S]*?three`](https://regex101.com/r/PYCUn8/2)? It should behave exactly like you desire without the over-complication of lookarounds. – markalex May 30 '23 at 06:49
  • And by the way `.*` is greedy. Lazy is `.*?` – markalex May 30 '23 at 07:02
  • How should `zero zero one one` be matched? – InSync May 30 '23 at 09:18
  • @markalex "why not `zero[\s\S]*?one[\s\S]*?two[\s\S]*?three`?" - because from what I understand, `[\s\S]*` is the same as `.*`, which has bad performance compared to a negated character class like `[^...]`. Is that correct? – Manuel May 30 '23 at 13:03
  • @InSync "How should `zero zero one one` be matched?" - `zero zero one one two two three three` should match the whole string; the duplicate words should be considered "any character in between", even though they look like the actual words the pattern is looking for. – Manuel May 30 '23 at 13:12
  • @Manuel, while the comparison of `.*?` and `[^...]*` is correct here, it will not be correct for lookaheads. Go to regex101, paste a sample of your data and compare for yourself (or use the links provided under your answer, just change the sample data). – markalex May 30 '23 at 13:57
  • @markalex after some regex debugging, it turned out that your suggestion, `zero[\s\S]*?one[\s\S]*?two[\s\S]*?three`, which uses lazy quantifiers, was the best-performing one. However, the debugger also showed that a group with just 2 options, `(...|...)`, takes the engine 3 steps in total compared to just 1 step for a character class `[...]`, which is quite bad, but that's the best it gets I guess. – Manuel May 30 '23 at 16:53

0 Answers