
I am somewhat new to regexps and I am trying to understand this regexp:

(?<!mix\s|mixe[rds]\s|mixing\s)with(?:out)?

In my opinion, it searches for with or without when it is not preceded by one of the following words (plus a whitespace character):

  • mix
  • mixer/mixed/mixes
  • mixing

So I tried to rewrite it as:

(?<!mix(?:e[rnds]|ing)?\s)with(?:out)?

but I get the following error:

  • Lookbehind assertion is not fixed width

I understand how the lookbehind works (it goes back a fixed width and then tries to match), but aren't the two regexps inside the lookbehind equivalent?

(I found some info in "What's the technical reason for "lookbehind assertion MUST be fixed length" in regex?", but it is still not clear to me why it does not work in this case.)

Fabrizio

1 Answer

It doesn't work in this case because the sub-pattern contains the quantifier ?. As soon as the regex engine finds this quantifier, it decides that your sub-pattern no longer has a fixed length (which is true).

Even though the two sub-patterns are equivalent (the regex engine does not check that), the mere presence of a quantifier makes the pattern analysis fail.

On the other hand, pcre accepts several fixed-length sub-patterns separated by a pipe inside a lookbehind, and each alternative may have a different length.

A classic workaround for this problem with pcre is to use the \K feature, which discards the characters matched so far from the match result:
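To make the "explode the optional part into fixed-width pieces" idea concrete, here is a minimal sketch in Python. It is only an illustration: Python's re module is stricter than pcre and rejects alternatives of different lengths inside a single lookbehind, so the sketch stacks one negative lookbehind per width instead (a form pcre accepts as well). The names STACKED and find_with are made up for this example, and e[rds] follows the question's original pattern.

```python
import re

# The question's variable-width attempt: Python's re rejects it too.
try:
    re.compile(r"(?<!mix(?:e[rds]|ing)?\s)with(?:out)?")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# One negative lookbehind per fixed width (4, 6 and 7 characters).
# All three are tested at the same position, just before "with".
STACKED = re.compile(r"(?<!mix\s)(?<!mixe[rds]\s)(?<!mixing\s)with(?:out)?")

def find_with(text):
    """Return each with/without not preceded by mix, mixer/mixed/mixes or mixing."""
    return [m.group(0) for m in STACKED.finditer(text)]
```

With this sketch, find_with("mixed without") finds nothing, while find_with("go with or without") returns both words.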

(?<!mix)(?:e[rnds]|ing)?\s\Kwith(?:out)?
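It is worth checking this pattern against a few inputs before relying on it. The sketch below does so in Python; since Python's re has no \K, a capture group stands in for it (an assumption of this sketch, as are the names K_LIKE and k_like_hits). The positions at which the pattern can match are the same either way. Because the optional group can match the empty string, the lookbehind is also tested immediately before the whitespace, where the preceding three characters of "mixed" are "xed" rather than "mix".

```python
import re

# Same pattern as above, with \K replaced by a capture group around the
# part that should be kept (Python's re does not support \K).
K_LIKE = re.compile(r"(?<!mix)(?:e[rnds]|ing)?\s(with(?:out)?)")

def k_like_hits(text):
    """Return the with/without part of every match of the \\K-style pattern."""
    return [m.group(1) for m in K_LIKE.finditer(text)]
```

In this emulation "mix with" is rejected as intended, but "mixed with" and "mixing with" still yield a match, and a bare "with" at the start of the string does not match because the pattern requires a whitespace character before it. Stacked fixed-width lookbehinds or the branch-discarding approaches avoid this.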
Casimir et Hippolyte
  • Seems like the engine needs a patch. I would understand if I had a `*` or a `+`, which makes it impossible to predetermine the length, but a `?` means present or absent, so the subpatterns can be exploded to find the widths and use them. I am really interested in this information because in my next side project I want to create a regexp generator: you give the words that you want included and excluded, and it gives you an optimized regexp – Fabrizio May 31 '15 at 12:18
  • @Fabrizio: Allowing `?` is exactly the same as allowing any other quantifier, because you can then put any number of `?` (possibly nested) in the subpattern. "Automatically generating an optimized regex": are you dreaming? – Casimir et Hippolyte May 31 '15 at 12:24
  • @Fabrizio: Maybe backtracking control verbs are a good alternative, in particular the combo `(*SKIP)(*FAIL)`, or you can use capture groups to choose the branch you want and the one to discard. (The main idea of both approaches is to consume the characters of what you want to avoid.) – Casimir et Hippolyte May 31 '15 at 12:30
  • Never heard of such things. How would you apply it in this scenario? – Fabrizio May 31 '15 at 12:52
  • @Fabrizio: take a look at this post: http://stackoverflow.com/questions/24534782/how-do-skip-or-f-work-on-regex – Casimir et Hippolyte May 31 '15 at 13:14
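
For this scenario, the capture-group variant mentioned in the comments could look like the sketch below. It is written in Python, whose re module does not support `(*SKIP)(*FAIL)`, so only the capture-group form is shown; the names SKIP_OR_KEEP and keep_hits are hypothetical, and e[rds] follows the question's original pattern. The first branch consumes the phrases to avoid, the second branch captures what is wanted, and only matches where group 1 took part are kept.

```python
import re

# Branch 1 consumes "mix with", "mixed without", ... so that the "with"
# inside them can no longer be matched on its own.
# Branch 2 captures every remaining with/without in group 1.
SKIP_OR_KEEP = re.compile(r"mix(?:e[rds]|ing)?\swith(?:out)?|(with(?:out)?)")

def keep_hits(text):
    """Return with/without occurrences that were not consumed by branch 1."""
    return [m.group(1) for m in SKIP_OR_KEEP.finditer(text)
            if m.group(1) is not None]
```

For example, keep_hits("mixed with sugar, served without ice") keeps only "without". No lookbehind is involved, so the fixed-width restriction never comes up.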