
I have this regex, which matches a certain word only if it is not preceded by any word from a list:

(?<![Yy]ellow |[Bb]lue |[Rr]ed |[Pp]urple |[Bb]lack )[Cc]olor

It works in the online tester, but the tool I have to use reports the regex error "look-behind requires fixed-width pattern". I do not have access to the Python code. I tried delimiting with (?:^ and |$) as I saw in a similar question, but it didn't work. I also found this answer, which I think solves the problem, but I don't understand how to make it work in my case.

Test strings (lines marked #match should match):

Dark Color          #match
light color         #match
Blue Color
red color
that is a blue color and theme
opaque color        #match
purple color
cy614
  • See *Negative lookbehinds can be just concatenated (e.g. `(?<!^|,)"(?!,|$)` should look like `(?<!^)(?<!,)"(?!,|$)`).* in the [answer you referred to](https://stackoverflow.com/a/40617321/16958187). – Wiktor Stribiżew Oct 02 '22 at 11:35

2 Answers

2

If you only want a match, you can split the alternatives in the lookbehind into separate lookbehind assertions, each of which has a fixed width.

If you don't want partial word matches, you can start with a word boundary \b:

\b(?<![Yy]ellow )(?<![Bb]lue )(?<![Rr]ed )(?<![Pp]urple )(?<![Bb]lack )[Cc]olor

See a regex demo.
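
For readers who can run Python, here is a minimal sketch of the difference, using the test strings from the question. The variable names are only illustrative, and it assumes the tool uses Python's standard `re` module, which is what the error message suggests.

```python
import re

# The original alternation mixes widths (7, 5, 4, 7, 6 characters),
# which Python's re module rejects inside a lookbehind.
try:
    re.compile(r"(?<![Yy]ellow |[Bb]lue |[Rr]ed |[Pp]urple |[Bb]lack )[Cc]olor")
    compiled_ok = True
except re.error:
    compiled_ok = False

# One fixed-width negative lookbehind per alternative compiles fine.
pattern = re.compile(
    r"\b(?<![Yy]ellow )(?<![Bb]lue )(?<![Rr]ed )"
    r"(?<![Pp]urple )(?<![Bb]lack )[Cc]olor"
)

lines = [
    "Dark Color",
    "light color",
    "Blue Color",
    "red color",
    "that is a blue color and theme",
    "opaque color",
    "purple color",
]

matched = [line for line in lines if pattern.search(line)]
print(compiled_ok)  # False
print(matched)      # ['Dark Color', 'light color', 'opaque color']
```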

As suggested by @bobble bubble, you can prevent the negative lookbehinds from being evaluated at every word boundary by first asserting or matching that the next character is a C or c.

\b(?=[Cc])(?<![Yy]ellow )(?<![Bb]lue )(?<![Rr]ed )(?<![Pp]urple )(?<![Bb]lack ).olor

See a regex demo asserting or a regex demo matching the first character.
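
As a quick check, the sketch below runs the asserting variant next to the plain one on a made-up sample string and confirms that they find the same matches. It only shows equivalence on this input and does not measure performance.

```python
import re

lookbehinds = (
    r"(?<![Yy]ellow )(?<![Bb]lue )(?<![Rr]ed )"
    r"(?<![Pp]urple )(?<![Bb]lack )"
)

# Plain variant: lookbehinds are tried at every word boundary.
plain = re.compile(r"\b" + lookbehinds + r"[Cc]olor")
# Gated variant: the lookahead (?=[Cc]) must succeed before any lookbehind runs.
gated = re.compile(r"\b(?=[Cc])" + lookbehinds + r".olor")

# Illustrative sample text, not taken from the question verbatim.
text = "Dark Color, Blue Color, opaque color, purple color"

plain_hits = plain.findall(text)
gated_hits = gated.findall(text)
print(gated_hits)                # ['Color', 'color']
print(plain_hits == gated_hits)  # True
```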


If you have no control over the Python code, you might check if matching what you don't want and returning a capture group for what you do want also works:

\b(?:[Yy]ellow |[Bb]lue |[Rr]ed |[Pp]urple |[Bb]lack )[Cc]olor|([Cc]olor)

See a third regex demo.
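
If the calling code can be adapted to read group 1, the idea looks roughly like this. It is a sketch on a made-up sample string. Whether a tool you cannot modify exposes capture groups is an assumption you would need to verify.

```python
import re

# First alternative consumes the unwanted "<colour word> color" phrases;
# only the second alternative fills capture group 1.
pattern = re.compile(
    r"\b(?:[Yy]ellow |[Bb]lue |[Rr]ed |[Pp]urple |[Bb]lack )[Cc]olor|([Cc]olor)"
)

# Illustrative sample text.
text = "Dark Color, Blue Color, opaque color, purple color"

# Keep only the matches where group 1 participated.
wanted = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(wanted)  # ['Color', 'color']
```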

The fourth bird
  • Sidenote @cy614 that it will be more efficient if a word boundary can be [`used at start`](https://regex101.com/r/swafl8/1). If this is a large text and there are many lookbehinds, the starting point can be further defined, e.g. something like [`\b(?=[Cc])`(?<!many lookbehinds)`.olor`](https://regex101.com/r/B6HIj6/1) but this was not asked. :) – bobble bubble Oct 02 '22 at 10:57
  • @bobblebubble Ow that is very smart and much more performant indeed. – The fourth bird Oct 02 '22 at 10:59
  • @bobblebubble Is it ok if I add that last pattern from your comment to the answer? You can always post it yourself :-) – The fourth bird Oct 02 '22 at 11:05
  • Sure, if you like, I'd be happy! I should mention that I tested that with Python. In PCRE2 at least, such a modification does not seem to make much difference (it was even a bit slower). – bobble bubble Oct 02 '22 at 11:45
2

I would get around the fixed width lookbehind problem by instead phrasing your regex using a negative lookahead, which doesn't have this limitation:

\b(?!yellow|blue|red|purple|black)\w+ color\b

Here is a working Python script:

import re

inp = """Dark Color
light color
Blue Color
red color
that is a blue color and theme
opaque color
purple color"""

# The lookahead rejects the word before "color" if it starts with a listed colour name.
matches = re.findall(r'\b(?!yellow|blue|red|purple|black)\w+ color\b', inp, flags=re.I)
print(matches)  # ['Dark Color', 'light color', 'opaque color']
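
Note that this pattern is not a drop-in equivalent of the lookbehind version: it needs a word in front of "color", and it rejects any preceding word that merely starts with one of the listed names. A small sketch of those edge cases follows. The inputs are made up, and the lookbehind pattern is a case-insensitive rewrite of the one from the other answer.

```python
import re

lookahead = re.compile(
    r"\b(?!yellow|blue|red|purple|black)\w+ color\b", flags=re.I
)
lookbehind = re.compile(
    r"\b(?<!yellow )(?<!blue )(?<!red )(?<!purple )(?<!black )color",
    flags=re.I,
)

# Hypothetical edge cases, not part of the question's test data.
samples = ["Color scheme", "yellowish color", "dark blue color"]

results = {
    text: (bool(lookahead.search(text)), bool(lookbehind.search(text)))
    for text in samples
}
for text, (ahead, behind) in results.items():
    print(f"{text!r}: lookahead={ahead}, lookbehind={behind}")
# 'Color scheme': lookahead=False, lookbehind=True
# 'yellowish color': lookahead=False, lookbehind=True
# 'dark blue color': lookahead=False, lookbehind=False
```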
Tim Biegeleisen