16

I have the following scenarios:

1) car on the right shoulder
2) car on the left shoulder
3) car on the shoulder

I want to match "shoulder" only when "left" or "right" is not present, so only 3) should return "shoulder".

re.compile(r'(?<!right|right\s*)shoulder')
sre_constants.error: look-behind requires fixed-width pattern

It seems I can't use \s* or "|" inside the look-behind.

How can I solve this?

Thanks in advance!

zx81
Edward Wang
  • 1
    When I needed to use the regex `r'(?<=^i?bs=).+'`, I ended up going with `r'^i?bs=(.+)'` and just accessed the first group instead. Sometimes it's pretty easy to work around this limitation. – JamesTheAwesomeDude Apr 22 '21 at 23:19
  • Just to clarify the comment above, which was the solution for me: instead of using a look-behind, put the expression in the main pattern, but surround the portion you want to extract with parentheses to create a match group. It is available as `match.group(1)`, index zero being the whole match. – Felix Oct 07 '21 at 07:18
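
A minimal sketch of the capture-group workaround described in the two comments above, using Python's built-in `re` module. The helper name `value_after_prefix` and the sample inputs are made up for illustration; only the pattern `r'^i?bs=(.+)'` comes from the comment.

```python
import re

# Instead of a look-behind such as (?<=^i?bs=).+ , match the prefix as part of
# the pattern and capture only the portion we want in group 1.
PATTERN = re.compile(r'^i?bs=(.+)')

def value_after_prefix(text):
    """Return the text after 'bs=' or 'ibs=', or None (hypothetical helper)."""
    match = PATTERN.search(text)
    # group(0) would be the whole match; group(1) is the captured value.
    return match.group(1) if match else None

print(value_after_prefix('ibs=4096'))
print(value_after_prefix('bs=512'))
print(value_after_prefix('count=1'))
```

The trade-off is that the whole match now includes the prefix, so the caller has to read group 1 rather than the full match.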

3 Answers

26

regex module: variable-width lookbehind

In addition to the answer by HamZa, for any regex of any complexity in Python, I recommend using the outstanding regex module by Matthew Barnett. It supports infinite lookbehind—one of the few engines to do so, along with .NET and JGSoft.

This allows you to write, for instance:

import regex
if regex.search("(?<!right |left )shoulder", "left shoulder"):
    print("It matches!")
else:
    print("Nah... No match.")

You could also use \s+ if you wished.

Output:

Nah... No match.
Martina K
zx81
  • 1
    +1 for the great regex module link. Do we have something similar available for PHP? – anubhava Jul 28 '14 at 13:59
  • @anubhava Thank you. In PHP I don't know another engine—you know the usual workarounds to infinite lookbehind, `\K` in some cases, capture groups in others. – zx81 Jul 28 '14 at 22:34
  • Thank you very much for the great regex link. regex module supports infinite lookbehind/lookahead. Wow. – Reman Feb 24 '16 at 20:00
  • 1
    Shouldn't you be using positive lookbehind with `<=` rather than negative lookbehind with `<!`? – Akshay Jun 12 '18 at 07:05
2

In most regex engines, lookbehinds need to be of fixed width. This means you can't use quantifiers such as +, * or ? in a lookbehind in Python. The solution is to move \s* outside your lookbehind:

(?<!left|right)\s*shoulder

You will notice that this expression matches every combination. So we need to change the quantifier from * to +:

(?<!left|right)\s+shoulder

The only problem with this solution is that it won't find shoulder if it's at the beginning of the string, so we might add an alternative with an anchor:

^shoulder|(?<!left|right)\s+shoulder

If you want to get rid of the whitespace, just use the strip function.
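
For Python's built-in `re` module specifically, note that a lookbehind such as `(?<!left|right)` is also rejected, because the two alternatives have different lengths. A rough sketch of two possible workarounds follows; it is not part of the answer above. The names `FIXED`, `OPTIONAL` and `bare_shoulder` are invented for illustration. The first pattern stacks two fixed-width lookbehinds and assumes exactly one whitespace character before "shoulder". The second optionally consumes the side word and inspects the capture group, which tolerates any amount of whitespace.

```python
import re

# Two fixed-width look-behinds instead of one alternation. Each one covers
# exactly one whitespace character between the side word and "shoulder".
FIXED = re.compile(r'(?<!\bleft\s)(?<!\bright\s)\bshoulder\b')

# Variable whitespace: optionally consume "left"/"right" in front of
# "shoulder" and check afterwards whether group 1 took part in the match.
OPTIONAL = re.compile(r'(?:\b(left|right)\s+)?\bshoulder\b')

def bare_shoulder(text):
    """True if "shoulder" occurs without a preceding left/right (hypothetical helper)."""
    return any(m.group(1) is None for m in OPTIONAL.finditer(text))

for s in ("car on the right shoulder",
          "car on the left shoulder",
          "car on the shoulder"):
    print(s, FIXED.findall(s), bare_shoulder(s))
```

Only the third sentence yields a match with `FIXED`, and only for it does `bare_shoulder` return True.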

Online demo

HamZa
  • `regex.compile(r'(?<!left|right)\s+shoulder').findall("this is right shoulder")` gives `[' shoulder']`, so this still returns shoulder. I guess I have to use the regex module with infinite lookbehind. – Edward Wang Jul 28 '14 at 02:58
  • @EdwardWang Not sure what's going on on your side but here [it works as expected](http://i.stack.imgur.com/LtZ8x.png) – HamZa Jul 28 '14 at 03:31
  • @HamZa You misunderstood my question. When right/left appears before "shoulder", I expect no match. But your solution still returns shoulder, and it's no different from r'\bshoulder\b'. – Edward Wang Jul 28 '14 at 13:07
  • @EdwardWang Have you checked the image I linked? It doesn't return "shoulder" when there is "right" or "left" behind it. – HamZa Jul 28 '14 at 13:46
0

The need for variable width look-behind can be avoided by combining a fixed-width positive look-behind with a negative look-ahead:

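import re
# Split once where a CJK character is followed by a non-CJK one (empty-match splits need Python 3.7+).
re.split('(?<=[\u4e00-\u9fff])(?![\u4e00-\u9fff])', '缩头乌龟suō tóu wūguī', 1)
# >>> Out[47]: ['缩头乌龟', 'suō tóu wūguī']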
ccpizza