I can't make a negative lookbehind assertion work with Python's re module when the pattern that follows it allows repetitions:

import re

ok = re.compile( r'(?<!abc)def' )
print( ok.search( 'abcdef' ) ) 
# -> None (ok)
print( ok.search( 'abc def' ) )
# -> 'def' (ok)

nok = re.compile( r'(?<!abc)\s*def' )
print( nok.search( 'abcdef' ) ) 
# -> None (ok)
print( nok.search( 'abc def' ) )
# -> 'def'. Why???

In my real application, I want to find a match in a file only if the match is not preceded by 'function ':

# Must match
mustMatch = 'x = myFunction( y )'

# Must not match
mustNotMatch = 'function x = myFunction( y )'

# Tried without success (always matches)
tried = re.compile( r'(?<!\bfunction\b)\s*\w+\s*=\s*myFunction' )
print( tried.search( mustMatch  ) ) 
# -> match
print( tried.search( mustNotMatch  ) )
# -> match as well. Why???

Is that a limitation?

jxrossel

1 Answer

" -> 'def'. Why???"

Well, it's quite logical. Look at your pattern: (?<!abc)\s*def

  • (?<!abc) - Negative lookbehind: asserts that the current position is not preceded by abc. This rules out only one position in your string; every other position remains a valid place to start matching
  • \s* - Zero or more whitespace characters
  • def - Literally matches def

Hence def is returned as a match. To make more sense of this, here is a small representation of the positions in 'abc def' that are still valid after the negative lookbehind:

[Image: the positions in 'abc def' that remain valid after the negative lookbehind]

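As you can see, there are still 7 valid positions. Adding \s* changes nothing, since * means zero or more: at the position just before def, \s* matches the empty string, and the three characters preceding that position are 'bc ' rather than abc, so the lookbehind succeeds.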
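To check this without the picture, here is a small sketch that lists the positions the lookbehind accepts in 'abc def' and then shows where the full pattern ends up matching. The variable names are only illustrative.

```python
import re

text = 'abc def'

# A lookbehind on its own is zero-width, so finditer reports one empty
# match at every position where the assertion holds.
valid_starts = [m.start() for m in re.finditer(r'(?<!abc)', text)]
print(valid_starts)  # [0, 1, 2, 4, 5, 6, 7]

# Only position 3 (right after 'abc') is rejected. The engine simply moves
# on to position 4, where \s* matches the empty string and 'def' matches.
m = re.search(r'(?<!abc)\s*def', text)
print(m.span(), repr(m.group()))  # (4, 7) 'def'
```

The match starts at index 4, not 3, which is why the space is not part of the result.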

One option is to first collapse repeated spaces in your string, as explained [here](https://stackoverflow.com/a/1546244/9758194), and then apply a pattern something like: (?<!\bfunction\b\s)\w+\s*=\s*myFunction to retrieve your matches. There may be neater ways though.
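A minimal sketch of that two-step idea (normalize the whitespace, then use a fixed-width lookbehind) could look like the following. Two things here are assumptions rather than part of the suggestion: `re.sub(r'\s+', ' ', ...)` is just one way to collapse whitespace, and the `\b` before `\w+` is an addition so that a multi-letter name such as `xy` cannot match starting from its second character. The helper names `collapse_spaces` and `find_call` are hypothetical.

```python
import re

def collapse_spaces(line):
    # Assumption: any run of whitespace can safely become a single space.
    return re.sub(r'\s+', ' ', line)

# Fixed-width lookbehind: 'function' plus exactly one whitespace character.
# The \b before \w+ is an added safeguard (see the note above).
pattern = re.compile(r'(?<!\bfunction\b\s)\b\w+\s*=\s*myFunction')

def find_call(line):
    # Normalize first, so that one space is all the lookbehind has to cover.
    return pattern.search(collapse_spaces(line))

print(find_call('x = myFunction( y )'))               # a match object
print(find_call('function   x = myFunction( y )'))    # None
print(find_call('function  xy  =  myFunction( y )'))  # None
```

Note that the match is found in the normalized copy of the line, so its offsets refer to that copy rather than to the original text.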

JvdV
  • Thanks for your answer! I naively thought that the "main" pattern, '\s*def', would be sought first and that the match would then be checked against the lookbehind condition. Since the lookbehind pattern must be of fixed width, I don't see how to solve my issue though – jxrossel Apr 07 '20 at 19:04
  • @jxrossel, well, one thing you could look at is a check with [`startswith`](https://www.tutorialspoint.com/python/string_startswith.htm) once you retrieve your match. Another option is to first trim multiple spaces from your string (I guess that's why you have `\s*`). So first apply what is explained [here](https://stackoverflow.com/a/1546244/9758194) and then apply a pattern something like: `(^|(?<!\bfunction\b\s))\w+\s*=\s*myFunction` to retrieve your matches. – JvdV Apr 07 '20 at 21:41
  • @jxrossel, done. Good luck with future endeavours. I'll be glad to assist. – JvdV Apr 08 '20 at 07:52
  • As a better alternative, knowing that `function` can only appear at the beginning of a line (possibly indented): `(?m)^(?!\s*\bfunction\b).*?myFunction`. The negative lookahead allows a non-fixed length and can be followed by anything of interest. The only limitation I've found is that one cannot skip wrapped lines starting with `function` – jxrossel Apr 08 '20 at 09:54
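For reference, the negative-lookahead pattern from the last comment can be tried on a few lines as follows. The pattern is taken verbatim from that comment; the sample text and variable names are made up for illustration.

```python
import re

# Anchor at each line start, reject lines whose first word is 'function'
# (after optional indentation), then lazily scan forward to the call.
line_pattern = re.compile(r'(?m)^(?!\s*\bfunction\b).*?myFunction')

source = (
    'x = myFunction( y )\n'
    'function x = myFunction( y )\n'
    '    function z = myFunction( y )\n'
    '  w = myFunction( y )\n'
)

matches = [m.group() for m in line_pattern.finditer(source)]
print(matches)  # ['x = myFunction', '  w = myFunction']
```

Because a lookahead, unlike a lookbehind in the re module, may contain variable-length parts such as `\s*`, no whitespace normalization is needed here.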