From my understanding,
(.)(?<!\1)
should never match. Actually, PHP's preg_replace even refuses to compile this, and so does Ruby's gsub. The Python re module seems to have a different opinion, though:
import re
test = 'xAAAAAyBBBBz'
print (re.sub(r'(.)(?<!\1)', r'(\g<0>)', test))
Result:
(x)AAAA(A)(y)BBB(B)(z)
Can anyone provide a reasonable explanation for this behavior?
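To make my expectation concrete, here is a minimal sketch in plain Python, with no regex engine involved, of how I read the pattern. It assumes the lookbehind has the same width as group 1 (one character) and therefore inspects the character that was just consumed. The function name expected_sub is hypothetical and only used for illustration:

```python
def expected_sub(text):
    # Hand-rolled emulation of the substitution, under the assumption that
    # the lookbehind looks at the single character just consumed by (.).
    # Simplification: unlike ".", this does not treat newlines specially.
    out = []
    for i, ch in enumerate(text):
        group1 = ch              # (.) captures the character at position i
        behind = text[i:i + 1]   # a 1-character lookbehind at i + 1 sees text[i]
        if behind != group1:     # (?<!\1) succeeds only if the two differ
            out.append('(' + ch + ')')
        else:
            out.append(ch)
    return ''.join(out)

test = 'xAAAAAyBBBBz'
print(expected_sub(test))
## xAAAAAyBBBBz
```

Under this reading the two values compared are always the same character, so nothing is ever wrapped in parentheses.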
Update
This behavior appears to be a limitation in the re module. The alternative regex module seems to handle groups in assertions correctly:
import regex
test = 'xAAAAAyBBBBz'
print (regex.sub(r'(.)(?<!\1)', r'(\g<0>)', test))
## xAAAAAyBBBBz
print (regex.sub(r'(.)(.)(?<!\1)', r'(\g<0>)', test))
## (xA)AAA(Ay)BBB(Bz)
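If installing regex is not an option, the two-character pattern can probably be rewritten for the standard re module without any lookbehind. This is only a sketch of one possible workaround, not something taken from either module's documentation: it moves the check in front of the second character and uses a negative lookahead, where backreferences are supported:

```python
import re

test = 'xAAAAAyBBBBz'

# (.)(.)(?<!\1) rewritten without a lookbehind: test the second character
# against group 1 with a negative lookahead before consuming it.
pattern = re.compile(r'(.)(?!\1)(.)')
result = pattern.sub(r'(\g<0>)', test)
print(result)
## (xA)AAA(Ay)BBB(Bz)
```

Both forms accept a pair of characters only when the second differs from the first, so the output is the same as the regex result above.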
Note that, unlike PCRE, regex also allows variable-width lookbehinds:
print (regex.sub(r'(.)(?<![A-Z]+)', r'(\g<0>)', test))
## (x)AAAAA(y)BBBB(z)
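For comparison, a short sketch of how the standard re module deals with the same pattern. It rejects the variable-width lookbehind when compiling. For this particular pattern a fixed-width version should be equivalent, on the assumption that "no run of capitals ends here" means the same as "the previous character is not a capital":

```python
import re

test = 'xAAAAAyBBBBz'

# The standard re module refuses a variable-width lookbehind at compile time.
try:
    re.compile(r'(.)(?<![A-Z]+)')
    compiled = True
except re.error:
    compiled = False
print(compiled)
## False

# Fixed-width equivalent for this specific pattern: a lookbehind of width 1.
fixed = re.sub(r'(.)(?<![A-Z])', r'(\g<0>)', test)
print(fixed)
## (x)AAAAA(y)BBBB(z)
```

This shortcut only works because the character class is repeated as a whole; a lookbehind like (?<!ab+) has no such simple fixed-width form.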
Eventually, regex may be included in the standard library; PEP 411 mentions it as a candidate for provisional inclusion.