Say I want to match the presence of the phrase Sortes\index[persons]{Sortes}
in the phrase test Sortes\index[persons]{Sortes} text
.
Using python re
I could do this:
>>> search = re.escape('Sortes\index[persons]{Sortes}')
>>> match = 'test Sortes\index[persons]{Sortes} text'
>>> re.search(search, match)
<_sre.SRE_Match object; span=(5, 34), match='Sortes\\index[persons]{Sortes}'>
This works, but I want to avoid the search pattern Sortes
to give a positive result on the phrase test Sortes\index[persons]{Sortes} text
.
>>> re.search(re.escape('Sortes'), match)
<_sre.SRE_Match object; span=(5, 11), match='Sortes'>
So I use the \b
pattern, like this:
search = r'\b' + re.escape('Sortes\index[persons]{Sortes}') + r'\b'
match = 'test Sortes\index[persons]{Sortes} text'
re.search(search, match)
Now, I don't get a match.
If the search pattern does not contain any of the characters []{}
, it works. E.g.:
>>> re.search(r'\b' + re.escape('Sortes\index') + r'\b', 'test Sortes\index test')
<_sre.SRE_Match object; span=(5, 17), match='Sortes\\index'>
Also, if I remove the final r'\b'
, it also works:
re.search(r'\b' + re.escape('Sortes\index[persons]{Sortes}'), 'test Sortes\index[persons]{Sortes} test')
<_sre.SRE_Match object; span=(5, 34), match='Sortes\\index[persons]{Sortes}'>
Furthermore, the documentation says about \b
Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string.
So I tried replacing the final \b
with (\W|$)
:
>>> re.search(r'\b' + re.escape('Sortes\index[persons]{Sortes}') + '(\W|$)', 'test Sortes\index[persons]{Sortes} test')
<_sre.SRE_Match object; span=(5, 35), match='Sortes\\index[persons]{Sortes} '>
Lo and behold, it works! What is going on here? What am I missing?