I am using the regex library to find words that sit between specific other words. For example, I want to match "world" if and only if a greeting precedes it and punctuation follows. To avoid matching word prefixes and suffixes, I added the extra condition [^a-zA-Z] (and, in a second attempt, \b) around the greeting and the punctuation. However, once I add these, regex cannot match the word anymore:
>>> import regex
>>> pat = regex.compile("(?<=[^a-zA-Z](hello|hi)\s+)world(?=\s*[!?.][^a-zA-Z])")
>>> list(pat.finditer("hello world!"))
[]
>>> pat = regex.compile("(?<=\b(hello|hi)\s+)world(?=\s*[!?.]\b)")
>>> list(pat.finditer("hello world!"))
[]
>>> pat = regex.compile("(?<=(hello|hi)\s+)world(?=\s*[!?.])")
>>> list(pat.finditer("hello world!"))
[<regex.Match object; span=(6, 11), match='world'>]
How can this be explained? How can I make sure that only whole words match inside the lookahead and lookbehind sections?
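One possible explanation, as a minimal sketch rather than a definitive answer: [^a-zA-Z] has to consume a real character, so it cannot succeed at the very start or end of the string, and "\b" in a non-raw string literal is a backspace character rather than a word boundary. The sketch below uses the standard-library re module instead of regex, so the variable-length lookbehind is swapped for a capturing group. The names WHOLE_WORD and find_spans are hypothetical, chosen only for illustration.

```python
import re

# "\b" in a normal string literal is a backspace (U+0008);
# only the raw string r"\b" reaches the engine as a word boundary.
assert "\b" == "\x08"
assert len(r"\b") == 2

text = "hello world!"

# [^a-zA-Z] must consume one character, so it fails at the string start.
assert re.search(r"[^a-zA-Z]hello", text) is None
assert re.search(r"[^a-zA-Z]hello", " " + text) is not None

# A real \b needs a word character on one side, so it cannot match
# between "!" and the end of the string.
assert re.search(r"!\b", text) is None

# Negative lookarounds only assert that no letter is adjacent,
# which also holds at the edges of the string.
WHOLE_WORD = re.compile(
    r"(?<![a-zA-Z])(?:hello|hi)\s+"   # greeting not preceded by a letter
    r"(world)"                        # group 1: the word we want
    r"(?=\s*[!?.](?![a-zA-Z]))"       # punctuation not followed by a letter
)

def find_spans(s):
    """Return the (start, end) span of group 1 for every match in s."""
    return [m.span(1) for m in WHOLE_WORD.finditer(s)]

print(find_spans("hello world!"))    # [(6, 11)]
print(find_spans("othello world!"))  # []
print(find_spans("hi world.x"))      # []
```

Since the third pattern in the question shows that regex accepts a variable-length lookbehind, the same idea could presumably be kept as a lookbehind there, for example r"(?<=(?<![a-zA-Z])(hello|hi)\s+)world(?=\s*[!?.](?![a-zA-Z]))". That variant is an assumption and is not run in the sketch above.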