Python re does not support the "leading/starting word boundary" construct \< (in other regex flavors, also \m or [[:<:]]), nor the "closing/trailing word boundary" construct \> (in other regex flavors, also \M or [[:>:]]).
Note that leading and trailing word boundaries are not supported by most NFA regex engines, which are often referred to as "modern". The usual approach is to use \b instead, as you have already noticed, because it is more convenient.
However, this convenience comes at a price: \b is a context-dependent pattern, since whether it matches depends on the characters on both sides of the current position. This problem has been covered extensively on SO; my answer to "Word boundary with words starting or ending with special characters gives unexpected results" covers some aspects of \b.
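To make the context dependence concrete, here is a minimal sketch. The sample strings and variable names are made up for illustration. Since # is not a word character, \b placed before it only matches when a word character sits immediately to its left:

```python
import re

# "#" is a non-word character, so \b before it requires a word
# character immediately to the left of "#".
pattern = r"\b#tag\b"

# Space before "#": both neighbours are non-word chars, so no boundary there.
spaced = re.search(pattern, "see #tag here")

# Letter before "#": word char on the left, non-word on the right, so a boundary.
glued = re.search(pattern, "see a#tag here")

print(spaced)         # None
print(glued.group())  # #tag
```

The same \b therefore acts as "a word character must precede" in one pattern and "a non-word character must precede" in another, depending on what follows it.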
So, if you plan to use \< or \>, you need to implement them manually like this:
\< = a position at a word boundary where the char to the right is a word char, i.e. \b(?=\w).
\> = a position at a word boundary where the char to the left is a word char, i.e. \b(?<=\w).
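As a quick check of the two emulations, the sketch below lists the positions where each one matches in a sample string. The names WORD_START and WORD_END and the sample text are illustrative only and are not part of the re module:

```python
import re

# Illustrative names, not part of the re module.
WORD_START = r"\b(?=\w)"   # emulates \<  (word char to the right)
WORD_END = r"\b(?<=\w)"    # emulates \>  (word char to the left)

text = "foo, bar-baz!"

# Each pattern matches an empty string, so m.start() is the boundary position.
starts = [m.start() for m in re.finditer(WORD_START, text)]
ends = [m.start() for m in re.finditer(WORD_END, text)]
both = [m.start() for m in re.finditer(r"\b", text)]

print(starts)  # [0, 5, 9]
print(ends)    # [3, 8, 12]
print(both)    # [0, 3, 5, 8, 9, 12]
```

Plain \b reports every boundary, while the two emulations split that set into word starts and word ends.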
This is how these word boundary variants are handled in the PCRE library:
COMPATIBILITY FEATURE FOR WORD BOUNDARIES

In the POSIX.2 compliant library that was included in 4.4BSD Unix, the ugly syntax [[:<:]] and [[:>:]] is used for matching "start of word" and "end of word". PCRE treats these items as follows:

  [[:<:]]  is converted to  \b(?=\w)
  [[:>:]]  is converted to  \b(?<=\w)
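The same conversion can be applied by hand before passing a pattern to Python re. The helper below is a hypothetical sketch, not something re or PCRE provides. It assumes the tokens [[:<:]] and [[:>:]] appear literally in the pattern and does a plain text substitution, so it does not handle escaping or other edge cases:

```python
import re

# Hypothetical helper: neither the name nor the mapping table comes from
# the re module; the replacements mirror the PCRE conversion quoted above.
_TRANSLATIONS = {
    "[[:<:]]": r"\b(?=\w)",   # start of word
    "[[:>:]]": r"\b(?<=\w)",  # end of word
}

def translate_posix_word_boundaries(pattern: str) -> str:
    """Naively replace POSIX word-boundary tokens with re-compatible ones."""
    for token, replacement in _TRANSLATIONS.items():
        pattern = pattern.replace(token, replacement)
    return pattern

converted = translate_posix_word_boundaries("[[:<:]]cat[[:>:]]")
matches = re.findall(converted, "cat concat cats cat.")

print(converted)  # \b(?=\w)cat\b(?<=\w)
print(matches)    # ['cat', 'cat']
```

Only the standalone occurrences of cat match, because concat fails the start-of-word check and cats fails the end-of-word check.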