\b✅\b does not match a single emoji: '✅'.

\b\u2B07\b does not match '⬇️'.

\b-\b does not match '-'.

\bfoo\b certainly matches 'foo'.

Why does this happen, and what is an alternative to ensure that my emoji, or any special character, is not in the middle of a string?

playground: https://regex101.com/r/jRaQuJ/2

Edit: For the record, I think this question is worth keeping because it is still useful even if it is somewhat duplicated. The first marked duplicate is a specific and verbose question, while this one is simple, short and easy to find. The second duplicate is just the definition of the \b boundary, and someone with my problem would probably need something more specific.

Leonardo Rick

1 Answer

You can use the pattern:

(?<!\w)✅(?!\w) 

This uses negative lookarounds to match an emoji with no word characters on either side.
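As a minimal sketch of how this pattern behaves, here it is in Python's `re` module (an assumption, since the question does not name a language; the helper name `has_standalone_check` is hypothetical). Other engines with lookbehind support should behave similarly, but that is not verified here.

```python
import re

# The pattern from the answer: ✅ with no word character on either side.
STANDALONE_CHECK = re.compile(r"(?<!\w)✅(?!\w)")

def has_standalone_check(text: str) -> bool:
    """Return True if ✅ appears with no word character directly beside it."""
    return STANDALONE_CHECK.search(text) is not None

print(has_standalone_check("✅"))          # True: nothing on either side
print(has_standalone_check("done ✅ ok"))  # True: spaces are not word characters
print(has_standalone_check("foo✅bar"))    # False: word characters touch the emoji
print(re.search(r"\b✅\b", "✅"))          # None: the original \b pattern fails
```

To apply the same idea to an arbitrary special character, the literal could be passed through `re.escape` before being placed between the two lookarounds.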

The reason for the behavior you asked about is that \b is a zero-width boundary where one side of the boundary is \w (a word character, typically [0-9A-Za-z_], plus other letters and digits in Unicode-aware engines) and the other side is the beginning or end of the string or \W (a non-word character).

For example, consider the string "foo.":

start of string boundary (zero width)
     |
     |   non-word character
     |   |
     v   v
      foo.
      ^ ^
      | |
word characters

The regex \bfoo\b finds a match here thanks to the boundary between the beginning of the string and the character f, and the boundary between the second o and the . character.

"foobar" does not match \bfoo\b because the position between the second o and b doesn't satisfy the boundary condition: b is neither a non-word character nor the end of the string.

The pattern \b-\b does not match the string "-" because "-" isn't a word character, so there is no word character on either side to form a boundary. Likewise, emojis are non-word characters, so \b does not treat them the way it treats the word characters in \bfoo\b.
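The boundary cases above can be checked with a short sketch, again assuming Python's `re` module (the helper `matches` is a hypothetical name). It also shows a consequence of the rule: \b around a non-word character matches only when word characters sit on both sides, which is the opposite of what the question wants.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches(r"\bfoo\b", "foo."))    # True: boundaries at the start and before "."
print(matches(r"\bfoo\b", "foobar"))  # False: no boundary between "o" and "b"
print(matches(r"\b-\b", "-"))         # False: no word character beside "-"
print(matches(r"\b-\b", "a-b"))       # True: word characters on both sides
print(matches(r"\b✅\b", "a✅b"))      # True: same effect with an emoji
```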

ggorlen