I have this regex:
\b(t[úu]s*)\b
And I have these words:
tu (works), tú (doesn't work), tus (works), tús (works)
Why can't I match tú?
If the regex doesn't match, the ú in your input is probably not the same character as the ú in your pattern. "u with acute" can be written either as the single character ú (U+00FA), or as u (U+0075) followed by the combining acute accent (U+0301), which renders as a similar-looking ú.
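Here is a small TypeScript check showing that the two spellings are different strings even though they look the same. TypeScript is my choice; the question doesn't say which regex engine is involved. The escapes `\u00fa` and `u\u0301` keep the two forms distinct in the source file, and the `codePoints` helper is only for illustration.

```typescript
// Lists the code points of a string in hex, e.g. "74" for "t" (illustrative helper).
function codePoints(s: string): string[] {
  return Array.from(s, (ch) => ch.codePointAt(0)!.toString(16));
}

// Precomposed: a single code point, U+00FA LATIN SMALL LETTER U WITH ACUTE.
const precomposed: string = "t\u00fa";
// Decomposed: "u" (U+0075) followed by U+0301 COMBINING ACUTE ACCENT.
const decomposed: string = "tu\u0301";

console.log(codePoints(precomposed)); // [ '74', 'fa' ]
console.log(codePoints(decomposed));  // [ '74', '75', '301' ]
console.log(precomposed === decomposed); // false, although both render as "tú"
```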
You have to either normalize your input string or include both variants in your regular expression. See http://www.regular-expressions.info/unicode.html for details.
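Below is a minimal TypeScript sketch of both options. It assumes a JavaScript-style engine, and normalization is done with the built-in `String.prototype.normalize("NFC")`. The `\b` anchors are left out so the example covers only the two spellings; `bothForms` is a hypothetical pattern written for illustration.

```typescript
// The character class holds the precomposed ú (U+00FA) and a plain u.
const pattern = /t[\u00fau]s*/u;

const decomposed = "tu\u0301"; // "tú" written as u + COMBINING ACUTE ACCENT

// Without normalization, [\u00fau] can only match the bare "u", leaving the accent behind.
console.log(pattern.exec(decomposed)?.[0]); // tu

// Option 1: normalize the input to NFC first, so "u" + U+0301 becomes U+00FA.
console.log(pattern.exec(decomposed.normalize("NFC"))?.[0]); // tú

// Option 2: accept both spellings in the pattern itself.
const bothForms = /t(?:\u00fa|u\u0301?)s*/u;
console.log(bothForms.exec(decomposed)?.[0] === decomposed); // true
```

Normalizing the input is usually simpler, because every accented letter in the pattern would otherwise need its own alternation.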
Why doesn't that expression match tú?
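That expression doesn't match tú because your regex engine apparently does not treat ú as a word character. JavaScript's \b, for example, only considers [A-Za-z0-9_] to be word characters. \b matches only between a word character and a non-word character, so the \b after ú at the end of tú fails: both sides of that position are non-word. tús still matches because the trailing s is a word character.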
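The following quick TypeScript check reproduces the results from the question. It assumes JavaScript regex semantics, where `\w` and `\b` are ASCII-only even with the `u` flag; other engines may behave differently.

```typescript
// \w (and therefore \b) only knows ASCII word characters here.
console.log(/\w/u.test("u"));      // true
console.log(/\w/u.test("\u00fa")); // false: ú counts as a non-word character

const original = /\b(t[\u00fau]s*)\b/u;

// The trailing \b after "ú" sits between ú and the end of the string, both non-word: no boundary.
console.log(original.test("t\u00fa"));  // false
// In "tús" the final "s" is a word character, so the trailing \b succeeds.
console.log(original.test("t\u00fas")); // true
console.log(original.test("tu"));       // true
console.log(original.test("tus"));      // true
```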
You could use something like this instead:
/(?<!\p{L})(t[úu]s*)(?!\p{L})/u
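\p{L} matches any Unicode letter, so (?<!\p{L}) ("not preceded by a letter") and (?!\p{L}) ("not followed by a letter") take the place of \b without depending on the engine's ASCII-only notion of a word character.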
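This TypeScript sketch compares the lookaround pattern from this answer with the original `\b` version on the four words from the question. It assumes an ES2018+ runtime for lookbehind; `\p{L}` also requires the `u` flag. The `matches` helper is hypothetical, and the `g` flag is added only because `matchAll` requires it.

```typescript
// Letter-aware boundaries instead of \b.
const letterBounded = /(?<!\p{L})(t[\u00fau]s*)(?!\p{L})/gu;
const wordBounded = /\b(t[\u00fau]s*)\b/gu;

function matches(re: RegExp, text: string): string[] {
  // matchAll needs the g flag; collect capture group 1 of every match.
  return Array.from(text.matchAll(re), (m) => m[1] ?? "");
}

const text = "tu t\u00fa tus t\u00fas";
console.log(matches(wordBounded, text));   // [ 'tu', 'tus', 'tús' ]
console.log(matches(letterBounded, text)); // [ 'tu', 'tú', 'tus', 'tús' ]

// A letter on either side still blocks a match, as \b would for ASCII words.
console.log(matches(letterBounded, "at\u00fan tuna")); // []
```

The lookarounds are zero-width, just like `\b`, so the match itself still contains only the word.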