I've been trying to come up with a regex that will replace a word that may or may not contain accented characters. I've been researching this for the past couple of days, but cannot find the information I need to solve my problem.
I came up with a simple regex that works well for words without accented characters:
var re = new RegExp('(?:\\b)hello(?:\\b)', 'gm');
var string = 'hello hello hello world hellos hello';
string.replace(re, "FOO");
Result: FOO FOO FOO world hellos FOO
The above works as I want. The problem arises when the word contains an accented character as its first or last character. Example:
var re = new RegExp('(?:\\b)helló(?:\\b)', 'gm');
var string = 'helló helló helló world hellós helló';
string.replace(re, "FOO");
Result: helló helló helló world FOOs helló
Desired result: FOO FOO FOO world hellós FOO
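A quick check of how \b sees the accented character (a minimal sketch; the variable names are only for illustration):

```javascript
// In JavaScript, \w covers only [A-Za-z0-9_], so 'ó' is treated as a non-word character.
var isWordChar = /\w/.test('ó');

// No boundary exists between 'ó' and the following space (both are non-word characters),
// so the standalone word is not matched.
var matchesStandalone = /\bhelló\b/.test('helló world');

// A boundary does exist between 'ó' and 's', so the prefix of 'hellós' is matched.
var matchesPrefix = /\bhelló\b/.test('hellós');

console.log(isWordChar, matchesStandalone, matchesPrefix); // false false true
```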
From my understanding, this happens because \b treats the accented character as a non-word character, so the word boundary falls in the wrong place. My attempt at solving the problem (note: the range [A-zÀ-ÿ] is what I consider the valid alphabet for constructing a word):
var re = new RegExp('([^A-zÀ-ÿ]|^)helló([^A-zÀ-ÿ]|$)', 'gm');
var string = 'helló helló helló world hellós helló';
string.replace(re, "$1FOO$2");
Result: FOO helló FOO world hellós FOO
As you can see, I'm much closer to the desired result. However, the problem occurs when the word in question appears two or more times in a row. Note that the second occurrence of helló was ignored. I believe that's because the whitespace preceding it was already consumed by the match for the first occurrence of helló.
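To illustrate what I think is happening, here is a minimal sketch that records what each match actually consumes (the `spans` array and the loop are only for illustration):

```javascript
var re = new RegExp('([^A-zÀ-ÿ]|^)helló([^A-zÀ-ÿ]|$)', 'gm');
var string = 'helló helló helló world hellós helló';

var spans = [];
var m;
while ((m = re.exec(string)) !== null) {
  // Record where each match starts and the exact text it consumed,
  // including the delimiter characters captured on either side.
  spans.push({ index: m.index, text: m[0] });
}

console.log(spans);
// [ { index: 0, text: 'helló ' },
//   { index: 11, text: ' helló ' },
//   { index: 30, text: ' helló' } ]
```

The first match includes the trailing space, so the search resumes at the h of the second helló, where there is neither a preceding delimiter left to match nor a line start.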
Does anybody have any suggestions on how to achieve FOO FOO FOO world hellós FOO?