According to this post, away from the beginning or end of the string, a word boundary exists only at a position that has a word character ([0-9A-Za-z_]) on one side and a non-word character on the other.
However, the following code returns something I didn't expect (the example is derived from this book, in the section "Dynamically creating RegExp objects"):
let name = "dea+hl[]rd";
let text = "dea+hl[]rd is a suspicious character.";
let regexp = new RegExp("\\b(" + name + ")\\b", "gi");
console.log(text.replace(regexp, "_$1_"));
// → dea+hl[]rd is a suspicious character.
Shouldn't the first matched group be dea because + is not a word character? I expected the replaced string to be _dea_+hl[]rd is a suspicious character.
In addition, when I replace the first line with let name = ""; the output becomes __dea__+__hl__[]__rd__ __is__ __a__ __suspicious__ __character__.
Where do the underscores come from?
The corrected code shown in the book is cryptic to me as well:
let name = "dea+hl[]rd";
let text = "This dea+hl[]rd guy is super annoying.";
let escaped = name.replace(/[\\[.+*?(){|^$]/g, "\\$&");
let regexp = new RegExp("\\b" + escaped + "\\b", "gi");
console.log(text.replace(regexp, "_$&_"));
// → This _dea+hl[]rd_ guy is super annoying.
How does adding backslashes before the special characters affect the word boundary?
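To answer my own question (it was closed as a duplicate of an unrelated one), I tested my regex on regex101. I hadn't tried that earlier because regex101 does not accept the new RegExp(...) constructor syntax. The reason nothing matches in the first example is that the string is compiled into the pattern /\b(dea+hl[]rd)\b/, in which + and [] are special characters rather than literals. a+ means "one or more a", and [] denotes a set of characters that here has no members. JavaScript accepts an empty set as valid syntax, but it can never match a character; other flavors, such as PCRE, reject the pattern outright as an unterminated set. Either way, nothing in text can match, so text.replace(regexp, "_$1_") just returns text unchanged.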
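As a rough check of the explanation above, the following sketch builds the same pattern and tests it in JavaScript. The variable names rawName and rawPattern are only illustrative and are not from the book.

```javascript
// Build the same pattern as in the question (names are illustrative).
const rawName = "dea+hl[]rd";
const rawPattern = new RegExp("\\b(" + rawName + ")\\b", "gi");

// No SyntaxError is thrown: JavaScript accepts [] as an empty character class.
console.log(rawPattern.source); // \b(dea+hl[]rd)\b

// An empty class can never match a character, so the whole pattern never matches.
console.log(/[]/.test("anything")); // false
console.log(rawPattern.test("dea+hl[]rd is a suspicious character.")); // false

// Even without the [] part, "a+" means "one or more a", not a literal plus sign.
console.log(/\bdea+hl\b/.test("deaaahl")); // true
console.log(/\bdea+hl\b/.test("dea+hl")); // false
```

So the + never gets a chance to act as a non-word character next to a boundary: it is consumed as a quantifier before any matching happens.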
When name = "", the regex becomes /\b()\b/, which matches the empty string at every word boundary. The group captures nothing, so $1 is empty and each match is replaced by "__". If we mark word boundaries by |, the word boundaries in text are |dea|+|hl|[]|rd| |is| |a| |suspicious| |character|.
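The boundary positions can be reproduced with a short sketch. The names sample and emptyGroup are assumptions made for illustration.

```javascript
const sample = "dea+hl[]rd is a suspicious character.";

// With an empty name, the constructed pattern is just \b()\b.
const emptyGroup = new RegExp("\\b(" + "" + ")\\b", "gi");
console.log(emptyGroup.source); // \b()\b

// Each zero-length match is replaced by "_" + "" + "_", i.e. two underscores.
console.log(sample.replace(emptyGroup, "_$1_"));
// __dea__+__hl__[]__rd__ __is__ __a__ __suspicious__ __character__.

// Marking every word boundary with | gives the same positions.
console.log(sample.replace(/\b/g, "|"));
// |dea|+|hl|[]|rd| |is| |a| |suspicious| |character|.
```

There is no boundary after the final period, because both the period and the end of the string count as non-word positions.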
Finally, escaping special characters does not change how word boundaries behave. It only makes characters such as + and [ match literally, so that the pattern can match the name exactly as written.
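To see what the escaping step actually produces, here is a small sketch that prints the intermediate string. It reuses the character class from the book's corrected code; the names targetName, escapedName, and safePattern are illustrative.

```javascript
const targetName = "dea+hl[]rd";

// Prefix each special character with a backslash ($& is the matched character).
const escapedName = targetName.replace(/[\\[.+*?(){|^$]/g, "\\$&");
console.log(escapedName); // dea\+hl\[]rd

// The \b anchors are unchanged; only + and [ are now literal.
const safePattern = new RegExp("\\b" + escapedName + "\\b", "gi");
console.log(safePattern.source); // \bdea\+hl\[]rd\b

console.log("This dea+hl[]rd guy is super annoying.".replace(safePattern, "_$&_"));
// This _dea+hl[]rd_ guy is super annoying.
```

The surrounding \b anchors work here because the name starts and ends with word characters (d and d), each next to a space; a name that began or ended with a symbol would need a different way of anchoring.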