\b is English-centric, I'm afraid, and not actually that good at even being English-centric. :-) (For instance, it would match at the end of "English" in "English-centric".)
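To make that concrete, here's a quick sketch (my own illustration, with made-up variable names) of how \b behaves. It's defined in terms of \w, which only covers A-Z, a-z, 0-9, and _, so it never sees a boundary next to Arabic-script letters, and it happily splits a hyphenated English word:

```javascript
// \b is defined via \w, i.e. [A-Za-z0-9_], so Arabic-script letters
// count as "non-word" characters and no boundary is found around them.
var urdu = "آپ کا نام کیا ہے؟";
var foundWithB = /\bآپ\b/.test(urdu);
console.log(foundWithB); // false

// In English text, \b treats the hyphen as a word break.
var pieces = "English-centric".split(/\b/);
console.log(pieces); // ["English", "-", "centric"]
```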
You can use lookarounds with a negated Unicode "letter" category to check for word boundaries. Those features exist in the most recent JavaScript spec, but support is very spotty. You can throw a library at it, though: XRegExp by Steven Levithan:
var str ="آپ کا نام کیا ہے؟";
var rex = XRegExp("(?<=^|[^\\p{Letter}])آپ(?=$|[^\\p{Letter}])", "g");
var res = str.replace(rex, "aap");
console.log(res);
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.min.js"></script>
In that regular expression:
(?<=^|[^\p{Letter}]) is a look-behind for start of input or a non-letter per the Unicode standard. (Note that the \ has to be escaped inside the string we pass to XRegExp so that XRegExp receives it, since \ is an escape character in string literals.)
آپ is the word.
(?=$|[^\p{Letter}]) is a look-ahead for the end of input or a non-letter. (Again, with the \ escaped in the string.)
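If the escaping point isn't obvious, this small sketch (variable names are just for illustration) shows what a string literal does with a single backslash compared with a doubled one. String.raw is another way to keep the backslash, if you prefer template literals:

```javascript
// "\p" is not a recognised string escape, so the backslash is silently dropped.
var dropped = "\p{Letter}";
console.log(dropped); // p{Letter}

// Doubling the backslash keeps one real backslash in the string.
var kept = "\\p{Letter}";
console.log(kept); // \p{Letter}

// String.raw leaves backslashes alone, so this is the same string as `kept`.
var raw = String.raw`\p{Letter}`;
console.log(raw === kept); // true
```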
As I mentioned in my comment, because of the difference between right-to-left (RTL) and left-to-right (LTR) scripts (e.g., Arabic script vs. Latin script), that shows up as aap کا نام کیا ہے؟ rather than your expected output, even though the text was replaced in the right place. The Urdu word is at the beginning of the string, but when rendered, all of the Arabic script is output right-to-left. So in the updated string, the Latin script (aap) is output left-to-right, followed by the Arabic script right-to-left.
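You can check that the replacement really did land at the start of the string, whatever the rendering looks like. This sketch uses a plain string replace rather than the regular expression, just to keep the focus on the character order:

```javascript
var str = "آپ کا نام کیا ہے؟";
var res = str.replace("آپ", "aap");

// In the string's own (logical) order, the Urdu word is at index 0...
console.log(str.indexOf("آپ")); // 0

// ...so after the replacement, the Latin text is at the start too,
// even though a bidi-aware display may draw the rest right-to-left.
console.log(res.startsWith("aap")); // true
console.log(res.slice(0, 3)); // aap
```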
In a really up-to-date JavaScript engine, you could do it natively:
var str ="آپ کا نام کیا ہے؟";
var rex = /(?<=^|[^\p{Letter}])آپ(?=$|[^\p{Letter}])/gu; // the u flag is required for \p{...}
var res = str.replace(rex, "aap");
console.log(res);
That works in the version of V8 in Chrome v75 and Node.js v12.4, for instance.
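If you need this for more than one word, you could wrap the native version in a small helper. This is only a sketch: escapeRegExp and replaceWholeWord are names I've made up (they aren't part of XRegExp or the standard library), and it assumes an engine with both look-behind and Unicode property escapes. It also treats anything that isn't a Unicode letter as a boundary, which includes combining marks, so adjust the character class if your text uses diacritics.

```javascript
// Hypothetical helper: escape regex syntax characters so the word is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replace `word` only where it isn't touching another letter.
function replaceWholeWord(input, word, replacement) {
    var rex = new RegExp(
        "(?<=^|[^\\p{Letter}])" + escapeRegExp(word) + "(?=$|[^\\p{Letter}])",
        "gu" // u is required for \p{...}
    );
    // A function replacement avoids special handling of "$" in the replacement text.
    return input.replace(rex, function () { return replacement; });
}

console.log(replaceWholeWord("آپ کا نام کیا ہے؟", "آپ", "aap")); // aap کا نام کیا ہے؟
console.log(replaceWholeWord("آپکا", "آپ", "aap")); // unchanged: آپکا
```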
(Side note: With XRegExp, you could use the shorthand \pL instead of \p{Letter}, but not with JavaScript's own regular expressions, where the shortest form is \p{L}.)