I've been trying to swap the word boundary \b with a suitable rule for a locale-aware regex. So far I haven't come up with a good solution, even after trying various options and searching the web.
A little context: I'm using markjs.io to mark specific words using regex. The problem is that I can't use the word boundary, because in JavaScript \b is defined in terms of \w, which only covers ASCII letters, digits and underscore, so it doesn't work with locale characters (such as čČšŠžŽ). The best regex I have come up with falls short. What I have been using so far is something along the lines of: let rgx = /(?:^|\s)žolč iz žrela([\s.!?,:;])/i;
Basically, this checks for the beginning of the string or a whitespace character, then the string I want to test for, and at the end there should be whitespace or some other separator (by the way, is there a token that includes all separators, something like \s?).
The problem with the above regex is that it also "marks" (markjs.io) the leading space and the trailing space or separator.
What I am trying to do (mostly) is to use non-capturing groups (so they don't get marked) and check whether the surrounding character matches (space / beginning of the line / end of the line) or doesn't match (non-word characters).
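To make the intent concrete, here is a minimal sketch of that kind of zero-width check. It assumes an engine that supports lookbehind and Unicode property escapes (ES2018 or later), and it has not been verified against markjs. The helper name `boundedPhrase` and the definition of a "word character" as any Unicode letter, digit or underscore are assumptions made for illustration.

```javascript
// Assumption: ES2018+ engine (lookbehind and \p{...} with the "u" flag).
// Assumption: a "word character" is any Unicode letter, any Unicode digit, or "_".
const WORD = "[\\p{L}\\p{N}_]";

// Hypothetical helper: wraps a literal phrase in zero-width boundary checks,
// so the characters around the phrase are tested but never consumed.
function boundedPhrase(phrase) {
  // Escape regex syntax characters so the phrase is matched literally.
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(?<!${WORD})${escaped}(?!${WORD})`, "iu");
}

const bounded = boundedPhrase("žolč iz žrela");

const samples = [
  "Žolč iz žrela.",
  "Leading žolč iz žrela trailing.",
  "Leadingžolč iz žrelatrailing.",
];

for (const s of samples) {
  const m = bounded.exec(s);
  // Prints the exact substring that would be consumed, or null.
  console.log(JSON.stringify(s), "=>", m ? JSON.stringify(m[0]) : null);
}
```

Because both assertions are zero-width, the matched text is only the phrase itself, without the leading space or the trailing separator.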
TL;DR: Because of locale characters I can't use the word boundary. Can anyone recommend a good way to swap \b with something else that doesn't "mark" the leading and trailing character?
Some of the things I have tried:
let text = {};
text.one = "Žolč iz žrela.";
text.two = "Leading žolč iz žrela.";
text.three = "Žolč iz žrela trailing.";
text.four = "Leading žolč iz žrela trailing.";
text.five = "Leadingžolč iz žrelatrailing.";
let rgx = {};
rgx.l_one = /(?:^|\s)žolč iz žrela/i; //also marks the leading space
rgx.l_two = /(?=^|\s)žolč iz žrela/i; //doesn't work at all
rgx.l_three = /((?!\W)(?=^|\s))žolč iz žrela/i; //doesn't work at all
rgx.l_four = /(?:(?=^|\s))žolč iz žrela/i; //doesn't work at all
rgx.t_one = /žolč iz žrela([\s.!?,:;])/i; //also marks the separator / space
rgx.t_two = /žolč iz žrela(?:(?=\s|$))/i; //doesn't work with dot at the end
rgx.t_three = /žolč iz žrela(?:(?!\W))/i; //doesn't work at all
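The attempts above can be reproduced outside markjs with a small harness that prints the exact substring each pattern consumes. This is only a sketch: the helper name `matched` is made up for illustration, it covers a subset of the patterns, and it uses plain `RegExp.prototype.exec` rather than markjs, so markjs may behave differently.

```javascript
// Sample inputs and a subset of the patterns, copied from the list above.
const text = {
  one: "Žolč iz žrela.",
  two: "Leading žolč iz žrela.",
  three: "Žolč iz žrela trailing.",
  four: "Leading žolč iz žrela trailing.",
  five: "Leadingžolč iz žrelatrailing.",
};

const rgx = {
  l_one: /(?:^|\s)žolč iz žrela/i,
  l_two: /(?=^|\s)žolč iz žrela/i,
  t_one: /žolč iz žrela([\s.!?,:;])/i,
  t_two: /žolč iz žrela(?:(?=\s|$))/i,
};

// Hypothetical helper: returns the full matched substring (group 0), or null.
function matched(re, str) {
  const m = re.exec(str);
  return m ? m[0] : null;
}

// Print every pattern against every sample; JSON.stringify makes
// leading and trailing whitespace in the match visible.
for (const [name, re] of Object.entries(rgx)) {
  for (const [key, str] of Object.entries(text)) {
    console.log(name, key, JSON.stringify(matched(re, str)));
  }
}
```

In plain JavaScript, `l_one` on `text.two` consumes the leading space, and `t_one` on `text.one` consumes the trailing dot, which is the "marking" problem described above. `l_two` only matches when the phrase sits at the very start of the string, because the lookahead inspects the character that follows the current position rather than the one before it.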
If I can elaborate or improve the question, let me know. Thank you.
I've got to say I disagree with the duplicate tag, as the answer provided does not touch on the leading space and trailing space problem.