I've been trying to swap the word boundary \b for a suitable rule that works with locale-specific characters in a regex. Even after trying various options and searching the web, I haven't come up with a good solution.

A little context: I'm using markjs.io to mark specific words using a regex. The problem is that I can't use the word boundary, because it doesn't work with locale characters (such as čČšŠžŽ etc.), and the best regex I have come up with falls short. What I have been using so far is something along the lines of let rgx = /(?:^|\s)žolč iz žrela([\s.!?,:;])/i;.

Basically, this checks for the beginning of a line or a whitespace character, then the string I want to test for, and finally a space or some other separator (by the way, is there a token that includes all separators, something like \s?).

The problem with the above regex is that it also "marks" (markjs.io) the leading space and the trailing space or separator.

What I am (mostly) trying to do is use non-capturing groups (so they don't get marked) and check whether the value matches (space / beginning of the line / end of the line) or doesn't match (non-word characters).

TL;DR: Because of locale characters I can't use the word boundary. Can anyone recommend a good way to swap \b for something else that doesn't "mark" the leading and trailing characters?
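For illustration, one direction that might fit this requirement is to replace \b with zero-width lookarounds built from Unicode property escapes, so the boundary characters are checked but never become part of the match. This is only a sketch under assumptions: it needs an engine that supports lookbehind and the `u` flag (ES2018+), it treats letters, digits and underscore as "word" characters, and `unicodeWordRegex` is a hypothetical helper name, not part of mark.js.

```javascript
// Hypothetical helper: builds a regex that matches `phrase` only when it is
// not directly preceded or followed by a Unicode letter, digit or underscore.
// Assumes ES2018+ (lookbehind and \p{...} with the `u` flag).
function unicodeWordRegex(phrase) {
  // Escape regex metacharacters so the phrase is matched literally.
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (?<!...) and (?!...) are zero-width: they consume no characters,
  // so the match itself contains only the phrase.
  return new RegExp(
    `(?<![\\p{L}\\p{N}_])${escaped}(?![\\p{L}\\p{N}_])`,
    "iu"
  );
}

const boundaryRgx = unicodeWordRegex("žolč iz žrela");

const hitStart = boundaryRgx.exec("Žolč iz žrela.");
const hitMiddle = boundaryRgx.exec("Leading žolč iz žrela trailing.");
const hitGlued = boundaryRgx.exec("Leadingžolč iz žrelatrailing.");

console.log(hitStart && hitStart[0]);     // matched text without the dot
console.log(hitMiddle && hitMiddle.index); // position after "Leading "
console.log(hitGlued);                     // no match expected when glued to other letters
```

Since the lookarounds consume nothing, a regex like this passed to mark.js's `markRegExp` should presumably highlight only the phrase itself, though that has not been verified against the fiddle.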

Fiddle

Some of the things I have tried:

    let text = {};
    text.one = "Žolč iz žrela.";
    text.two = "Leading žolč iz žrela.";
    text.three = "Žolč iz žrela trailing.";
    text.four = "Leading žolč iz žrela trailing.";
    text.five = "Leadingžolč iz žrelatrailing.";

    let rgx = {};
    rgx.l_one = /(?:^|\s)žolč iz žrela/i; //also marks the leading space
    rgx.l_two = /(?=^|\s)žolč iz žrela/i; //doesn't work at all
    rgx.l_three = /((?!\W)(?=^|\s))žolč iz žrela/i; //doesn't work at all
    rgx.l_four = /(?:(?=^|\s))žolč iz žrela/i; //doesn't work at all

    rgx.t_one = /žolč iz žrela([\s.!?,:;])/i; //also marks the separator / space
    rgx.t_two = /žolč iz žrela(?:(?=\s|$))/i; //doesn't work with dot at the end
    rgx.t_three = /žolč iz žrela(?:(?!\W))/i; //doesn't work at all
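A possible explanation for the failing leading-side attempts above, sketched here rather than confirmed: a lookahead such as (?=\s) inspects the character at the current position, which would then also have to be "ž", so both can never hold at once. A lookbehind inspects the character before the position instead. On the trailing side a lookahead is the right direction, and widening it with \p{P} (any Unicode punctuation, which requires the `u` flag) may cover the dot case. The variable names below are illustrative only, and lookbehind support in the target browsers is an assumption.

```javascript
const sample = "Leading žolč iz žrela.";

// Lookahead on the leading side: requires the next character to be
// whitespace AND "ž" at the same time, so it cannot match here.
const leadingLookahead = /(?=^|\s)žolč iz žrela/i;

// Lookbehind on the leading side: checks the character BEFORE the phrase
// without consuming it (needs ES2018+ lookbehind support).
const leadingLookbehind = /(?<=^|\s)žolč iz žrela/i;

// Trailing side: lookahead for whitespace, any Unicode punctuation, or
// end of input. \p{P} requires the `u` flag.
const trailingLookahead = /žolč iz žrela(?=[\s\p{P}]|$)/iu;

const viaLookahead = leadingLookahead.exec(sample);
const viaLookbehind = leadingLookbehind.exec(sample);
const viaTrailing = trailingLookahead.exec(sample);

console.log(viaLookahead);                        // null
console.log(viaLookbehind && viaLookbehind[0]);   // phrase only, no leading space
console.log(viaTrailing && viaTrailing[0]);       // phrase only, no trailing dot
```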

If I can elaborate or improve the question, let me know. Thank you.

I've got to say I disagree with the duplicate tag, as the answer provided does not touch on the leading space and trailing space problem.

WeAreDoomed
  • See [this SO answer](https://stackoverflow.com/a/11705398/546871) plus the question and its other answers. – AdrianHHH Jan 24 '23 at 21:59
