What is the equivalent of \b in regular expression without using lookbehind?

Question

I know that the regular expression

(?<!\w)(?=\w)|(?<=\w)(?!\w)

is the equivalent of

\b

but it seems that in JavaScript (TypeScript in my case) the lookbehind assertions (?<= and (?<!, positive and negative respectively, are not supported.
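
Whether lookbehind is available depends on the engine: it was added to the language in ES2018, and older engines throw a SyntaxError when they parse it. A minimal sketch of a runtime check follows; `supportsLookbehind` is a hypothetical helper name, not something from a library.

```typescript
// Hypothetical helper: reports whether this engine can parse a lookbehind.
// The pattern is built from a string so that an unsupported engine fails
// at runtime inside the try block instead of at parse time.
function supportsLookbehind(): boolean {
  try {
    new RegExp("(?<=a)b");
    return true;
  } catch (e) {
    return false;
  }
}

console.log(supportsLookbehind());
```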

What is the equivalent of this expression without using lookbehind?

Just to explain the background of the problem: I have to separate the words in a sentence. \b works perfectly, but it doesn't treat accented characters as word characters, so, with

\b

equal to

(?<!\w)(?=\w)|(?<=\w)(?!\w)

and

\w

equal to

[A-Za-z0-9_]

turning

\b

into

(?<![A-Za-z0-9À-ÿ])(?=[A-Za-z0-9À-ÿ])|(?<=[A-Za-z0-9À-ÿ])(?![A-Za-z0-9À-ÿ])

matches exactly what it is supposed to. The regular expression is fine, but JavaScript doesn't support lookbehind, so it isn't possible to use it.
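
For the specific goal of splitting a sentence, one way to sidestep the boundary assertion is to match the runs on either side of each boundary instead of splitting at the boundaries themselves. This is a sketch of that alternative technique, not a zero-width replacement for \b. It assumes the word class from the expression above, and the names `WORD_CLASS`, `RUNS` and `splitAtCustomBoundaries` are made up for illustration.

```typescript
// Assumption: "word characters" are the class used in the question.
const WORD_CLASS = "A-Za-z0-9À-ÿ";

// Alternating runs of word characters and of non-word characters.
// Each transition between two runs is where the custom boundary would sit.
const RUNS = new RegExp(`[${WORD_CLASS}]+|[^${WORD_CLASS}]+`, "g");

function splitAtCustomBoundaries(sentence: string): string[] {
  // match() returns null when nothing matches (for example, an empty string).
  return sentence.match(RUNS) || [];
}

console.log(splitAtCustomBoundaries("Perché no, città?"));
// [ 'Perché', ' ', 'no', ', ', 'città', '?' ]
```

Because the two alternatives cover every character, the pieces concatenate back to the original sentence, just as the pieces from a split on a zero-width boundary would.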

EDIT: This is not an answer but a workaround, so I leave it here for people passing by for whom a 'dirty' solution is good enough. Say you have a sentence (a string) to break into its components, and \b would suit you except that it doesn't work with diacritics (such as accented letters). You can solve the problem by removing the diacritics from the string with the function provided here, splitting on \b as usual, and then rebuilding the pieces with their diacritics: walk the original string and build another string array using the offsets of the pieces you got from splitting the diacritic-free string. This only works if removing the diacritics leaves the length of the string unchanged. Here is the implementation in TypeScript:

splitSentenceInWords(sentenceToSplit: string): string[] {
    // Split the diacritic-free copy on \b. removeDiacritics must return a
    // string of the same length as its input, or the offsets below will drift.
    const splitInWordsNoDiacritics: string[] =
      this.removeDiacritics(sentenceToSplit).split(/\b/);
    const splitInWordsWithDiacritics: string[] = [];
    let counterBegin = 0;

    // Cut the original sentence at the same offsets as the stripped pieces.
    for (const piece of splitInWordsNoDiacritics) {
      const counterEnd = counterBegin + piece.length;
      splitInWordsWithDiacritics.push(sentenceToSplit.substring(counterBegin, counterEnd));
      counterBegin = counterEnd;
    }
    return splitInWordsWithDiacritics;
  }
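
To try the workaround outside a class, here is a self-contained sketch. The `removeDiacritics` below is a hypothetical stand-in for the linked function, not that function itself: it assumes precomposed input (for example "é" as a single character) and keeps the string length unchanged by replacing each UTF-16 code unit with the first unit of its NFD decomposition.

```typescript
// Hypothetical stand-in for the linked removeDiacritics helper.
// Each code unit is replaced by the first unit of its NFD decomposition,
// so the result always has the same length as the input.
function removeDiacritics(text: string): string {
  return text
    .split("")
    .map((unit) => unit.normalize("NFD").charAt(0))
    .join("");
}

function splitSentenceInWords(sentence: string): string[] {
  // Pieces of the diacritic-free copy, split at ordinary \b boundaries.
  const stripped = removeDiacritics(sentence).split(/\b/);
  const words: string[] = [];
  let begin = 0;
  // Cut the original sentence at the same offsets.
  for (const piece of stripped) {
    const end = begin + piece.length;
    words.push(sentence.substring(begin, end));
    begin = end;
  }
  return words;
}

console.log(splitSentenceInWords("Perché no, città?"));
// [ 'Perché', ' ', 'no', ', ', 'città', '?' ]
```

Letters that have no base-letter decomposition (such as "ß" or "æ") pass through unchanged and are still treated as non-word characters by \b, so this stand-in does not cover every alphabet.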

There is no other equivalent. `\b` is a zero-character match. You need a look-behind for this. - Sebastian Simon, Nov 23 '17 at 13:43
Take a look at [my answer here](https://stackoverflow.com/a/45985701/7393478); it is written for JavaScript and could solve your problem. Unfortunately, there doesn't seem to be a single solution for all situations: if I remember correctly, there is a difference depending on whether the character at the boundary is accented, so you will have to try it. - Kaddath, Nov 23 '17 at 13:51
