I know that the regular expression
(?<!\w)(?=\w)|(?<=\w)(?!\w)
is the equivalent of
\b
but it seems that JavaScript (TypeScript in my case) doesn't support lookbehind, i.e. the positive (?<=...) and negative (?<!...) lookbehind assertions.
What is the equivalent of this expression without using lookbehind?
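The expression above says that a boundary sits at position i whenever the character before i and the character after i disagree about being a word character. Without lookbehind, that test can still be done by hand. The sketch below is one way to do it, not a regex equivalent. It assumes a hypothetical `isWordChar` predicate that uses the `[A-Za-z0-9À-ÿ]` class from later in the question and tests one UTF-16 unit at a time.

```typescript
// Hypothetical predicate: the question's extended "word" class, applied to one character.
const WORD_CHAR = /[A-Za-z0-9À-ÿ]/;

function isWordChar(ch: string | undefined): boolean {
  // Positions before the start or after the end count as non-word, as with \b.
  return ch !== undefined && WORD_CHAR.test(ch);
}

// Returns every index where (?<!w)(?=w)|(?<=w)(?!w) would match, i.e. where the
// characters on either side disagree about being word characters.
function boundaryIndexes(s: string): number[] {
  const result: number[] = [];
  for (let i = 0; i <= s.length; i++) {
    const before = isWordChar(i > 0 ? s[i - 1] : undefined);
    const after = isWordChar(i < s.length ? s[i] : undefined);
    if (before !== after) {
      result.push(i);
    }
  }
  return result;
}

console.log(boundaryIndexes("è così")); // → [ 0, 1, 2, 6 ]
```

The returned indexes can then be fed to `substring` to cut the sentence into pieces.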
Some background on the problem: I need to split a sentence into words. \b works perfectly, except that it ignores accented characters. Since
\b
is equivalent to
(?<!\w)(?=\w)|(?<=\w)(?!\w)
and
\w
is equivalent to
[A-Za-z0-9_]
I turned
\b
into
(?<![A-Za-z0-9À-ÿ])(?=[A-Za-z0-9À-ÿ])|(?<=[A-Za-z0-9À-ÿ])(?![A-Za-z0-9À-ÿ])
which matches exactly what it is supposed to. The regular expression itself is fine, but JavaScript doesn't support lookbehind, so I can't use it.
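If the goal is only to split the sentence, and not to locate a zero-width boundary, lookbehind isn't needed at all. Splitting at every boundary yields alternating runs of word and non-word characters, and those runs can be matched directly. This is a minimal sketch using the question's `[A-Za-z0-9À-ÿ]` class. `splitWords` is a hypothetical name. Note that the À-ÿ range also includes × and ÷, which are not letters.

```typescript
// Runs of "word" characters (the question's extended class) or runs of anything else.
// The two alternatives tile the whole string, so joining the pieces gives back the input,
// just as sentence.split(/\b/) does for ASCII text.
const WORD_OR_NON_WORD_RUN = /[A-Za-z0-9À-ÿ]+|[^A-Za-z0-9À-ÿ]+/g;

function splitWords(sentence: string): string[] {
  // With a global regex, String.prototype.match returns every match, or null for "".
  return sentence.match(WORD_OR_NON_WORD_RUN) ?? [];
}

console.log(splitWords("Perché è così?"));
// → [ 'Perché', ' ', 'è', ' ', 'così', '?' ]
```

On engines that implement ES2018, lookbehind assertions are available, so the regex from the question can be passed to `split` directly. ES2018 also added Unicode property escapes such as `\p{L}` (with the `u` flag), which cover a broader notion of "letter" than a hand-written range.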
EDIT: This isn't an answer but a workaround, so I'm leaving it here for anyone passing by who is fine with a "dirty" solution. Say you have a sentence (a string) to break into its components, and \b would be right for you except that it doesn't handle diacritics (such as accented letters). You can work around that by removing the diacritics from the string with the function provided here, splitting with \b as usual, and then rebuilding the pieces with their diacritics: walk through the original string and cut it at the same indexes as the words you got from splitting the diacritic-free string. This only works if removing diacritics leaves the string length unchanged, so that the indexes of both strings line up. Here is the implementation in TypeScript:
splitSentenceInWords(sentenceToSplit: string): string[] {
    // Split a diacritic-free copy of the sentence at word boundaries.
    // This assumes removeDiacritics() maps each character to exactly one character,
    // otherwise the indexes of the two strings no longer line up.
    const splitInWordsNoDiacritics: string[] = this.removeDiacritics(sentenceToSplit).split(/\b/);
    const splitInWordsWithDiacritics: string[] = [];
    let counterBegin = 0;
    for (const word of splitInWordsNoDiacritics) {
        // Cut a piece of the same length out of the original sentence,
        // so the accented characters are preserved.
        const counterEnd = counterBegin + word.length;
        splitInWordsWithDiacritics.push(sentenceToSplit.substring(counterBegin, counterEnd));
        counterBegin = counterEnd;
    }
    return splitInWordsWithDiacritics;
}
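For anyone who wants to try the workaround outside a class, here is a standalone sketch. The linked removeDiacritics function isn't reproduced in the post, so the `removeDiacritics` below is a hypothetical stand-in. It strips combining marks one character at a time and keeps the original character whenever stripping would change the length, which keeps the indexes aligned as the workaround requires.

```typescript
// Hypothetical stand-in for the linked removeDiacritics helper (not the original one).
// Works per UTF-16 unit and never changes the string length.
function removeDiacritics(text: string): string {
  let result = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    // NFD splits "é" into "e" + U+0301; U+0300-U+036F is the Combining Diacritical Marks block.
    const stripped = ch.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
    // Keep the original character if stripping would not yield exactly one character
    // (e.g. a lone combining mark, or a Hangul syllable that NFD expands into jamo).
    result += stripped.length === 1 ? stripped : ch;
  }
  return result;
}

// Standalone version of the splitSentenceInWords method above.
function splitSentenceInWords(sentenceToSplit: string): string[] {
  const pieces: string[] = [];
  let begin = 0;
  for (const piece of removeDiacritics(sentenceToSplit).split(/\b/)) {
    // Same-length slice of the original sentence, diacritics included.
    const end = begin + piece.length;
    pieces.push(sentenceToSplit.substring(begin, end));
    begin = end;
  }
  return pieces;
}

console.log(splitSentenceInWords("Perché è così?"));
// → [ 'Perché', ' ', 'è', ' ', 'così', '?' ]
```

Because the stand-in only ever replaces one character with one character, the cut points computed on the stripped string are always valid for the original.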