What's the best way to match any letter from any language in JavaScript? Here's what I'm doing:

  let str1 = "aabb";
  console.log( str1.replace( /\w/gu, "$&|") ); //→ a|a|b|b|
  let str2 = "aābḇ";
  console.log( str2.replace( /\w/gu, "$&|") ); //→ a|āb|ḇ

I want to have something like sed and grep:

$ echo aābḇ | sed 's/\w/&|/g'
a|ā|b|ḇ|

$ echo "ʔ.ā^ḫ&" | grep -o '\w'
ʔ
ā
ḫ

I have the same question about a Unicode-aware `\b`.

Eugene Barsky
  • Take a look at Technical Committee 39, [this page](https://github.com/tc39/proposal-regexp-unicode-property-escapes) specifically. – revo Aug 30 '18 at 12:58
  • @revo I'm a newbie, so I can't figure out if there is a simple solution (without using XRegExp). I've tried to use `/\p{L}/gu`, but it still doesn't work. – Eugene Barsky Aug 30 '18 at 13:05
  • This may not be what you are looking for: `console.log( str2.replace( /[^\s.,:;]/gu, "$&|") ); //→ a|ā|b|ḇ|` – enxaneta Aug 30 '18 at 13:10
  • @enxaneta Thanks! That's approximately what I'm doing now, but it's very inconvenient, since with real data the list of `\W` symbols is very large. – Eugene Barsky Aug 30 '18 at 13:12
  • Let me try again: `console.log( str2.replace( /[a-z\u00C0-\u017F]/gu, "$&|") );` I'm using this list of Unicode characters: https://www.utf8-chartable.de/unicode-utf8-table.pl – enxaneta Aug 30 '18 at 13:19
  • @enxaneta It doesn't work in either Chromium or Firefox with any non-English characters. – Eugene Barsky Aug 30 '18 at 13:44
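
For reference, here is a minimal sketch of how the `\p{L}` suggestion from the comments could be made to behave like the `sed` and `grep` examples. It assumes an engine that implements Unicode property escapes and lookbehind (both ES2018); older engines may reject these patterns outright. The names `WORD`, `boundary`, `marked`, `found` and `bounded` are made up for illustration, and treating a "word character" as a letter, number or underscore is an assumption about what `\w` should mean here.

```javascript
// Assumption: the engine supports \p{...} with the "u" flag and lookbehind.
// Strings use \u escapes so the precomposed code points are unambiguous.

// sed 's/\w/&|/g' for letters: \p{L} matches any Unicode letter.
const str2 = "a\u0101b\u1E07"; // "aābḇ"
const marked = str2.replace(/\p{L}/gu, "$&|");
console.log(marked); // a|ā|b|ḇ|

// grep -o '\w': collect every letter as a separate match.
const str3 = "\u0294.\u0101^\u1E2B&"; // "ʔ.ā^ḫ&"
const found = str3.match(/\p{L}/gu);
console.log(found); // [ 'ʔ', 'ā', 'ḫ' ]

// A Unicode-aware stand-in for \b, built from lookarounds.
// WORD is a hypothetical definition of "word character":
// letters, numbers and underscore.
const WORD = String.raw`[\p{L}\p{N}_]`;
const boundary = new RegExp(
  `(?<!${WORD})(?=${WORD})|(?<=${WORD})(?!${WORD})`,
  "gu"
);

// Mark every word boundary with "|".
const bounded = "\u0101bc d\u1E07".replace(boundary, "|"); // "ābc dḇ"
console.log(bounded); // |ābc| |dḇ|
```

If the input contains decomposed characters (a base letter followed by a combining mark), adding `\p{M}` to the class or calling `normalize("NFC")` on the string first may be needed.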
