JavaScript - non-word character match which is UTF-8 compatible

Asked Jun 04 '17 at 19:53

After reading this article, I was hopeful that ECMAScript 6 finally has UTF-8 compatible matching. However, when I try to match UTF-8 compatible non-word characters by adding the u flag to the regex, it does not behave as I expected:

var txt = "die küssen";
var arr = txt.split(/\W+/u);
dump(arr+"\n");

expected output:

die,küssen

actual output:

die,k,ssen

I tried word boundary as well:

var arr = txt.split(/\b/u);

outputs:

die, ,k,ü,ssen

Using the constructor syntax doesn't help either:

var regexp = new RegExp(/\W+/, 'u');
var arr = txt.split(regexp);

This needs to work in Firefox, and according to the browser compatibility chart on this MDN page, the u flag should be supported.

Is there something more I need to do? Or am I not understanding the new spec?

I hope to not have to resort to something like this (or a library which will essentially do the same).

Thank you kindly...

asked Jun 04 '17 at 19:53

KevinHJ

With ES6, `\b` and shorthand character classes are still not Unicode aware. – Wiktor Stribiżew Jun 04 '17 at 19:58
Sorry, but you do not have another way. – Wiktor Stribiżew Jun 04 '17 at 20:58
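
To illustrate the comments above, here is a rough sketch of what happens and of two possible workarounds. The names `NON_WORD_LATIN` and `splitUnicodeWords` are made up for this example, and the Latin range is an assumption chosen to cover German umlauts, not a complete letter list. The second workaround assumes an engine that implements ES2018 Unicode property escapes (`\p{...}`), which are not part of ES6 and which the Firefox of the question's time is not assumed to support.

```javascript
// "die küssen", written with an escape so the ü is the precomposed U+00FC.
const txt = "die k\u00FCssen";

// In ES6, \w stays [A-Za-z0-9_] even with the u flag,
// so "ü" is treated as a non-word character and splits the word.
const asciiOnly = txt.split(/\W+/u);

// Workaround 1 (plain ES6): spell out the extra letters by hand.
// Assumption: U+00C0..U+024F (Latin-1 letters through Latin Extended-B)
// is enough for the text at hand. It also lets the signs × and ÷ through.
const NON_WORD_LATIN = /[^A-Za-z0-9_\u00C0-\u024F]+/;
const latinSplit = txt.split(NON_WORD_LATIN);

// Workaround 2 (ES2018+ engines only): Unicode property escapes.
// The pattern is built with the RegExp constructor inside the function so
// that older engines fail only when this is called, not when the file is parsed.
function splitUnicodeWords(s) {
  const nonWord = new RegExp("[^\\p{L}\\p{N}_]+", "u");
  return s.split(nonWord);
}
```

With these definitions, `asciiOnly` reproduces the unwanted split from the question, while `latinSplit` and `splitUnicodeWords(txt)` keep "küssen" in one piece. The hand-written range is essentially the "something like this" approach the question hoped to avoid, just kept short.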
