Reading this article, I was hopeful that ECMAScript 6 finally has Unicode-aware matching. However, when I try to match non-word characters in a Unicode-aware way by adding the u flag to the regex, I do not get the result I expect:
var txt = "die küssen";
var arr = txt.split(/\W+/u);
dump(arr+"\n");
expected output:
die,küssen
actual output:
die,k,ssen
I tried a word boundary as well:
var arr = txt.split(/\b/u);
outputs:
die, ,k,ü,ssen
Using the constructor syntax doesn't help either:
var regexp = new RegExp(/\W+/, 'u');
var arr = txt.split(regexp);
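For reference, here is a minimal self-contained sketch of what I am seeing, with console.log standing in for dump. The second pattern is an assumption on my part: it relies on Unicode property escapes (\p{L}, \p{N}), which were added to the language after ES6 and may not be available in every engine. The variable names are only for illustration.

```javascript
// "k\u00fcssen" is "küssen" with a precomposed ü (U+00FC).
const txt = "die k\u00fcssen";

// Even with the u flag, \W keeps its ASCII-based definition,
// so ü counts as a non-word character and acts as a separator.
const asciiSplit = txt.split(/\W+/u);
console.log(asciiSplit.join(",")); // die,k,ssen

// Assumption: the engine supports Unicode property escapes.
// Split on runs of characters that are neither letters nor numbers.
const unicodeSplit = txt.split(/[^\p{L}\p{N}]+/u);
console.log(unicodeSplit.join(",")); // die,küssen
```

The first split reproduces the actual output above; the second gives the output I expected, but only where property escapes are supported.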
This needs to work in Firefox, and according to the browser compatibility chart on this MDN page, the u flag should be supported.
Is there something more I need to do? Or am I not understanding the new spec?
I hope to not have to resort to something like this (or a library which will essentially do the same).
Thank you kindly...