
After reading this article, I was hopeful that ECMAScript 6 finally supports Unicode-aware regex matching. However, when I try to split on non-word characters (so that letters like "ü" count as word characters) by adding the u flag to the regex, it does not work:

var txt = "die küssen";
var arr = txt.split(/\W+/u);
dump(arr+"\n");

expected output:

die,küssen

actual output:

die,k,ssen
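A quick way to see what is happening: in the ECMAScript spec, adding the u flag changes how the pattern is read (code points instead of UTF-16 code units), but it does not widen the \w class. \w stays [A-Za-z0-9_] and \W is its complement, so "ü" still counts as a non-word character. The snippet below is a minimal check of that, run in any ES2015+ engine:

```javascript
// Under the u flag, \w still means [A-Za-z0-9_] and \W is its complement,
// so "ü" (U+00FC) is classified as a non-word character.
const uIsNonWord = /\W/u.test("ü");
const uIsWord = /\w/u.test("ü");
const asciiParts = "die küssen".split(/\W+/u);

console.log(uIsNonWord);            // true
console.log(uIsWord);               // false
console.log(asciiParts.join(","));  // die,k,ssen
```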

I tried a word boundary as well:

var arr = txt.split(/\b/u);

outputs:

die, ,k,ü,ssen
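\b is defined in terms of \w, so it inherits the same ASCII-only behaviour. As a sketch (not a built-in feature), a Unicode-aware boundary can be emulated with lookarounds and Unicode property escapes. Note the assumption: lookbehind and \p{...} are ES2018 additions, not ES6, so this only works in a Firefox release that implements them. The names wordClass and unicodeBoundary are hypothetical, chosen for illustration:

```javascript
// Hypothetical "word character" class: letters, combining marks, numbers, underscore.
// \p{...} property escapes require the u flag (ES2018).
const wordClass = "[\\p{L}\\p{M}\\p{N}_]";

// Emulates \b: a position between a word char and a non-word char (or string edge).
const unicodeBoundary = new RegExp(
  `(?<=${wordClass})(?!${wordClass})|(?<!${wordClass})(?=${wordClass})`,
  "u"
);

const boundaryParts = "die küssen".split(unicodeBoundary);
console.log(boundaryParts.join(",")); // die, ,küssen
```

Compared with the /\b/u output above, "küssen" now stays in one piece.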

Using the constructor syntax doesn't help either:

var regexp = new RegExp(/\W+/, 'u');
var arr = txt.split(regexp);
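For what it's worth, the constructor form itself is valid in ES2015+: passing an existing RegExp together with a flags string creates a new RegExp with the same source and the new flags, replacing the old ones. The result is identical to the literal /\W+/u, which is why it behaves the same. A small check (the g flag here is only to show that the original flags are dropped):

```javascript
// new RegExp(regexp, flags) copies the source and replaces the flags (ES2015+).
const ctorRegExp = new RegExp(/\W+/g, "u");

console.log(ctorRegExp.source); // \W+
console.log(ctorRegExp.flags);  // u

// The u flag does not change what \W matches, so the split result is unchanged.
const ctorParts = "die küssen".split(ctorRegExp);
console.log(ctorParts.join(",")); // die,k,ssen
```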

This needs to work in Firefox, and according to the browser compatibility chart on this MDN page, the u flag should work.
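Support for the u flag and support for Unicode-aware character classes are separate things, so it can help to feature-detect both at runtime rather than rely on a compatibility chart. A minimal sketch, assuming that an unsupported flag or escape makes the RegExp constructor throw a SyntaxError (which is what the spec requires; in u mode an unknown escape like \p is an error in engines without property escapes). supportsRegExp is a hypothetical helper name:

```javascript
// Hypothetical helper: true if the engine accepts this pattern/flags combination.
function supportsRegExp(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (e) {
    // SyntaxError: unknown flag or unsupported syntax.
    return false;
  }
}

const hasUnicodeFlag = supportsRegExp("", "u");             // ES2015 u flag
const hasPropertyEscapes = supportsRegExp("\\p{L}", "u");   // ES2018 \p{...}

console.log(hasUnicodeFlag, hasPropertyEscapes);
```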

Is there something more I need to do? Or am I not understanding the new spec?

I hope to not have to resort to something like this (or a library which will essentially do the same).
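One option that avoids hand-written character ranges is a negated class built from Unicode property escapes. This is a sketch under the same assumption as above: \p{...} is ES2018, not ES6, so it needs a Firefox version that implements it. \p{M} is included so that a decomposed "ü" (u followed by U+0308 COMBINING DIAERESIS) is also kept inside the word:

```javascript
// Split on runs of anything that is not a letter, combining mark, number, or underscore.
// \p{L} = letters, \p{M} = combining marks, \p{N} = numbers (ES2018, needs the u flag).
const unicodeNonWord = /[^\p{L}\p{M}\p{N}_]+/u;

const text = "die küssen";
const splitParts = text.split(unicodeNonWord);
console.log(splitParts.join(",")); // die,küssen

// Decomposed form: "u" + U+0308 stays within the word as well.
const decomposedParts = "die ku\u0308ssen".split(unicodeNonWord);
console.log(decomposedParts.length); // 2
```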

Thank you kindly...

KevinHJ
