Yesterday I asked a question, "Detect non valid XML characters in java", and this expression works as expected:
String xml10pattern = "[^"
+ "\u0009\r\n" // #x9 | #xA | #xD
+ "\u0020-\uD7FF" // [#x20-#xD7FF]
+ "\uE000-\uFFFD" // [#xE000-#xFFFD]
+ "\ud800\udc00-\udbff\udfff" // [#x10000-#x10FFFF]
+ "]";
However, I realized it would be better to check for invalid characters on the client side using JavaScript, but I haven't succeeded.
I almost got it working, except for the range U+10000–U+10FFFF: http://jsfiddle.net/mymxyjaf/15/
For the last range, I tried:
var rg = /[^\u0009\r\n\u0020-\uD7FF\uE000-\uFFFD\ud800\udc00-\udbff\udfff]/g;
but it doesn't work. regextester reports "Range values reversed". I think this is because \ud800\udc00-\udbff\udfff
is interpreted as 3 expressions:
\ud800; \udc00-\udbff; \udfff
and, of course, the middle one fails.
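To illustrate that reading, here is a minimal sketch that compiles the astral-range class with and without the u flag. The helper name compiles is hypothetical, and the u flag assumes an ES2015+ engine, which may not hold for every target browser.

```javascript
// Hypothetical helper: returns true if the pattern compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    return false; // SyntaxError raised for an invalid pattern
  }
}

var astral = '[\\ud800\\udc00-\\udbff\\udfff]';

// Without the u flag each \uXXXX escape is its own code unit, so the class
// contains the range \udc00-\udbff, whose bounds are reversed.
console.log(compiles(astral, ''));  // false

// With the u flag, a high/low surrogate escape pair is read as one code point,
// so the range becomes U+10000-U+10FFFF.
console.log(compiles(astral, 'u')); // true
console.log(new RegExp(astral, 'u').test('\ud800\udc01')); // true (U+10001)
```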
So, my question is how to convert the above Java regular expression into JavaScript.
Thanks.
==== UPDATE ====
Thanks to @collapsar's comments, I tried using two regular expressions.
In doing so, I realized I can't use negated character classes ([^...]): the first one would discard valid characters such as U+10001. I mean, this is not right:
function validateIllegalChars(str) {
var re1 = /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]/g;
var re2 = /[^[\uD800-\uDBFF][\uDC00-\uDFFF]]/g;
var str2 = str.replace(re1, '').replace(re2, ''); // First replace would remove all valid characters [#x10000-#x10FFFF]
alert('str2:' + str2);
if (str2 != str) return false;
return true;
}
Then I tried the following (http://jsfiddle.net/mymxyjaf/18/):
function valPos(str) {
var re1 = /[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]/g;
var re2 = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;
var str2 = str.replace(re1, '').replace(re2, '');
if (str2.length === 0) return true;
alert('str2:' + str2 + '; length: ' + str2.length);
return false;
}
However, when I call this function as valPos('eo' + String.fromCharCode(65537)), where 65537 is U+10001, it returns false.
What is wrong, or how can I solve it?
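One thing that may be worth checking is what the test string actually contains. String.fromCharCode truncates its argument to 16 bits, so 65537 would yield U+0001 rather than U+10001. The sketch below compares it with String.fromCodePoint (assumes ES2015+). valPosNoAlert is a hypothetical copy of valPos with the alert removed so it can run outside a browser.

```javascript
// Hypothetical copy of valPos without the alert call.
function valPosNoAlert(str) {
  var re1 = /[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]/g;
  var re2 = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;
  // Strip valid BMP characters, then valid surrogate pairs; anything left is invalid.
  return str.replace(re1, '').replace(re2, '').length === 0;
}

var viaCharCode = String.fromCharCode(65537);   // argument is truncated to 16 bits
var viaCodePoint = String.fromCodePoint(65537); // builds the surrogate pair

console.log(viaCharCode.length, viaCharCode.charCodeAt(0).toString(16)); // 1 '1'
console.log(viaCodePoint.length,
            viaCodePoint.charCodeAt(0).toString(16),
            viaCodePoint.charCodeAt(1).toString(16));                    // 2 'd800' 'dc01'

console.log(valPosNoAlert('eo' + viaCharCode));  // false: U+0001 is not a valid XML 1.0 character
console.log(valPosNoAlert('eo' + viaCodePoint)); // true
```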