As far as I know, \d should match non-English digits, e.g. ۱۲۳۴۵۶۷۸۹۰, but it doesn't work properly in JavaScript.
See this jsFiddle: http://jsfiddle.net/xZpam/
Is this normal behavior?
It seems that JavaScript's \d does not support this (one of several weaknesses of the language's RegExp). However, there is a library called XRegExp with a Unicode addon, which enables Unicode support through \p{} category definitions. For example, if you use \p{Nd} instead of \d, it will match digits from any script:
<script src="xregexp-all.js" type="text/javascript"></script>
<script type="text/javascript">
    var englishDigits = '123123';
    var nonEnglishDigits = '۱۲۳۱۲۳';
    // \p{Nd} is the Unicode "decimal digit number" category
    var digitsPattern = XRegExp('\\p{Nd}+');
    if (digitsPattern.test(nonEnglishDigits)) {
        alert('Non-english using xregexp');
    }
    if (digitsPattern.test(englishDigits)) {
        alert('English using xregexp');
    }
</script>
I used \p{Nd} instead of \p{N} because \d appears to be equivalent to \p{Nd} in non-ECMAScript regex engines. Thanks go to Shervin for pointing it out. See also this fiddle by Shervin.
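For engines that implement ES2018 Unicode property escapes, the same \p{Nd} category can be used natively, as long as the regex carries the u flag. The following is a minimal sketch under that assumption; the variable names anyDigits and asciiDigits are illustrative only and do not come from the answer above.

```javascript
// Assumes an engine with ES2018 Unicode property escapes.
// \p{...} is only recognised when the regex has the `u` flag.
const englishDigits = '123123';
const nonEnglishDigits = '۱۲۳۱۲۳';

// Whole string must consist of decimal digits from any script.
const anyDigits = /^\p{Nd}+$/u;
// Whole string must consist of ASCII digits 0-9 only.
const asciiDigits = /^\d+$/;

console.log(anyDigits.test(nonEnglishDigits));   // true
console.log(anyDigits.test(englishDigits));      // true
console.log(asciiDigits.test(nonEnglishDigits)); // false
```

On older engines without property escapes the literal /\p{Nd}/u is a syntax error, so XRegExp remains the fallback there.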
JavaScript's \d does not do Unicode-aware digit matching (and JavaScript is far from the only language where this is true).
In Mozilla's documentation (https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/RegExp) you will find that:
\d
Matches a digit character in the basic Latin alphabet. Equivalent to [0-9].
From MDN (RegEx Test):
Matches a digit character in the basic Latin alphabet. Equivalent to [0-9].
Yes, it is normal and correct that \d matches only the ASCII digits 0 to 9. The authoritative reference is the ECMAScript standard. It is not particularly easy reading, but clause 15.10.2.12 (CharacterClassEscape) specifies that \d denotes “the ten-element set of characters containing the characters 0 through 9 inclusive”.
Yes, \d does not match non-English digits in JavaScript. But, as with other odd parts of JavaScript, you can still check for non-English digits (Persian digits, for example) by using a character range, as in the code below. Note that the range needs a hyphen: [۰, ۹] would only match the characters ۰, ۹, a comma, or a space.
/[۰-۹]/.test("۱۲۳۴۵۶۷۸۹۰"); // true
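The range approach works because the Persian digits occupy the consecutive code points U+06F0 to U+06F9. As a rough sketch, the check can be anchored so that the whole string must be digits, and the same range can be used to map Persian digits to ASCII ones. The helper names isPersianNumber and toAsciiDigits are hypothetical and not part of the original answer.

```javascript
// Hypothetical helpers for illustration; names are not from the answer above.
// Persian (Extended Arabic-Indic) digits are U+06F0..U+06F9.
const persianDigits = /^[\u06F0-\u06F9]+$/;

// True only if every character in `text` is a Persian digit.
function isPersianNumber(text) {
  return persianDigits.test(text);
}

// Replace each Persian digit with its ASCII counterpart by shifting
// its code point from the U+06F0 block down to U+0030 ('0').
function toAsciiDigits(text) {
  return text.replace(/[\u06F0-\u06F9]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0x06F0 + 0x30));
}

console.log(isPersianNumber('۱۲۳۴۵۶۷۸۹۰')); // true
console.log(isPersianNumber('123'));        // false
console.log(toAsciiDigits('۱۲۳'));          // "123"
```

Arabic-Indic digits (U+0660 to U+0669) are a separate block, so this sketch would need a second range to cover them.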