9

As far as I know, \d should match non-English digits, e.g. ۱۲۳۴۵۶۷۸۹۰, but it doesn't work properly in JavaScript.

See this jsFiddle: http://jsfiddle.net/xZpam/

Is this normal behavior?

Shervin
Afshin Mehrabani
  • 1
    It is normal. The question is whether you can enable the unicode behavior in javascript regexes. Chrome doesn't like the `u` flag. – John Dvorak May 21 '13 at 05:49
  • 1
    http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters – Ravi Gadag May 21 '13 at 05:52
  • Since it only matches [0-9] why not try something like `^[۱۲۳۴۵۶۷۸۹۰]+$`? – Menno May 21 '13 at 05:53

8 Answers

11

It seems that JavaScript's built-in regular expressions do not support this (one of several limitations of RegExp in the language). However, there's a library called XRegExp with a Unicode addon, which enables Unicode support through \p{} category definitions. For example, if you use \p{Nd} instead of \d, it will match digits in other scripts as well:

<script src="xregexp-all.js" type="text/javascript"></script>
<script type="text/javascript">
    var englishDigits = '123123';
    var nonEnglishDigits = '۱۲۳۱۲۳';

    var digitsPattern = XRegExp('\\p{Nd}+');
    if (digitsPattern.test(nonEnglishDigits)) {
        alert('Non-english using xregexp');
    }

    if (digitsPattern.test(englishDigits)) {
        alert('English using xregexp');
    }
</script>

EDIT:

Used \p{Nd} instead of \p{N}, since \d appears to be equivalent to \p{Nd} in non-ECMAScript regex engines. Thanks to Shervin for pointing this out. See also this fiddle by Shervin.
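For engines that implement ECMAScript 2018 or later, the same \p{Nd} category can be used natively with the `u` flag, without a library. The following is only a minimal sketch under that assumption; the helper name `isAllDecimalDigits` is made up for illustration and is not part of XRegExp or any standard API:

```javascript
// Hypothetical helper: true when every character of `text` is a Unicode
// decimal digit (general category Nd).
// Assumes an engine with ES2018 Unicode property escapes; the `u` flag is required.
function isAllDecimalDigits(text) {
  return /^\p{Nd}+$/u.test(text);
}

console.log(isAllDecimalDigits('123123'));  // ASCII digits
console.log(isAllDecimalDigits('۱۲۳۱۲۳'));  // Persian digits
console.log(isAllDecimalDigits('Ⅷ'));       // Roman numeral: category Nl, not Nd
```

The first two calls should log true and the last one false, because Ⅷ is a letter number rather than a decimal digit. In an engine without property escapes the regex literal is a syntax error, so the XRegExp approach above remains the fallback there.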

Community
Sina Iravanian
  • And if you want to limit your input to [Arabic-Indic](http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Cp%7Bsubhead%3DArabic-Indic+digits%7D&g=) or [Eastern Arabic-Indic Digits](http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{subhead=Eastern%20Arabic-Indic%20digits}) you can use Unicode block criteria: `alert(XRegExp("^\\p{InArabic}\\p{N}").test('۱۲۳۴٤۵٥۶۷۸۹۰')); // True` `alert(XRegExp("^\\p{InArabic}\\p{N}").test('1234567890')); // False` – Shervin May 21 '13 at 06:43
  • 2
    @Sina, I think that `\\p{N}` (Number) should be replaced with `\\p{Nd}` (Decimal Number) since we don't want to match [non-decimal numeral characters](http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Cp%7BN%7D-%5Cp%7BNd%7D&g=) like ➋, ⅑, Ⅷ, etc.: http://jsfiddle.net/wZXZ3/2/ – Shervin May 21 '13 at 07:26
  • 1
    @Shervin Thanks, I updated the answer, and linked to your fiddle. – Sina Iravanian May 21 '13 at 11:40
10

JavaScript does not support Unicode regex matching (and it is far from the only language where that is the case).

http://www.regular-expressions.info/unicode.html

Amber
3

In Mozilla's documentation (https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/RegExp) you will find that:

\d  

Matches a digit character in the basic Latin alphabet. Equivalent to [0-9].
kaljak
2

\d is equivalent to [0-9], according to MDN.

Arjan
1

From MDN. RegEx Test

Matches a digit character in the basic Latin alphabet. Equivalent to [0-9].

Ravi Gadag
1
Matches a digit character. Equivalent to [0-9].

For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."

From MDN

Dinever
1

Yes, it is normal and correct that \d matches the ASCII digits 0 to 9 only. The authoritative reference is the ECMAScript standard. It is not particularly easy reading, but clause 15.10.2.12 (CharacterClassEscape) specifies that \d denotes “the ten-element set of characters containing the characters 0 through 9 inclusive”.
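A quick way to see that behavior for yourself, as a minimal sketch (the variable names are only for illustration, and the regex has no `u` flag or other options):

```javascript
// \d is the ten-element set 0 through 9, so an anchored \d+ accepts
// only strings made up entirely of ASCII digits.
const asciiOnlyDigits = /^\d+$/;

const asciiSample = '1234567890';
const persianSample = '۱۲۳۴۵۶۷۸۹۰';

console.log(asciiOnlyDigits.test(asciiSample));   // true
console.log(asciiOnlyDigits.test(persianSample)); // false
```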

Jukka K. Korpela
0

Yes, \d does not match non-English digits in JavaScript. But, as with other odd parts of JavaScript, you can still check for non-English digits (Persian digits, for example) by using a character range, something like the code below:

/^[۰-۹]+$/.test("۱۲۳۴۵۶۷۸۹۰"); //true
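If you want to accept both Arabic-Indic digits (U+0660 to U+0669) and Persian, i.e. Extended Arabic-Indic, digits (U+06F0 to U+06F9), the ranges can be spelled out with \u escapes so the pattern does not depend on how the source file is encoded. This is just a sketch; the names `arabicOrPersianDigits` and `isArabicOrPersianNumber` are invented for this example:

```javascript
// Arabic-Indic digits:                     U+0660..U+0669
// Extended Arabic-Indic (Persian) digits:  U+06F0..U+06F9
const arabicOrPersianDigits = /^[\u0660-\u0669\u06F0-\u06F9]+$/;

// Hypothetical helper: true when `text` consists only of digits from
// the two ranges above (ASCII digits are deliberately rejected).
function isArabicOrPersianNumber(text) {
  return arabicOrPersianDigits.test(text);
}

console.log(isArabicOrPersianNumber('\u06F1\u06F2\u06F3')); // true (Persian 123)
console.log(isArabicOrPersianNumber('\u0661\u0662\u0663')); // true (Arabic-Indic 123)
console.log(isArabicOrPersianNumber('123'));                // false
```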
Alireza