Why does this regex not work with Eastern Arabic numerals?

Question

@thg435 wrote this answer to a JavaScript question:

```
> a = "foo 1234567890 bbb 123456"
"foo 1234567890 bbb 123456"
> a.replace(/\d(?=\d\d(\d{3})*\b)/g, "[$&]")
"foo 1[2]34[5]67[8]90 bbb [1]23[4]56"
```

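Here is a sketch of how that regex works, assuming any modern JavaScript engine such as Node.js. The helper name `markGroups` is hypothetical. The lookahead `(?=\d\d(\d{3})*\b)` accepts a digit only if 2 + 3k more digits (k ≥ 0) follow it before a word boundary. With ASCII digits that boundary exists both before a space and at the end of the string, because `0` to `9` are word characters:

```javascript
// Hypothetical helper wrapping the regex quoted above.
// A digit is bracketed when 2 + 3k more digits (k >= 0) follow it before a word boundary.
function markGroups(text) {
  return text.replace(/\d(?=\d\d(\d{3})*\b)/g, "[$&]");
}

const a = "foo 1234567890 bbb 123456";
console.log(markGroups(a)); // foo 1[2]34[5]67[8]90 bbb [1]23[4]56
```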
It works well with Hindu-Arabic numerals as used in English text (0, 1, 2, 3, ...). But when I apply the regex to Eastern Arabic numerals, it fails. Here is the regex I use (I've just replaced \d with [\u0660-\u0669]):

```
/[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*\b)/g
```

It actually works if my string is ١٢٣٤foo, but fails when it's ١٢٣٤ foo or even foo١٢٣٤:

```
> a = "١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤"
"١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤"
> a.replace(/[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*\b)/g, "[$&]")
"١[٢]٣٤foo  ١٢٣٤ foo  foo١٢٣٤"
```

What actually matters to me are separate, space-delimited numbers (e.g. ١٢٣٤). Why doesn't it match them?

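The behavior follows from how JavaScript defines `\b`. It is a position between a `\w` character and a non-`\w` character, or the start or end of the input. In JavaScript, `\w` only covers `[A-Za-z0-9_]`, so Eastern Arabic digits are non-word characters. After `٤`, a boundary therefore exists only when an ASCII letter, digit or underscore follows, as in `١٢٣٤foo`. There is none before a space or at the end of the string.

The checks below are a small sketch, assuming a standard engine such as Node.js. An engine whose `\w` includes Unicode digits (such as .NET) would give the reverse results, which may explain the opposite behavior mentioned in the answer below.

```javascript
// In JavaScript, \w is ASCII-only: [A-Za-z0-9_].
const isWordChar = (ch) => /\w/.test(ch);
console.log(isWordChar("4"));      // true  (ASCII digit)
console.log(isWordChar("\u0664")); // false (Eastern Arabic digit four)

// An Eastern Arabic digit immediately followed by a word boundary:
const digitThenBoundary = /[\u0660-\u0669]\b/;
console.log(digitThenBoundary.test("١٢٣٤foo"));  // true: non-word digit meets ASCII "f"
console.log(digitThenBoundary.test("١٢٣٤ foo")); // false: digit and space are both non-word
console.log(digitThenBoundary.test("foo١٢٣٤"));  // false: no boundary at the end after a non-word char

// Reproducing the output from the question:
const a = "١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤";
const result = a.replace(/[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*\b)/g, "[$&]");
console.log(result); // ١[٢]٣٤foo  ١٢٣٤ foo  foo١٢٣٤
```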
Update:

Another requirement is that the regex should only match numbers with 5 or more digits (e.g. ١٢٣٤٥ but not ١٢٣٤). I initially thought that would be as simple as adding {5,} at the end of the expression, but that doesn't work.

[this](http://stackoverflow.com/questions/12518689/regular-expression-not-to-allow-numbers-just-arabic-letters) might help — a better oliver, Apr 26 '13 at 17:18

JLRishe · Answer 1 · 2013-04-26T20:06:20.653


Oddly, I'm seeing the opposite behavior (the first string doesn't work and the other two do), but how about replacing the \b with (?![\u0660-\u0669])? Then it seems to work no matter what comes before or after the number:

```
[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*(?![\u0660-\u0669]))
```

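Here is a quick check of this version, as a sketch assuming a JavaScript engine such as Node.js. The variable name `marker` is only illustrative. `(?![\u0660-\u0669])` only requires that the next character is not another Eastern Arabic digit. It therefore holds before a space, before ASCII letters, and at the end of the input alike:

```javascript
// \b replaced by a negative lookahead meaning "the run of Eastern Arabic digits ends here".
const marker = /[\u0660-\u0669](?=[\u0660-\u0669][\u0660-\u0669]([\u0660-\u0669]{3})*(?![\u0660-\u0669]))/g;

const a = "١٢٣٤foo  ١٢٣٤ foo  foo١٢٣٤";
console.log(a.replace(marker, "[$&]")); // ١[٢]٣٤foo  ١[٢]٣٤ foo  foo١[٢]٣٤

// A longer run (Eastern Arabic 1234567890) gets the same marks as the ASCII example.
console.log("١٢٣٤٥٦٧٨٩٠".replace(marker, "[$&]")); // ١[٢]٣٤[٥]٦٧[٨]٩٠
```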
Edit: This seems to work for the new requirement, i.e. to only add the brackets if the run of digits is 5 digits long or more:

```
[\u0660-\u0669](?=[\u0660-\u0669]{2}([\u0660-\u0669]{3})+(?![\u0660-\u0669]))|(?<=[\u0660-\u0669]{2})[\u0660-\u0669](?=[\u0660-\u0669]{2}(?![\u0660-\u0669]))
```

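This regex relies on a lookbehind, `(?<=...)`, which JavaScript only gained in ECMAScript 2018. JavaScript engines did not support it when this answer was written, which may be why it appeared not to work (see the comments).

Tracing the two alternatives shows that a digit is marked only inside runs of five or more digits:

- The first alternative needs at least five digits after the marked one.
- The second needs two digits before it and exactly two after it.

The sketch below assumes an ES2018+ engine such as current Node.js. It also shows a lookbehind-free alternative. This is a different technique, not part of this answer, and `markLongNumbers` is a hypothetical name. It finds whole runs with `{5,}` first, then applies the original lookahead inside each run, which is one place the `{5,}` from the question can go.

```javascript
// The answer's regex; requires lookbehind support (ES2018+).
const fiveOrMore = /[\u0660-\u0669](?=[\u0660-\u0669]{2}([\u0660-\u0669]{3})+(?![\u0660-\u0669]))|(?<=[\u0660-\u0669]{2})[\u0660-\u0669](?=[\u0660-\u0669]{2}(?![\u0660-\u0669]))/g;

// Runs of 3, 4, 5 and 8 Eastern Arabic digits.
const sample = "١٢٣ ١٢٣٤ ١٢٣٤٥ ١٢٣٤٥٦٧٨";
console.log(sample.replace(fiveOrMore, "[$&]")); // ١٢٣ ١٢٣٤ ١٢[٣]٤٥ ١٢[٣]٤٥[٦]٧٨

// Hypothetical lookbehind-free alternative: match whole runs of 5+ digits first,
// then bracket digits inside each run ($ is the end of the run, since the run is the whole input).
// A function replacement's return value is inserted literally, so the brackets survive.
function markLongNumbers(text) {
  return text.replace(/[\u0660-\u0669]{5,}/g, (run) =>
    run.replace(/[\u0660-\u0669](?=[\u0660-\u0669]{2}(?:[\u0660-\u0669]{3})*$)/g, "[$&]")
  );
}

console.log(markLongNumbers(sample)); // ١٢٣ ١٢٣٤ ١٢[٣]٤٥ ١٢[٣]٤٥[٦]٧٨
```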
Incidentally, some regex engines (.NET's, for example) will treat those digits as a match for \d. Here is that second regex with \d instead of those character ranges, which should be a little easier to read:

```
\d(?=\d{2}(\d{3})+(?!\d))|(?<=\d{2})\d(?=\d{2}(?!\d))
```

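In JavaScript specifically, `\d` always means `[0-9]`, even with the `u` flag, so the `\d` version above leaves Eastern Arabic digits unmarked there. In ES2018+ engines, the Unicode property escape `\p{Nd}` can stand in for `\d`. It matches any decimal digit and requires the `u` flag. A sketch assuming such an engine (e.g. current Node.js):

```javascript
// \d is ASCII-only in JavaScript, even in unicode mode; \p{Nd} covers all decimal digits.
console.log(/\d/u.test("\u0664"));     // false
console.log(/\p{Nd}/u.test("\u0664")); // true

const longDigitsAscii = /\d(?=\d{2}(\d{3})+(?!\d))|(?<=\d{2})\d(?=\d{2}(?!\d))/g;
console.log("12345".replace(longDigitsAscii, "[$&]")); // 12[3]45
console.log("١٢٣٤٥".replace(longDigitsAscii, "[$&]")); // ١٢٣٤٥ (unchanged)

// Same pattern with \p{Nd}. Note: \p{Nd} also matches other scripts' digits (e.g. Devanagari)
// and treats a mixed-script run as one number; [0-9\u0660-\u0669] would be narrower.
const longDigitsUnicode = /\p{Nd}(?=\p{Nd}{2}(\p{Nd}{3})+(?!\p{Nd}))|(?<=\p{Nd}{2})\p{Nd}(?=\p{Nd}{2}(?!\p{Nd}))/gu;
console.log("12345 ١٢٣٤ ١٢٣٤٥".replace(longDigitsUnicode, "[$&]")); // 12[3]45 ١٢٣٤ ١٢[٣]٤٥
```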
edited Apr 26 '13 at 20:06

answered Apr 26 '13 at 17:20

JLRishe


Works well with nearly all regex engines except JavaScript's. This is a problem with JavaScript's regex. Also, I have doubts about nested lookahead support in JavaScript. – Anirudha Apr 26 '13 at 17:34
This solved my problem. Just one more simple question: how can I match only numbers with 5 or more digits (e.g. 12345 and not 1234)? Where should I add {5,}? – Iryn Apr 26 '13 at 19:05
The new regex doesn't work. Can you please check the code, or create a jsfiddle? – Iryn Apr 26 '13 at 20:27
