I've got a regex that was working perfectly fine until I switched my locale to 'fa' (Persian). I suspect this would happen with Hebrew and Arabic too (not yet sure if it's the characters or the RTL direction that makes it break).

The line of code causing the exception is:

public static final Pattern NAME_REGEX = Pattern.compile(String.format("^[\\w ]{%d,%d}$", 2,24));

The syntax is fine (it works in English and Spanish), but when the app tries to compile the regex in one of the 'incompatible' locales, I get the following:

at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:605)
at dalvik.system.NativeStart.main(Native Method)
Caused by: java.util.regex.PatternSyntaxException: Syntax error U_REGEX_BAD_INTERVAL     near index 8:
^[\w ]{٢,٢٤}$
   ^
at java.util.regex.Pattern.compileImpl(Native Method)
at java.util.regex.Pattern.compile(Pattern.java:400)
at java.util.regex.Pattern.<init>(Pattern.java:383)
at java.util.regex.Pattern.compile(Pattern.java:374)
at com.airg.hookt.config.airGConstant.<clinit>(airGConstant.java:131)

Any help would be appreciated. Thanks

copolii
  • `Syntax error U_REGEX_BAD_INTERVAL near index 8: ^[\w ]{٢,٢٤}$`, those range `2,24` numbers do not look right to me. Make sure you are using proper encoding. – Qtax Jul 24 '11 at 23:44
  • could it be that the string.format is messing it up? – copolii Jul 25 '11 at 03:49

2 Answers

Looks like you're trying to specify the interval using Arabic-Indic digits (U+0660..U+0669); I would have been very surprised if that had worked. I've never heard of a regex flavor that accepts anything but ASCII digits as part of the regex itself.

Are you also expecting \w to match letters/digits from Persian, Hebrew, and Arabic scripts? That won't work either, but this time it's because of a shortcoming in Java's regex flavor. If you want to match characters from any writing system, you need to use Unicode properties like \p{L} and \p{N} (but see here for more details).
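As a rough sketch of what that could look like for the pattern in the question: the class below swaps \w for \p{L} and \p{N}, and keeps the underscore because \w also matches it. The class and method names are made up for illustration, and this is only one way to write it. It also does not cover combining marks (\p{M}), which some scripts rely on.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post.
final class UnicodeNameRegex {
    // \p{L} matches a letter and \p{N} a numeric character from any script.
    // The underscore and the space are listed explicitly.
    static final Pattern UNICODE_NAME =
            Pattern.compile("^[\\p{L}\\p{N}_ ]{2,24}$");

    // Returns true when the whole string is 2 to 24 allowed characters long.
    static boolean isValidName(String candidate) {
        return UNICODE_NAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("John Smith"));
        // A Persian name written with Unicode escapes to avoid source-encoding issues.
        System.out.println(isValidName("\u0639\u0644\u06CC \u0631\u0636\u0627"));
    }
}
```

Note that the interval is written directly in the pattern with ASCII digits, so nothing here depends on the default locale.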

Alan Moore
  • No, that's just what shows up in the exception text. If you look near the beginning of my post, I've used Latin digits. That's probably just because the locale is Persian or Arabic. What I need is word characters and spaces, between 2 and 24 characters in length. I don't know why it shows the Persian digits in the exception text. I've updated the question with the actual line of code that triggers this exception. – copolii Jul 25 '11 at 03:46
  • Well, you couldn't have gotten an exception with Persian digits in the message unless the string you passed to `Pattern.compile()` had contained Persian digits. And now it's obvious that `String.format()` is converting them. Are you aware that there's a [format()](http://download.oracle.com/javase/6/docs/api/java/lang/String.html#format%28java.util.Locale,%20java.lang.String,%20java.lang.Object...%29) method that lets you specify the Locale? – Alan Moore Jul 25 '11 at 04:51

ANSWER

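So, the problem was indeed String.format: it renders %d using the default locale's digits, so in the 'fa' locale the interval reached Pattern.compile as ٢,٢٤ instead of 2,24.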

Changing

public static final Pattern NAME_REGEX = Pattern.compile(String.format("^[\\w ]{%d,%d}$", 2,24));

to

public static final Pattern NAME_REGEX = Pattern.compile("^[\\w ]{" + 2 + "," + 24 + "}$");

fixed the crash. Thanks to everyone for their contribution.
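Another option, following the suggestion in the comments, is to keep String.format but pass an explicit Locale. This is a minimal sketch that assumes Locale.ROOT is acceptable for the app; the class name and the MIN_LEN / MAX_LEN constants are hypothetical and only there to make the example self-contained.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical holder class, not the real airGConstant.
final class NameRegex {
    static final int MIN_LEN = 2;   // assumed minimum name length
    static final int MAX_LEN = 24;  // assumed maximum name length

    // Locale.ROOT makes %d produce ASCII digits regardless of the default locale.
    static final String NAME_REGEX_SOURCE =
            String.format(Locale.ROOT, "^[\\w ]{%d,%d}$", MIN_LEN, MAX_LEN);

    static final Pattern NAME_REGEX = Pattern.compile(NAME_REGEX_SOURCE);

    public static void main(String[] args) {
        System.out.println(NAME_REGEX.pattern());
        System.out.println(NAME_REGEX.matcher("John Smith").matches());
    }
}
```

Plain concatenation, as above, works just as well; the explicit Locale is mainly useful if the format string grows and you want to keep the placeholders.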

copolii