16

In Java, I'm looking for a regular expression that accepts any Persian (or Arabic) letter but rejects Persian (or Arabic) digits. To match only letters, I found a very good regular expression:

[\u0600-\u065F\u066A-\u06EF\u06FA-\u06FF]

Although it is correct and works for me, we know that we can use \\p{L}+ as a regular expression that accepts letters from all languages in the world, and in my case (Arabic/Persian) I can modify it and use [\\p{InArabic}]+$.

But [\\p{InArabic}]+$ accepts not only all Arabic (Persian) letters but also Arabic digits, such as ۱ ۲.

So my question is: how can I modify [\\p{InArabic}]+$ to accept only letters and not digits? In other words, how can I restrict [\\p{InArabic}]+$ so that it rejects any digits?

Please note that the Persian (Arabic) digits look like this: ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ ۰

Elyas Hadizadeh

2 Answers

22

You can use the following regex:

"[\\p{InArabic}&&\\PN]"

\p{InArabic} matches any character in the Unicode block Arabic (U+0600 to U+06FF).

\PN matches any character that does not belong to any of the Number categories (note the capital P).

Intersecting the two sets gives the desired result: both digit ranges (U+0660 to U+0669 and U+06F0 to U+06F9) are excluded.

Testing code

// Print every code point of the Arabic block (in hex) and whether the regex accepts it.
for (int i = 0x600; i <= 0x6ff; i++) {
    String c = "" + (char) i;
    System.out.println(Integer.toString(i, 16) + " " + c.matches("[\\p{InArabic}&&\\PN]"));
}
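
To validate a whole string rather than one character at a time, the same class can take a `+` quantifier. Below is a minimal sketch; the class name `ArabicLetters` and the helper `isArabicNonNumber` are hypothetical names chosen for illustration. The test strings are written as Unicode escapes to avoid right-to-left rendering problems in source code.

```java
import java.util.regex.Pattern;

public class ArabicLetters {
    // Arabic block (U+0600..U+06FF) intersected with "not in a Number category" (\PN).
    public static final Pattern ARABIC_NON_NUMBER =
            Pattern.compile("[\\p{InArabic}&&\\PN]+");

    // Hypothetical helper: true if the whole string consists of one or more
    // Arabic-block characters, none of which is a number.
    public static boolean isArabicNonNumber(String s) {
        return ARABIC_NON_NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String salam = "\u0633\u0644\u0627\u0645";                        // seen, lam, alef, meem
        System.out.println(isArabicNonNumber(salam));                     // true
        System.out.println(isArabicNonNumber(salam + "\u06F1\u06F2"));    // false: Persian digits 1, 2
        System.out.println(isArabicNonNumber("abc"));                     // false: outside the Arabic block
    }
}
```

Note that \PN only removes numbers, so punctuation in the block such as the Arabic comma (U+060C) is still accepted. If only letters should pass, intersecting with \\p{L} instead of \\PN is one option.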
nhahtdh
  • Are you sure it should be `[\\p{InArabic}&&\\PN]`? My IDE shows a red line under \\PN :( It says: character category expected. – Elyas Hadizadeh May 08 '15 at 05:51
  • @ElyasHadizadeh: I tested the regex on my machine (Java 7 and Java 8) before posting. Note that I'm specifying the regex inside a string literal. Remove one `\` if you need the raw form. – nhahtdh May 08 '15 at 05:52
  • Yes, you are right, and it works like a charm, but it's very strange that my IDE draws a red line under \\PN. (I am using IntelliJ IDEA 13.0.4 with Java 7 and Java 8; in both cases it shows a red line under \\PN, but when I compile and run the application it works correctly.) – Elyas Hadizadeh May 08 '15 at 06:07
  • Thank you for your answer. What is your IDE? Do you know why I get such a bug? – Elyas Hadizadeh May 08 '15 at 06:07
  • @ElyasHadizadeh: I use Eclipse, but Eclipse doesn't do any validity checking on regex. The most it does is check the syntax of the string literal. – nhahtdh May 08 '15 at 06:13
  • Aha! That's the reason: IntelliJ does validity checking on regex, and it seems that `"[\\p{InArabic}&&\\PN]"` looks very strange to it. It just shows a red line under it, though; compiling and running work without any problem. Anyway, thank you, buddy. – Elyas Hadizadeh May 08 '15 at 06:18
7

You can use character class subtraction, which is a rather obscure feature:

[\p{InArabic}&&[^۰-۹]]

Working example: http://ideone.com/jChGem
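
As a rough sketch of how the subtraction could look inside a Java string literal: the pattern above subtracts only the Persian digit forms ۰-۹ (U+06F0 to U+06F9). The version below also subtracts the Arabic-Indic digits (U+0660 to U+0669), which is an assumption added here and not part of the pattern above. The class name `ArabicSubtraction` is hypothetical.

```java
import java.util.regex.Pattern;

public class ArabicSubtraction {
    // Arabic block minus U+06F0..U+06F9 (Persian digit forms) and,
    // as an added assumption, minus U+0660..U+0669 (Arabic-Indic digits).
    public static final Pattern NO_DIGITS =
            Pattern.compile("[\\p{InArabic}&&[^\\u06F0-\\u06F9\\u0660-\\u0669]]+");

    public static void main(String[] args) {
        String salam = "\u0633\u0644\u0627\u0645";                            // seen, lam, alef, meem
        System.out.println(NO_DIGITS.matcher(salam).matches());               // true
        System.out.println(NO_DIGITS.matcher(salam + "\u06F1").matches());    // false: Persian digit 1
        System.out.println(NO_DIGITS.matcher(salam + "\u0661").matches());    // false: Arabic-Indic digit 1
    }
}
```

Listing the digit ranges explicitly keeps the intent visible, at the cost of having to name every range that should be excluded.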

Community
Kobi