5

I want to know whether a text contains any Urdu or Arabic letters. I am using the condition below, but it produces false results when the text contains special characters. What is the right way to do this? Is there a library, or what is the right regex for it?

if (cap.replaceAll("\\s+", "").matches("[A-Za-z]+")
        || cap.replaceAll("\\s+", "").matches("[A-Za-z0-9]+")) {
    Log.d("isUrdu", "false");
    caption.setTypeface(Typeface.DEFAULT);
    caption.setTextSize(16);
} else {
    Log.d("isUrdu", "True");
    /* if (Build.VERSION.SDK_INT > Build.VERSION_CODES.JELLY_BEAN_MR1) { */
    caption.setTypeface(typeface);
    caption.setTextSize(20);
    /* } */
}
Edward Falk
Usman Saeed
  • I think you have to convert each character to its Unicode code point and then compare it with the Urdu and Arabic character codes. – Shahzain ali Oct 03 '16 at 11:02
  • 1
    Try `if (cap.matches("(?s).*[\\p{Arabic}\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF].*")) { /*YES, it is either Arabic or Urdu*/ }`. To only check for Urdu, use `"(?s).*[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF].*"` – Wiktor Stribiżew Oct 03 '16 at 11:22

3 Answers

4

According to the Wikipedia article on the Urdu alphabet, Urdu text uses characters from the following Unicode ranges:

U+0600 to U+06FF
U+0750 to U+077F
U+FB50 to U+FDFF
U+FE70 to U+FEFF

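To match an Arabic letter, you may use the \p{InArabic} Unicode block class, which covers U+0600 to U+06FF. Note that this block is also the first of the ranges listed above, so the ranges alone cannot reliably tell Urdu text apart from Arabic text.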

So, you may use

if (cap.matches("(?s).*[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF].*"))
{
    /* There is an Urdu character */
}
else if (cap.matches("(?s).*\\p{InArabic}.*"))
{
    /* The string contains an Arabic character */
}
else { /* Neither Arabic nor Urdu characters detected */ }

Note that (?s) enables the DOTALL modifier so that . also matches line break characters.
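The effect of (?s) can be seen with a short plain-Java sketch. The class and method names here are made up for illustration, and the sample string is an assumption (Latin text, a line break, then the Arabic letter SEEN, U+0633):

```java
class DotAllDemo {
    // Sample input: "abc", a line break, then one Arabic letter (U+0633).
    static final String TEXT = "abc\n\u0633";

    // Without (?s), '.' stops at the line break, so matches() cannot
    // consume the whole string and the check fails.
    static boolean withoutDotAll() {
        return TEXT.matches(".*\\p{InArabic}.*");
    }

    // With (?s), '.' also matches the line break, so the letter is found.
    static boolean withDotAll() {
        return TEXT.matches("(?s).*\\p{InArabic}.*");
    }

    public static void main(String[] args) {
        System.out.println(withoutDotAll()); // false
        System.out.println(withDotAll());    // true
    }
}
```

This only matters for matches(), which must consume the entire input; multi-line captions are the typical case where the missing flag causes a false negative.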

For better performance with matches, you may use negated character classes instead of the first .*: "(?s)[^\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF]*[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF].*" and "(?s)\\P{InArabic}*\\p{InArabic}.*" respectively.

Note that you may also use the shorter "[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF]" and "\\p{InArabic}" patterns with Matcher#find().
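A minimal sketch of the Matcher#find() variant, assuming plain java.util.regex. The class name ArabicScriptCheck and the helper containsArabicScript are hypothetical names chosen for this example:

```java
import java.util.regex.Pattern;

class ArabicScriptCheck {
    // The four ranges listed above, compiled once so the pattern is reused.
    private static final Pattern ARABIC_RANGES = Pattern.compile(
            "[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF]");

    // Hypothetical helper: true if any character falls in one of the ranges.
    // find() looks for a match anywhere, so no leading or trailing .* is needed.
    static boolean containsArabicScript(String text) {
        return text != null && ARABIC_RANGES.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsArabicScript("Hello, world! #123"));      // false
        System.out.println(containsArabicScript("Hello \u0633\u0644\u0627\u0645")); // true
        System.out.println(containsArabicScript("line one\n\u06D2"));        // true
    }
}
```

Because find() does not have to consume the whole input, special characters, digits, and line breaks elsewhere in the text do not affect the result.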

Wiktor Stribiżew
  • 1
    This is a perfect solution for my problem, thank you very much – Usman Saeed Oct 03 '16 at 11:48
  • On cap.matches("(?s).*\\p{Arabic}.*") Android Studio says it is an unknown class – Usman Saeed Oct 03 '16 at 12:01
  • 1
    That means you may use either `\\p{IsArabic}` or `\\p{InArabic}`, please check and let me know. Android uses the ICU regex library, and it is a bit different from the Java `java.util.regex`. – Wiktor Stribiżew Oct 03 '16 at 12:02
  • Does it also work if text comes from a text or xml file or a database ? – MindRoasterMir Nov 01 '19 at 23:04
  • @MindRoasterMir Regex is used to search for matches in strings, so it is unaware of the source, it just expects plain text. BTW, use the more efficient `"(?s)[^\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF]*[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF].*"` and `"(?s)\\P{InArabic}*\\p{InArabic}.*"`. – Wiktor Stribiżew Nov 01 '19 at 23:06
1

You can do this without regex. All you need is to find the Unicode character ranges used by Arabic and Urdu and then check whether any character of the entered text falls within those ranges.
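One possible sketch of such a check in plain Java, with no regex involved. The ranges are assumed to be the four Arabic-script blocks commonly cited for Urdu and Arabic (U+0600 to U+06FF, U+0750 to U+077F, U+FB50 to U+FDFF, U+FE70 to U+FEFF); the class and method names are hypothetical:

```java
class ArabicCodePointCheck {
    // Assumed ranges: Arabic, Arabic Supplement, and the two
    // Arabic Presentation Forms blocks.
    static boolean isInArabicRanges(int codePoint) {
        return (codePoint >= 0x0600 && codePoint <= 0x06FF)
            || (codePoint >= 0x0750 && codePoint <= 0x077F)
            || (codePoint >= 0xFB50 && codePoint <= 0xFDFF)
            || (codePoint >= 0xFE70 && codePoint <= 0xFEFF);
    }

    // Walks the string one code point at a time and stops at the first hit.
    static boolean containsArabicScript(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (isInArabicRanges(codePoint)) {
                return true;
            }
            i += Character.charCount(codePoint);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsArabicScript("caption #1 @home")); // false
        System.out.println(containsArabicScript("caption \u0679"));   // true
    }
}
```

The loop uses codePointAt and Character.charCount rather than charAt so that characters outside the Basic Multilingual Plane are stepped over correctly.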

Gherbi Hicham
  • I think your answer is misleading and based on false information. The Urdu and Arabic alphabets don't have the same Unicode values. Thanks – MindRoasterMir Nov 01 '19 at 23:05
  • @MindRoasterMir that wasn't the gist of my answer though, it's actually irrelevant to the question which language he will use, he needs to find the UTF range for the languages he is checking for which can be found on the net, modified my answer. – Gherbi Hicham May 27 '20 at 09:43
0

Try this to check whether the text contains Arabic characters:

var arregex = /[\u0600-\u06FF]/;
var test = arregex.test(text);
return test;

Ramji