2

How can I remove all non-alpha-numeric Arabic characters from a string in Java?

fattah.safa
  • 1
    Probably using a [regular expression](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html). – Andy Turner Sep 13 '16 at 13:33
  • 1
    Do you mean things like Arabic full stop, Arabic comma etc.? What about non-alpha-numeric characters which are not Arabic? What about Arabic characters which are alpha numeric? – RealSkeptic Sep 13 '16 at 13:43

4 Answers

1

Use the regex [^A-Za-z0-9 ] and replace every match with an empty string. The pattern keeps only the letters A to Z and a to z, the digits 0 to 9, and spaces; everything else is removed.
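As a minimal sketch of how this pattern could be applied, the class below passes it to String.replaceAll. The class and method names are hypothetical and exist only for illustration. Note that, as written, the pattern also strips Arabic letters, because they fall outside the A-Z, a-z, and 0-9 ranges.

```java
// Hypothetical helper, shown only to illustrate the pattern above.
final class AsciiAlnumFilter {

    // Removes every character that is not an ASCII letter, an ASCII digit, or a space.
    static String keepAsciiAlnum(String input) {
        return input.replaceAll("[^A-Za-z0-9 ]", "");
    }

    public static void main(String[] args) {
        // The escapes spell the Arabic letters SEEN, LAM, ALEF, MEEM.
        String sample = "Hello, \u0633\u0644\u0627\u0645 123!";
        System.out.println(keepAsciiAlnum(sample));
    }
}
```

Here the comma, the exclamation mark, and the four Arabic letters are all removed, while the two spaces around the Arabic word are kept.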

Abbas Kararawala
1

Here is the complete answer:

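    // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
    // Keep Latin letters, ASCII digits, spaces, and the Arabic range
    // U+0623 (ALEF WITH HAMZA ABOVE) to U+0652 (SUKUN); remove everything else.
    String patternString = "[^A-Za-z0-9\\u0623-\\u0652 ]";
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(string); // string is the input text
    String normalizedString = matcher.replaceAll("");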
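An alternative to listing explicit character ranges is to rely on Unicode properties. The sketch below is one possible reading of the question: it removes characters that sit in the Unicode Arabic block (U+0600 to U+06FF) but are neither letters nor decimal digits. The class and method names are hypothetical. The block property \p{InArabic} is used rather than the script property because punctuation such as the Arabic comma is assigned to the Common script in Unicode. This version also removes diacritics, since they are marks rather than letters; adding \p{M} to the excluded set would keep them.

```java
import java.util.regex.Pattern;

// Hypothetical helper: one possible interpretation of
// "remove non-alphanumeric Arabic characters".
final class ArabicPunctuationStripper {

    // Matches characters in the Arabic block (U+0600..U+06FF) that are
    // neither letters (\p{L}) nor decimal digits (\p{Nd}).
    private static final Pattern NON_ALNUM_ARABIC =
            Pattern.compile("[\\p{InArabic}&&[^\\p{L}\\p{Nd}]]");

    static String strip(String input) {
        return NON_ALNUM_ARABIC.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Arabic letters, Arabic comma (U+060C), Latin text, Arabic question mark (U+061F)
        String sample = "\u0633\u0644\u0627\u0645\u060C abc 123\u061F!";
        System.out.println(strip(sample));
    }
}
```

Characters outside the Arabic block, such as the trailing "!" in the sample, are left untouched by this pattern.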
fattah.safa
0

I tried several solutions, including all of those in this thread and those from the question "how could i remove arabic punctuation form a String in java", and none of them worked completely.

Since no other solution worked completely, I wrote a method that retains only Arabic characters and removes all other characters:

public static String findArabicString(String s) {
    StringBuilder finalValue = new StringBuilder();

    if (s != null) {
        // Walk the string by code point so surrogate pairs are handled correctly
        for (int i = 0; i < s.length();) {
            int c = s.codePointAt(i);
            // Keep only code points in U+0600..U+06E0 (part of the Arabic block)
            if (c >= 0x0600 && c <= 0x06E0)
                finalValue.appendCodePoint(c);
            i += Character.charCount(c);
        }
    }

    System.out.println(finalValue.toString());
    return finalValue.toString();
}

The method can be customized as required. For example, to retain spaces as well as Arabic characters, only a slight change to the test condition is needed:

public static String findArabicString(String s) {
    StringBuilder finalValue = new StringBuilder();

    if (s != null) {
        for (int i = 0; i < s.length();) {
            int c = s.codePointAt(i);
            // 32 (0x20) is the code point of the ASCII space character
            if ((c >= 0x0600 && c <= 0x06E0) || c == 32)
                finalValue.appendCodePoint(c);
            i += Character.charCount(c);
        }
    }

    System.out.println(finalValue.toString());
    return finalValue.toString();
}
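The range test above keeps everything in that span of the Arabic block, which also includes punctuation such as the Arabic comma (U+060C). If only Arabic letters and digits should survive, a variation along the following lines might work. It is a sketch under that assumption, with hypothetical class and method names, and it uses Character.UnicodeBlock together with Character.isLetterOrDigit instead of a hard-coded range. Diacritics are dropped as well, since they are neither letters nor digits.

```java
// Hypothetical variation: keep spaces plus Arabic-block letters and digits only.
final class ArabicAlnumKeeper {

    static String keepArabicAlnumAndSpaces(String s) {
        if (s == null) {
            return "";
        }
        return s.codePoints()
                // Keep the ASCII space, or any letter/digit located in the Arabic block
                .filter(cp -> cp == ' '
                        || (Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.ARABIC
                            && Character.isLetterOrDigit(cp)))
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // Arabic letters, Arabic comma, Latin letters, Arabic-Indic digits 1 and 2
        String sample = "\u0633\u0644\u0627\u0645\u060C abc \u0661\u0662";
        System.out.println(keepArabicAlnumAndSpaces(sample));
    }
}
```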

I hope this helps anyone facing a similar issue.

Maulik Kayastha
0

To remove Arabic characters from a string, you can use the method below:

 public void removeArabicChars() {
        String input = "This string contains Arabic characters هذا النص يحتوي على حروف عربية";
        String output = input.replaceAll("\\p{InArabic}", "");
        System.out.println(output);
    }
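For reuse, the same idea can be written as a method that takes the input and returns the result instead of printing it. The sketch below uses hypothetical names and also contrasts the block property \p{InArabic}, which covers only U+0600 to U+06FF, with the script property \p{IsArabic}, which also matches Arabic letters located in other blocks such as Arabic Presentation Forms-B.

```java
// Hypothetical helpers comparing block-based and script-based removal.
final class ArabicRemover {

    // Removes characters in the Unicode Arabic block (U+0600..U+06FF).
    static String removeByBlock(String input) {
        return input.replaceAll("\\p{InArabic}", "");
    }

    // Removes characters whose Unicode script is Arabic, whatever block they are in.
    static String removeByScript(String input) {
        return input.replaceAll("\\p{IsArabic}", "");
    }

    public static void main(String[] args) {
        // U+FE8D is ARABIC LETTER ALEF ISOLATED FORM (Arabic Presentation Forms-B)
        String sample = "a\uFE8Db";
        System.out.println(removeByBlock(sample).length());  // block version leaves it in place
        System.out.println(removeByScript(sample).length()); // script version removes it
    }
}
```

Which property is appropriate depends on whether presentation forms and other Arabic-script characters outside the main block can appear in the input.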