How can I remove all non-alpha-numeric Arabic characters from a string in Java?
- Probably using a [regular expression](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html). – Andy Turner Sep 13 '16 at 13:33
- Do you mean things like the Arabic full stop, Arabic comma, etc.? What about non-alphanumeric characters that are not Arabic? What about Arabic characters that are alphanumeric? – RealSkeptic Sep 13 '16 at 13:43
4 Answers
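Use the regex [^A-Za-z0-9 ]
This negated character class matches everything except the letters A to Z and a to z, the digits 0 to 9, and the space. Replacing every match with an empty string leaves only those characters, so Arabic letters are removed as well.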
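As a quick illustration of how this pattern behaves, here is a minimal sketch that wraps it in a helper. The class and method names (`AsciiAlnumFilter`, `keepAsciiAlnum`) are made up for this example and do not come from the answer.

```java
class AsciiAlnumFilter {
    // Removes every character that is not an ASCII letter, an ASCII digit, or a space.
    static String keepAsciiAlnum(String input) {
        return input.replaceAll("[^A-Za-z0-9 ]", "");
    }

    public static void main(String[] args) {
        // prints: Hello World 123
        System.out.println(keepAsciiAlnum("Hello, World! 123"));
    }
}
```

Because the class lists only ASCII ranges, any Arabic letters in the input are stripped along with the punctuation.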

- Thanks! It works... I just used "[^A-Za-zأ-ْ-9 ]" (to support both Arabic and English non-alphanumeric characters). – fattah.safa Sep 13 '16 at 14:05
- There is a problem with this pattern: it also removes Alif (أ) and Hamza (ء) from the Arabic string, which is wrong. How can that be avoided? – Maulik Kayastha Jun 20 '21 at 08:43
Here is the complete answer:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Negated character class: every character NOT listed inside the brackets is removed.
    // "string" is the input text to normalize.
    Pattern pattern = Pattern.compile("[^A-Za-zأ-ْ-9 ]");
    Matcher matcher = pattern.matcher(string);
    String normalizedString = matcher.replaceAll("");
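One possible way to avoid the Hamza problem raised in the comment on this pattern is to drop the hand-written ranges and rely on the Unicode general categories that `java.util.regex.Pattern` supports. The following is only a sketch, assuming that combining marks (such as Arabic diacritics) should be kept along with letters and decimal digits; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class UnicodeAlnumFilter {
    // \p{L}  = letters in any script (includes Hamza, U+0621)
    // \p{M}  = combining marks (assumption: Arabic diacritics should be kept)
    // \p{Nd} = decimal digits in any script (ASCII and Arabic-Indic)
    private static final Pattern NOT_ALNUM = Pattern.compile("[^\\p{L}\\p{M}\\p{Nd} ]");

    // Removes everything except letters, marks, decimal digits, and spaces.
    static String keepLettersAndDigits(String input) {
        return NOT_ALNUM.matcher(input).replaceAll("");
    }
}
```

With this pattern, Alif with Hamza (U+0623) and Hamza (U+0621) are both kept, while the Arabic comma (U+060C) and ASCII punctuation are removed.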

I tried several solutions and none of them worked reliably. I tried every solution in this thread as well as those in "how could i remove arabic punctuation form a String in java".
Since no other solution worked completely, I wrote a method that retains only Arabic characters and removes everything else:
    public static String findArabicString(String s) {
        StringBuilder finalValue = new StringBuilder();
        if (s != null) {
            for (int i = 0; i < s.length(); ) {
                int c = s.codePointAt(i);
                // Keep only code points in the range U+0600 to U+06E0 (Arabic block)
                if (c >= 0x0600 && c <= 0x06E0) {
                    finalValue.appendCodePoint(c);
                }
                // Advance by one or two chars, depending on the code point
                i += Character.charCount(c);
            }
        }
        System.out.println(finalValue);
        return finalValue.toString();
    }
The method can be customized as required. For example, to retain spaces as well as Arabic characters, only the test condition needs a slight change:
    public static String findArabicString(String s) {
        StringBuilder finalValue = new StringBuilder();
        if (s != null) {
            for (int i = 0; i < s.length(); ) {
                int c = s.codePointAt(i);
                // 32 is the Unicode code point for the space character
                if ((c >= 0x0600 && c <= 0x06E0) || c == 32) {
                    finalValue.appendCodePoint(c);
                }
                // Advance by one or two chars, depending on the code point
                i += Character.charCount(c);
            }
        }
        System.out.println(finalValue);
        return finalValue.toString();
    }
I hope this helps anyone facing a similar issue.
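Note that the range U+0600 to U+06E0 also contains Arabic punctuation such as the Arabic comma (U+060C), so the range check alone keeps those characters. A variation on the same loop, sketched here under the assumption that only Arabic letters, Arabic digits, and spaces should survive, could combine `Character.UnicodeBlock` with `Character.isLetterOrDigit`. The class and method names are hypothetical, and this version also drops Arabic diacritics, since they are combining marks rather than letters.

```java
class ArabicAlnumFilter {
    // Keeps spaces plus letters and digits from the Arabic block (U+0600 to U+06FF);
    // Arabic punctuation, diacritics, and all non-Arabic characters are dropped.
    static String keepArabicLettersAndDigits(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int c = s.codePointAt(i);
            boolean inArabicBlock =
                    Character.UnicodeBlock.of(c) == Character.UnicodeBlock.ARABIC;
            if (c == ' ' || (inArabicBlock && Character.isLetterOrDigit(c))) {
                out.appendCodePoint(c);
            }
            // Advance by one or two chars, depending on the code point
            i += Character.charCount(c);
        }
        return out.toString();
    }
}
```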

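To remove Arabic characters from a string, you can use the method below. \p{InArabic} matches every character in the Arabic Unicode block (U+0600 to U+06FF), so letters, digits, and punctuation are all removed: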
    public void removeArabicChars() {
        String input = "This string contains Arabic characters هذا النص يحتوي على حروف عربية";
        // Remove every character that belongs to the Arabic Unicode block
        String output = input.replaceAll("\\p{InArabic}", "");
        System.out.println(output);
    }
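If the goal is the one stated in the question, removing only the non-alphanumeric Arabic characters while leaving everything else alone, a character-class intersection may be closer to what is needed. This is a sketch rather than a definitive solution: it assumes "alphanumeric" means Unicode letters, combining marks, and decimal digits, and the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class ArabicPunctuationRemover {
    // Matches characters that are in the Arabic block AND are not
    // letters (\p{L}), combining marks (\p{M}), or decimal digits (\p{Nd}).
    private static final Pattern ARABIC_NON_ALNUM =
            Pattern.compile("[\\p{InArabic}&&[^\\p{L}\\p{M}\\p{Nd}]]");

    // Removes Arabic punctuation and symbols; non-Arabic text is left untouched.
    static String removeArabicNonAlnum(String input) {
        return ARABIC_NON_ALNUM.matcher(input).replaceAll("");
    }
}
```

With this pattern, the Arabic comma (U+060C) and Arabic question mark (U+061F) are removed, while Arabic letters and ASCII punctuation such as "!" remain.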
