I have mixed text containing Arabic, English, numbers and special characters. How can I extract only the Arabic text in Java?

Example:

مرحبا كيفك i'm fine and you كله تمام . كم عمرك . my age is 18

Needed output:

مرحبا كيفك كله تمام كم عمرك 
OneCricketeer
Hebeso
  • Use a regex to remove unwanted characters from the string – Ivan Pronin Jul 18 '17 at 20:35
  • Possible duplicate of [Extract Arabic phrases from a given text in java](https://stackoverflow.com/questions/23710720/extract-arabic-phrases-from-a-given-text-in-java) – Ousmane D. Jul 18 '17 at 20:38
  • Or: [Extracting Arabic words from a string](https://stackoverflow.com/questions/31852871/extracting-arabic-wordsnot-semantic-arabic-phrases-from-a-string) – Ousmane D. Jul 18 '17 at 20:41

3 Answers

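The regular expression \p{InArabic} matches any character in the Unicode Arabic block (U+0600 to U+06FF), which holds the Arabic letters as well as Arabic punctuation, diacritics and Arabic-Indic digits. The regular expression \s matches any whitespace character. So if you only wish to keep Arabic characters and spaces, you could use something like

String arabicOnly = myString.replaceAll("[^\\p{InArabic}\\s]", "");

to remove everything other than Arabic characters and whitespace.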
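
A minimal, self-contained sketch of this approach applied to the example from the question. The class and method names (ArabicFilter, keepArabic) are made up for illustration, and the whitespace-collapsing step is an addition: the removed English words otherwise leave runs of spaces behind, so the raw replaceAll result would not match the needed output exactly.

```java
import java.util.regex.Pattern;

// Hypothetical helper: keep Arabic-block characters plus whitespace,
// then tidy up the spaces that the removed words leave behind.
class ArabicFilter {
    // Any character that is neither in the Arabic block (U+0600-U+06FF) nor whitespace.
    private static final Pattern NON_ARABIC = Pattern.compile("[^\\p{InArabic}\\s]");
    // A run of one or more whitespace characters.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String keepArabic(String text) {
        // Strings are immutable: the replacement produces a new String.
        String stripped = NON_ARABIC.matcher(text).replaceAll("");
        // Collapse leftover runs of spaces to a single space and trim both ends.
        return SPACES.matcher(stripped).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String input = "مرحبا كيفك i'm fine and you كله تمام . كم عمرك . my age is 18";
        System.out.println(keepArabic(input)); // prints: مرحبا كيفك كله تمام كم عمرك
    }
}
```

Precompiling the patterns once avoids recompiling them on every call, which String.replaceAll would do.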

Dawood ibn Kareem

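Write a regex that accepts only Arabic characters. This one should get the job done: ^[\u0621-\u064A0-9 ]+$

It accepts the basic Arabic letters (U+0621 to U+064A) plus ASCII digits and spaces; it does not cover Arabic diacritics, Arabic-Indic digits or the extended letters used for Persian and Urdu. Because it is anchored with ^ and $, it checks whether a whole string is Arabic rather than extracting the Arabic parts. If that doesn't do exactly what you need, it at least gives you something to start with.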
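
A sketch of how this pattern might be used: as-is for validation, and with the same letter range unanchored to pull the Arabic words out with Matcher.find(). The names ArabicRangeDemo, isArabicOnly and extractArabicWords are hypothetical. Because the range stops at U+064A, a word containing diacritics would be split at each diacritic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the U+0621-U+064A letter range.
class ArabicRangeDemo {
    // The answer's pattern: the WHOLE string must be letters U+0621-U+064A,
    // ASCII digits or spaces. Suited to validation, not extraction.
    private static final Pattern ONLY_ARABIC = Pattern.compile("^[\\u0621-\\u064A0-9 ]+$");
    // The same letter range, unanchored and without digits or spaces: one Arabic word.
    private static final Pattern ARABIC_WORD = Pattern.compile("[\\u0621-\\u064A]+");

    static boolean isArabicOnly(String text) {
        return ONLY_ARABIC.matcher(text).matches();
    }

    static String extractArabicWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = ARABIC_WORD.matcher(text);
        // Scan left to right, collecting each run of Arabic letters.
        while (m.find()) {
            words.add(m.group());
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String input = "مرحبا كيفك i'm fine and you كله تمام . كم عمرك . my age is 18";
        System.out.println(isArabicOnly(input));       // false: the input also contains English
        System.out.println(extractArabicWords(input)); // مرحبا كيفك كله تمام كم عمرك
    }
}
```

Joining the matches with a single space sidesteps the extra-whitespace problem that a remove-everything-else replacement runs into.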

ja08prat

Probably the simplest approach would be to look for characters in the range 0x600-0x6FF in the string. You should be able to do this with a regexp replace along the lines of

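String arabicOnly = myString.replaceAll("[^\\p{IsArabic}\\s]", "");

\p{IsArabic} matches characters of the Unicode Arabic script and requires Java 7 or later. This removes every character that is neither Arabic nor whitespace; without the \\s, the spaces between words would be removed too. Alternatively, you can spell out the range directly as [^\\x{600}-\\x{6FF}\\s] (the \x{...} hex syntax also needs Java 7) or as [^\\u0600-\\u06FF\\s], which works in older versions as well.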
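
A sketch comparing the two variants from this answer, the script-based \p{IsArabic} and the explicit range \x{600}-\x{6FF}, each keeping \s and collapsing leftover spaces afterwards. The class and method names are hypothetical. The two are not identical in general: the block range also keeps characters such as the Arabic comma (U+060C), which Unicode assigns to the Common script rather than the Arabic script.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of script-based vs. range-based filtering (Java 7+).
class ArabicScriptFilter {
    // Neither Arabic *script* nor whitespace.
    private static final Pattern NOT_ARABIC_SCRIPT = Pattern.compile("[^\\p{IsArabic}\\s]");
    // Neither in the code point range 0x600-0x6FF nor whitespace.
    private static final Pattern NOT_ARABIC_RANGE = Pattern.compile("[^\\x{600}-\\x{6FF}\\s]");

    static String byScript(String text) {
        return tidy(NOT_ARABIC_SCRIPT.matcher(text).replaceAll(""));
    }

    static String byRange(String text) {
        return tidy(NOT_ARABIC_RANGE.matcher(text).replaceAll(""));
    }

    // Collapse the runs of spaces left by removed words, then trim the ends.
    private static String tidy(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String input = "مرحبا كيفك i'm fine and you كله تمام . كم عمرك . my age is 18";
        System.out.println(byScript(input)); // مرحبا كيفك كله تمام كم عمرك
        System.out.println(byRange(input));  // same result for this input
    }
}
```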

Don Hosek
  • It's supposed to be `IsArabic`, not `isArabic`. I've updated my answer to reflect that. More info at http://docs.oracle.com/javase/tutorial/essential/regex/unicode.html – Don Hosek Jul 18 '17 at 20:48
  • It cannot be compiled. –  Jul 18 '17 at 20:51