I have mixed text containing Arabic, English, numbers and special characters. How can I extract only the Arabic text in Java?
Example:
مرحبا كيفك i'm fine and you كله تمام . كم عمرك . my age is 18
Needed output:
مرحبا كيفك كله تمام كم عمرك
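The regular expression \p{InArabic} matches any character in the Arabic Unicode block (U+0600 to U+06FF), which covers the Arabic letters as well as Arabic punctuation and Arabic-Indic digits. The regular expression \s matches any whitespace character. So if you only wish to keep Arabic characters and spaces, you could use something like
myString = myString.replaceAll("[^\\p{InArabic}\\s]", "");
to remove everything other than Arabic characters and whitespace. Strings are immutable in Java, so the result has to be assigned back. The spaces that surrounded the removed words stay in the string, so you may want to collapse them afterwards.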
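As a minimal sketch of that approach applied to the example from the question: the class name ArabicOnly and the helper extractArabic are made up for illustration, and the whitespace cleanup step is an addition that the answer itself does not include.

```java
class ArabicOnly {
    // Hypothetical helper: keeps characters from the Arabic Unicode block plus whitespace.
    static String extractArabic(String input) {
        return input
                .replaceAll("[^\\p{InArabic}\\s]", "") // drop everything that is neither Arabic-block nor whitespace
                .replaceAll("\\s+", " ")               // collapse the gaps left by the removed words
                .trim();                               // remove leading and trailing spaces
    }

    public static void main(String[] args) {
        String text = "مرحبا كيفك i'm fine and you كله تمام . كم عمرك . my age is 18";
        System.out.println(extractArabic(text));
    }
}
```

For the sample sentence this should print the needed output from the question, provided the source file is compiled as UTF-8.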
Write a regex that only accepts Arabic characters. This one should get the job done: ^[\u0621-\u064A0-9 ]+$
It accepts the Arabic letters in the range U+0621 to U+064A, plus the digits 0-9 and spaces. Note that it validates a whole string rather than extracting the Arabic parts from it. If that doesn't do exactly what you need, it at least gives you something to start with.
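A short sketch of how this pattern behaves in Java may help, since it checks a whole string instead of filtering it. The class name ArabicValidator and the method isArabicOnly are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class ArabicValidator {
    // Arabic letters U+0621..U+064A, ASCII digits 0-9 and the space character,
    // anchored so that the whole string has to consist of these characters.
    private static final Pattern ARABIC_ONLY = Pattern.compile("^[\\u0621-\\u064A0-9 ]+$");

    // Hypothetical helper: true only if every character of the input is allowed.
    static boolean isArabicOnly(String input) {
        return ARABIC_ONLY.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isArabicOnly("مرحبا 18"));    // true
        System.out.println(isArabicOnly("مرحبا hello")); // false
    }
}
```

Because a mixed sentence like the one in the question is simply rejected, this pattern suits input validation better than extraction.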
Probably the simplest approach would be to look for characters in the range 0x600-0x6FF in the string. You should be able to do this with a regexp replace along the lines of
myString = myString.replaceAll("[^\\p{IsArabic}]", "");
(untested, and this requires Java 7 or later), which would remove any characters from the string that aren't Arabic, spaces included. Otherwise you'd need to replace \p{...} with the explicit range \\u0600-\\u06FF inside the character class (the \\x{600}-\\x{6ff} form also works, but it needs Java 7 as well).
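The explicit range can also be used to pick out the Arabic words directly instead of deleting everything else. This is a variation on the answer, not the answer's own code: the names ArabicWords and extractArabicWords are invented for the example, and String.join assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArabicWords {
    // One or more consecutive characters from the range U+0600..U+06FF.
    private static final Pattern ARABIC_RUN = Pattern.compile("[\\u0600-\\u06FF]+");

    // Hypothetical helper: collects each run of Arabic characters and joins the runs with single spaces.
    static String extractArabicWords(String input) {
        List<String> words = new ArrayList<>();
        Matcher matcher = ARABIC_RUN.matcher(input);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String text = "مرحبا كيفك i'm fine and you كله تمام . كم عمرك . my age is 18";
        System.out.println(extractArabicWords(text));
    }
}
```

Collecting matches this way avoids the leftover whitespace that a plain replaceAll leaves behind, at the cost of a few more lines.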