I have a text file where every line is a random combination of any of the following groups
Numbers - English Letters - Arabic Letters - Punctuation
\w which is composed of a-zA-Z0-9_ for the first 2 groups
\p{InArabic} for the third group
\p{Punct} which is composed of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ for the fourth group
I got this info from here
I read the file one line at a time. The ONLY time I do something to a line is if it contains Arabic letters AND (English letters OR Unicode symbols).
After reading this post and this post I came up with the following expression. Obviously it's wrong as my output is all wrong >.<
pattern = Pattern.compile("(?=\\p{InArabic})(?=[a-zA-Z])");
Here's the input
1
1a
a!
aش
شa
ششa
aشش
شaش
aشa
!aش
The first three lines shouldn't be matched, but my output shows that NONE of the lines match.
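For reference, the two lookaheads in the expression above are both tested at the same position, so a match would need a single character that is Arabic and a-z/A-Z at once, which cannot happen. Below is a minimal sketch of one way to express "contains Arabic AND (English letter OR punctuation)", assuming that "Unicode symbols" means the \p{Punct} group and that find() is used on each line. The class name ArabicMixMatcher and the method isMixed are made up for illustration. Note that \p{InArabic} covers the whole Arabic block, which also contains Arabic digits and punctuation, not only letters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArabicMixMatcher {
    // Each lookahead starts at the beginning of the line and uses ".*" to
    // search forward, so the two conditions can be met by different characters.
    private static final Pattern MIXED = Pattern.compile(
            "^(?=.*\\p{InArabic})(?=.*[a-zA-Z\\p{Punct}])");

    // One Matcher created up front and re-pointed at each line with reset().
    private final Matcher matcher = MIXED.matcher("");

    public boolean isMixed(String line) {
        return matcher.reset(line).find();
    }

    public static void main(String[] args) {
        ArabicMixMatcher m = new ArabicMixMatcher();
        // "\u0634" is the Arabic letter sheen used in the sample input.
        String[] lines = {
            "1", "1a", "a!", "a\u0634", "\u0634a", "\u0634\u0634a",
            "a\u0634\u0634", "\u0634a\u0634", "a\u0634a", "!a\u0634"
        };
        for (String line : lines) {
            System.out.println(line + " -> " + m.isMixed(line));
        }
    }
}
```

For the sample input, this reports false for the first three lines and true for the remaining seven.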
Edit: Sorry, I just realized that I forgot to change my title. If any of you feel that searching would perform better, please suggest a search algorithm. Using a search algorithm instead of a regex looks ugly, but I'd go with it if it performed better. Thanks to the posts I read, I learned that I can make the regex faster by putting the following in the constructor, so that it is executed only once instead of on every iteration of my loop:
pattern = Pattern.compile("(?=\\p{InArabic})(?=[a-zA-Z])");
matcher = pattern.matcher("");
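As for the search-algorithm alternative, here is a rough sketch of a single pass over the characters of a line with no regex at all. It assumes the same condition as described earlier (at least one character from the Arabic Unicode block, plus at least one ASCII letter or ASCII punctuation character). The names ArabicMixScanner, isMixed, isAsciiLetter and isAsciiPunct are hypothetical, and whether this is actually faster than a precompiled pattern would have to be measured on real data.

```java
public class ArabicMixScanner {

    // True if the line has an Arabic-block character AND an ASCII letter or
    // ASCII punctuation character. Stops as soon as both have been seen.
    public static boolean isMixed(String line) {
        boolean arabic = false;
        boolean other = false;
        for (int i = 0; i < line.length(); ) {
            int cp = line.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.ARABIC) {
                arabic = true;
            } else if (isAsciiLetter(cp) || isAsciiPunct(cp)) {
                other = true;
            }
            if (arabic && other) {
                return true;
            }
        }
        return false;
    }

    private static boolean isAsciiLetter(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
    }

    // Same character set as \p{Punct}: the four ASCII punctuation ranges.
    private static boolean isAsciiPunct(int cp) {
        return (cp >= '!' && cp <= '/') || (cp >= ':' && cp <= '@')
                || (cp >= '[' && cp <= '`') || (cp >= '{' && cp <= '~');
    }

    public static void main(String[] args) {
        // "\u0634" is the Arabic letter sheen used in the sample input.
        System.out.println(isMixed("a!"));
        System.out.println(isMixed("!a\u0634"));
    }
}
```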