The \b word boundary still matches only in ASCII contexts, even when you pass unicode: true. The main point of that flag is to make sure "UTF-16 surrogate pairs in the original string will be treated as a single code point and will not match separately".
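A quick way to see this behavior is the minimal check below. It is only a sketch: the variable names plain and bounded are made up for illustration, and the sample text is the one used in this answer.

```dart
void main() {
  const text = 'لن تجد الطبيعة الجديدة راحتها';
  const word = 'الطبيعة';

  // The bare word is found in the text.
  final plain = RegExp(word, unicode: true);

  // Wrapped in \b, it is not: \b only sees [A-Za-z0-9_] as word characters,
  // so there is no boundary between a space and an Arabic letter.
  final bounded = RegExp('\\b$word\\b', unicode: true);

  print(plain.hasMatch(text)); // true
  print(bounded.hasMatch(text)); // false
}
```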
You may "decompose" the word boundary into explicit checks on the neighboring characters and add the Arabic-script letter and digit ranges to the character class:
String arabicText = "لن تجد الطبيعة الجديدة راحتها";
String arabicFind = "الطبيعة";
RegExp arabicExp = new RegExp("(?:^|[^a-zA-Z0-9_\\u06F0-\\u06F9\\u0622\\u0627\\u0628\\u067E\\u062A-\\u062C\\u0686\\u062D-\\u0632\\u0698\\u0633-\\u063A\\u0641\\u0642\\u06A9\\u06AF\\u0644-\\u0648\\u06CC\\u202C\\u064B\\u064C\\u064E-\\u0652])$arabicFind(?![a-zA-Z0-9_\\u06F0-\\u06F9\\u0622\\u0627\\u0628\\u067E\\u062A-\\u062C\\u0686\\u062D-\\u0632\\u0698\\u0633-\\u063A\\u0641\\u0642\\u06A9\\u06AF\\u0644-\\u0648\\u06CC\\u202C\\u064B\\u064C\\u064E-\\u0652])", unicode:true);
bool arabicResult = arabicExp.hasMatch(arabicText); // matches now
print(arabicResult); // => true
The regex will match an $arabicFind word when it is
- (?:^|[^a-zA-Z0-9_\u06F0-\u06F9\u0622\u0627\u0628\u067E\u062A-\u062C\u0686\u062D-\u0632\u0698\u0633-\u063A\u0641\u0642\u06A9\u06AF\u0644-\u0648\u06CC\u202C\u064B\u064C\u064E-\u0652]) - preceded with the start of the string (^) or (|) any char other than an ASCII letter, digit or _, or one of the listed Farsi letters or digits
- (?![a-zA-Z0-9_\u06F0-\u06F9\u0622\u0627\u0628\u067E\u062A-\u062C\u0686\u062D-\u0632\u0698\u0633-\u063A\u0641\u0642\u06A9\u06AF\u0644-\u0648\u06CC\u202C\u064B\u064C\u064E-\u0652]) - not followed with an ASCII letter, digit or _, or one of the listed Farsi letters or digits.
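If the same check is needed for several search terms, the pattern can be built by a small helper. This is only a sketch under assumptions: wholeWord and wordChars are hypothetical names (not part of dart:core), and RegExp.escape is used so that a search term containing regex metacharacters is treated literally.

```dart
// Word characters copied from the pattern above: ASCII word characters plus
// the listed Farsi letters, digits and marks. Raw strings keep the \uXXXX
// escapes intact so the regex engine interprets them.
const wordChars = r'a-zA-Z0-9_\u06F0-\u06F9\u0622\u0627\u0628\u067E'
    r'\u062A-\u062C\u0686\u062D-\u0632\u0698\u0633-\u063A\u0641\u0642'
    r'\u06A9\u06AF\u0644-\u0648\u06CC\u202C\u064B\u064C\u064E-\u0652';

// Hypothetical helper: builds a "whole word" regex for [word] using the
// decomposed boundary instead of \b.
RegExp wholeWord(String word) {
  final escaped = RegExp.escape(word);
  return RegExp('(?:^|[^$wordChars])$escaped(?![$wordChars])', unicode: true);
}

void main() {
  const text = 'لن تجد الطبيعة الجديدة راحتها';
  print(wholeWord('الطبيعة').hasMatch(text)); // true: a whole word
  print(wholeWord('الط').hasMatch(text)); // false: followed by ب (U+0628)
}
```

Note that the class as listed does not contain every Arabic letter: for example, ة (U+0629) and ي (U+064A) are missing, so a term that ends right before one of them would still be reported as a whole word. For Arabic text, wordChars would likely need to be extended with the missing code points.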