27

I want to write a regular expression that matches each word in a sentence:

My regular expression: "\b(\w+)\b"

Result: [image: RegExp matching Arabic]

It works well with English words, but it does not work with Arabic words. How can I accomplish the same thing for Arabic words?

Gherbi Hicham
KF2

3 Answers

47

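Try this:

// Alerts true if the text contains at least one character
// from the basic Arabic block (U+0600 to U+06FF).
function HasArabicCharacters(text)
{
    var arregex = /[\u0600-\u06FF]/;
    alert(arregex.test(text));
}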

Arabic character ranges:

[\u0600-\u06ff]|[\u0750-\u077f]|[\ufb50-\ufc3f]|[\ufe70-\ufefc]
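To match each word rather than only test for Arabic characters, the ranges above can be merged into one character class with a `+` quantifier and the `g` flag. This is a minimal sketch, assuming a "word" is simply a run of characters from those ranges; `ARABIC_WORD` and `arabicWords` are names made up for illustration. Note that U+0600 to U+06FF also covers Arabic punctuation and digits, so punctuation attached to a word would be included in the match.

```javascript
// Sketch: treat any run of characters from the listed ranges as one "word".
// ARABIC_WORD and arabicWords are hypothetical names, not part of any library.
const ARABIC_WORD = /[\u0600-\u06FF\u0750-\u077F\uFB50-\uFC3F\uFE70-\uFEFC]+/g;

function arabicWords(text) {
  // With the g flag, match() returns every match, or null when there is none.
  return text.match(ARABIC_WORD) || [];
}

console.log(arabicWords("مرحبا بالعالم hello"));
```

Latin words are skipped because none of their characters fall inside the listed ranges.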

Arabic script in Unicode:

As of Unicode 6.1, the Arabic script is contained in the following blocks:

Arabic (0600—06FF, 225 characters)
Arabic Supplement (0750—077F, 48 characters)
Arabic Extended-A (08A0—08FF, 39 characters)
Arabic Presentation Forms-A (FB50—FDFF, 608 characters)
Arabic Presentation Forms-B (FE70—FEFF, 140 characters)
Rumi Numeral Symbols (10E60—10E7F, 31 characters)
Arabic Mathematical Alphabetic Symbols (1EE00—1EEFF, 143 characters)

Content taken from Wikipedia: Arabic script in Unicode

Siva Charan
  • 2
    An [updated regex](https://gist.github.com/rrshaban/fe18eb4bf3e2cff3a929) with full support, as laid out in the latest Unicode standard: `[\u0600-\u06ff]|[\u0750-\u077f]|[\ufb50-\ufbc1]|[\ufbd3-\ufd3f]|[\ufd50-\ufd8f]|[\ufd92-\ufdc7]|[\ufe70-\ufefc]|[\uFDF0-\uFDFD]`. Test it out [on Rubular](http://rubular.com/r/UC6fwxYhJ7) – Razi Shaban Jun 05 '15 at 01:05
5

I'd suggest this:

\p{InArabic}
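The `\p{InArabic}` block syntax belongs to regex flavors such as Java. JavaScript does not accept it, but since ES2018 it supports script-based property escapes when the `u` flag is set. A rough JavaScript equivalent, assuming an ES2018+ engine, might look like the sketch below; `arabicScriptWords` is a name invented for this example.

```javascript
// Sketch: match runs of characters whose Unicode Script property is Arabic.
// Requires the u flag; arabicScriptWords is a hypothetical helper name.
const ARABIC_SCRIPT_WORD = /\p{Script=Arabic}+/gu;

function arabicScriptWords(text) {
  // match() with a global regex returns all matches, or null if none.
  return text.match(ARABIC_SCRIPT_WORD) || [];
}

console.log(arabicScriptWords("مرحبا بالعالم, hello"));
```

Combining marks such as the short-vowel diacritics are not classified as `Script=Arabic`, so for vocalized text `\p{Script_Extensions=Arabic}` may be a better fit.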
vahidreza
  • I tested it and it worked. As far as I know, it's a standard syntax in regex. For example, you can check more details [here](https://www.regular-expressions.info/unicode.html). – vahidreza Aug 07 '18 at 17:22
0

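You can do it with a function that translates the literal Arabic characters in the pattern into Unicode escapes. It's very simple to do.

For example:

// Replaces the literal range endpoints with their Unicode escape sequences.
function escapeArabicRange(regexStr) {
   regexStr = regexStr.replace("؀", "\\u0600"); // first character of the Arabic block
   regexStr = regexStr.replace("ۿ", "\\u06FF"); // last character of the Arabic block

   return regexStr;
}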

Alternatively, use placeholders such as [alf] and [ya] in the pattern, so that its text direction displays correctly:

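var regexStr = "[[alf]-[ya]]";

// Expands the placeholders into the Unicode escapes for the range endpoints.
function expandArabicPlaceholders(regexStr) {
   regexStr = regexStr.replace("[alf]", "\\u0600");
   regexStr = regexStr.replace("[ya]", "\\u06FF");

   return regexStr;
}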
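To show how such a placeholder pattern could end up as a working regular expression, here is a small sketch. It assumes the expanded string is passed to `new RegExp`; the `PLACEHOLDERS` table and the `buildArabicRegex` helper are hypothetical names, and the `+` quantifier is added here only to match whole words.

```javascript
// Hypothetical placeholder table: the names are illustrative, not a standard.
const PLACEHOLDERS = { "[alf]": "\\u0600", "[ya]": "\\u06FF" };

function buildArabicRegex(template, flags) {
  let source = template;
  for (const [name, escaped] of Object.entries(PLACEHOLDERS)) {
    // split/join replaces every occurrence without treating "[" as regex syntax.
    source = source.split(name).join(escaped);
  }
  return new RegExp(source, flags);
}

const wordRegex = buildArabicRegex("[[alf]-[ya]]+", "g");
console.log(wordRegex.source);
console.log("مرحبا بالعالم".match(wordRegex));
```

The pattern string itself stays pure ASCII, which avoids the right-to-left display problem when editing it.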