3

Let me explain my question with some examples:

                   // expected result: ("true" means "rtl" and "false" means "ltr")
var test = "..!";  // true
var test = "te";   // false
var test = "!te";  // false
var test = "..ق";  // true
var test = "مب";   // true 
var test = "eس";   // false
var test = "سe";   // true

Here is my current code:

// declare direction of comment in textarea
var x = new RegExp("[A-Za-z]"); // is ascii
var isAscii = x.test($("#textarea-edit-"+post_id_for_edit).val().substring(0, 1));
if(isAscii){
     $("#textarea-edit-"+post_id_for_edit).css("direction", "ltr");
} else {
     $("#textarea-edit-"+post_id_for_edit).css("direction", "rtl");
}

I want the direction to be based on the first character that is a letter (either Persian or English). My code, however, looks only at the very first character, which can be anything, even a punctuation mark.

How can I do that?

Martin AJ
  • Check https://jsfiddle.net/22uovqhc/. Actually, I have doubts about the Persian letter regex; there are suggestions to use `[\u0600-\u06FF]`, or even `[\u0600-\u06FF\uFB8A\u067E\u0686\u06AF]` – Wiktor Stribiżew Aug 14 '16 at 23:36
  • Do you mean the letters in the [Persian alphabet](https://en.wikipedia.org/wiki/Persian_alphabet) in Arabic script? (This is similar to Hindi being written in Devanagari script.) Of course, many languages (including English) use letters that are not in their alphabet, so it's best not to focus too narrowly on just the alphabet of a language. – Tom Blodget Aug 14 '16 at 23:57
  • Check this answer which is the only working/complete example with a bunch of unit tests: https://stackoverflow.com/a/66372216/12666332 – SeyyedKhandon Jul 14 '22 at 06:19

3 Answers

7

I suggest using a single regex whose alternation has two branches, a Persian letter and an ASCII letter, and capturing only one of them (say, the ASCII one). The regex stops at the first letter of either kind. If it matches and Group 1 participated, the first letter is ASCII, so the text is ltr. If there is no match at all, or the match succeeded without Group 1, the text is treated as Persian (rtl).

See the code below:

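function check(s) {
  // Alternation: a Persian letter (not captured) or an ASCII letter (Group 1).
  var PersianOrASCII = /[آ-ی]|([a-zA-Z])/;
  // Declare m locally; the original assigned to an implicit global.
  var m = s.match(PersianOrASCII);
  // false (ltr) only when the first letter found is ASCII; otherwise true (rtl).
  return !(m !== null && m[1]);
}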
  
console.log(check("..!"));  // true
console.log(check("te"));   // false
console.log(check("!te"));  // false
console.log(check("..ق"));  // true
console.log(check("مب"));   // true 
console.log(check("eس"));   // false
console.log(check("سe"));   // true

NOTE: You may fine-tune the Persian letter regex by using [\u0600-\u06FF] or [\u0600-\u06FF\uFB8A\u067E\u0686\u06AF] instead, or the longer character class [\u06A9\u06AF\u06C0\u06CC\u060C\u062A\u062B\u062C\u062D\u062E\u062F\u063A\u064A\u064B\u064C\u064D\u064E\u064F\u067E\u0670\u0686\u0698\u200C\u0621-\u0629\u0630-\u0639\u0641-\u0654] (from persianRex).
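As a rough sketch of how the `\u`-escaped range could be wired into the question's textarea logic, the helper below returns a direction string instead of a boolean. The name `directionFor` is made up for this illustration, and treating every character in U+0600–U+06FF as a right-to-left letter is an assumption: that block also contains Arabic letters, digits and punctuation such as U+060C.

```javascript
// Hypothetical helper: picks a text direction from the first letter found.
// Assumption: any character in U+0600-U+06FF counts as a right-to-left letter.
function directionFor(text) {
  // Group 1 participates only when the first hit is an ASCII letter.
  var m = /[\u0600-\u06FF]|([A-Za-z])/.exec(text);
  return m !== null && m[1] ? "ltr" : "rtl";
}

console.log(directionFor("!te"));       // ltr
console.log(directionFor("..\u0642"));  // rtl  ("..ق")
console.log(directionFor("..!"));       // rtl  (no letter at all)

// Possible use with the question's jQuery code (not run here):
// var $area = $("#textarea-edit-" + post_id_for_edit);
// $area.css("direction", directionFor($area.val()));
```

Returning the CSS value directly avoids the extra if/else around the two `.css("direction", ...)` calls in the question.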

Wiktor Stribiżew
  • A pretty similar [question](http://stackoverflow.com/questions/38965825/how-can-i-change-the-direction-of-textarea-when-there-is-persian-character) .. You might want to take a look at it. – Martin AJ Aug 16 '16 at 02:56
4

The Persian characters are within the Arabic Unicode block, between U+0600 and U+06FF.

function contain_persian_char(str) {
    // True only if the whole string consists of Arabic-block characters and whitespace.
    var p = /^[\u0600-\u06FF\s]+$/;
    return p.test(str);
}

You can also simply use this library: persianRex
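Note that the regex above is anchored with `^` and `$`, so it accepts only strings made up entirely of characters from that block plus whitespace. The sketch below illustrates the difference with one of the mixed strings from the question; `containsArabicBlockChar` is a hypothetical unanchored variant added for comparison, not part of the answer or of persianRex.

```javascript
// Copy of the anchored check from the answer: the whole string must match.
function contain_persian_char(str) {
  return /^[\u0600-\u06FF\s]+$/.test(str);
}

// Hypothetical unanchored variant: true if any character falls in the block.
function containsArabicBlockChar(str) {
  return /[\u0600-\u06FF]/.test(str);
}

var mixed = "\u0633e"; // "سe" from the question
console.log(contain_persian_char(mixed));          // false
console.log(containsArabicBlockChar(mixed));       // true
console.log(contain_persian_char("\u0645\u0628")); // true  ("مب")
```

Neither function looks at which letter comes first, so on their own they do not settle the direction for strings that mix both scripts.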

Ramin Esfahani
0

You can simply use this regex check, which originally comes from persian-tools:

 const faAlphabet = "ابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی";
 const faNumber = "۰۱۲۳۴۵۶۷۸۹";
 const faShortVowels = "َُِ";
 const faOthers = "‌آاً";
 const faMixedWithArabic = "ًٌٍَُِّْٰٔءك‌ةۀأإيـئؤ،";
 const faText = faAlphabet + faNumber + faShortVowels + faOthers;
 const faComplexText = faText + faMixedWithArabic;

const isPersian = (str, isComplex = false, trimPattern = /["'-+()؟\s.]/g) => {
    const text = str.replace(trimPattern, "");
    const faRegex = isComplex ? faComplexText : faText;
    return new RegExp(`^[${faRegex}]+$`).test(text);
};

console.log(isPersian("این یک متن فارسی است؟"));  // true
console.log(isPersian("آیا سیستم میتواند گزینه های دیگری را به اشتباه به عنوان متن فارسی تشخیص دهد؟"));   // true
console.log(isPersian("Lorem Ipsum Test")); // false
console.log(isPersian("これはペルシア語のテキストですか")); //false
console.log(isPersian("Это персидский текст?")); //false
console.log(isPersian("这是波斯文字吗?")); //false
console.log(isPersian("هل هذا نص فارسي؟")); //false
console.log(isPersian("أكد رئيس اللجنة العسكرية الممثلة لحكومة الوفاق الوطني في ليبيا أحمد علي أبو شحمة، أن اللجنة لا تستطيع تنفيذ خطتها لإخراج العناصر الأجنبية من أراضي البلاد.")); //false

Update

I know it may look complicated, but it is the right way to detect Persian correctly. To see that the accepted answer does not work correctly, just run it against the test cases given in this answer.

SeyyedKhandon