4

I have a WebView that loads a string from a URL. I'm not sure whether this is the correct approach, but I want to check whether the string is in Persian so that I can set the WebView's text direction to RTL, and set it to LTR if the string is in English. Is it possible to determine whether a string is in Persian or English? Or is there a better way to handle this?

Thanks in advance.

arash moeen

5 Answers

12

Try the following regular expression, which checks for characters in the Arabic, Persian, and Hebrew ranges.

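import java.util.regex.Pattern;

// Arabic (including Persian letters), Arabic Supplement, Hebrew, Arabic Presentation Forms-B
public static final Pattern RTL_CHARACTERS =
    Pattern.compile("[\u0600-\u06FF\u0750-\u077F\u0590-\u05FF\uFE70-\uFEFF]");

public static boolean isRtl(String text) {
    // true if the text contains at least one RTL character, e.g. isRtl("براي تست")
    return RTL_CHARACTERS.matcher(text).find();
}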
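
As an alternative to listing character ranges by hand, the direction can be taken from the first strongly directional character, using the JDK's `Character.getDirectionality`. The sketch below is only an illustration, not part of the original answer: the class name `TextDirection` and the helper `wrapForWebView` are hypothetical, and the helper assumes the text is already safe to embed in HTML.

```java
public class TextDirection {

    /** Returns "rtl" if the first strongly directional character is right-to-left, otherwise "ltr". */
    public static String baseDirection(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            byte direction = Character.getDirectionality(codePoint);
            if (direction == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || direction == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return "rtl"; // Hebrew, or Arabic-script letters such as Persian
            }
            if (direction == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return "ltr"; // Latin and other left-to-right letters
            }
            i += Character.charCount(codePoint); // digits, spaces, punctuation: keep looking
        }
        return "ltr"; // no strong character found (assumed default)
    }

    /** Hypothetical helper: wraps already-escaped text in a div with the detected direction. */
    public static String wrapForWebView(String text) {
        return "<div dir=\"" + baseDirection(text) + "\">" + text + "</div>";
    }
}
```

With this approach, a string that starts with digits or punctuation still takes its direction from the first actual letter, and the resulting HTML string can be passed to the WebView instead of the raw text.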
VahidN
1

Try Persian-tools, a JavaScript library that handles this and offers many other useful functions.

import { isPersian, toPersianChars } from "persian-tools2";

isPersian("این یک متن فارسی است؟") // true
isPersian("Lorem Ipsum Test") // false
Majid Shahabfar
0

Several methods are explained in the question "Language recognition in Java".

What you could do is just check whether the string is in English; if it is not, assume it is Persian.

TextCat: http://textcat.sourceforge.net/

Community
letsjak
  • What if the string is a combination of both languages? For example, most of it is written in Persian but it has a few English words, which happens a lot. – arash moeen Aug 19 '14 at 10:44
  • Split the string by whitespace into an array, so that every array element is one word of your sentence, and then check every element. See: http://stackoverflow.com/questions/4674850/converting-a-sentence-string-to-a-string-array-of-words-in-java – letsjak Aug 19 '14 at 10:46
  • Strangely, it detects "hello my name is kia gallery, what I do is creating amazing jewelry mostly in gold " as Hungarian. Any thoughts? – arash moeen Aug 19 '14 at 11:00
  • 1
    Maybe look for another API instead of TextCat: http://www.basistech.com/text-analytics/rosette/language-identifier/ or https://code.google.com/p/language-detection/ – letsjak Aug 19 '14 at 11:01
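
The word-by-word idea from the comments can be sketched without a language-detection library, by counting which script most words belong to. This is only a rough illustration under stated assumptions: the class name `MixedTextDirection` and the method `isMostlyRtl` are hypothetical, the RTL ranges are the ones from the regular expression in the first answer, and ties are treated as not RTL.

```java
import java.util.regex.Pattern;

public class MixedTextDirection {

    // Same character ranges as the regular expression in the first answer.
    private static final Pattern RTL_CHARACTERS =
        Pattern.compile("[\u0600-\u06FF\u0750-\u077F\u0590-\u05FF\uFE70-\uFEFF]");

    // Basic Latin letters only (assumption: "English" means unaccented A-Z).
    private static final Pattern LATIN_LETTERS = Pattern.compile("[A-Za-z]");

    /** Returns true when more words contain RTL characters than contain only Latin letters. */
    public static boolean isMostlyRtl(String text) {
        int rtlWords = 0;
        int latinWords = 0;
        for (String word : text.trim().split("\\s+")) {
            if (RTL_CHARACTERS.matcher(word).find()) {
                rtlWords++;
            } else if (LATIN_LETTERS.matcher(word).find()) {
                latinWords++;
            }
            // Words made only of digits or punctuation count for neither side.
        }
        return rtlWords > latinWords;
    }
}
```

A Persian sentence with a few English words still comes out as RTL, because the English words are outnumbered.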
0

Java has a language detection library that can identify the language of a text. It may help you, so give it a try.

You need to import the following classes to work with it.

import com.cybozu.labs.langdetect.Detector;
import com.cybozu.labs.langdetect.DetectorFactory;
import com.cybozu.labs.langdetect.Language;

for more reference click here

senthil
0

Thanks to the accepted answer:
To allow only English letters, digits, spaces, and Persian, Arabic, and Hebrew characters, with a limit on the length, you can use the regular expression pattern below:

// MAX_LENGTH is a placeholder for the maximum allowed string length; replace it with a number.
// Note: the "-" after "0-9" is a literal hyphen, so hyphens are accepted as well.
var pattern = /^[a-zA-Z0-9-\u0600-\u06FF\u0750-\u077F\u0590-\u05FF\uFE70-\uFEFF ]{2,MAX_LENGTH}$/;

JavaScript example:

function check_En_Numbers_space_Persian_Arabic_Hebrew(str) {
    var pattern = /^[a-zA-Z0-9-\u0600-\u06FF\u0750-\u077F\u0590-\u05FF\uFE70-\uFEFF ]{2,100}$/;
    return pattern.test(str.trim());
}
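
Since the question is about Android, a Java counterpart may be useful. The sketch below is an assumption-laden illustration rather than a translation of the answer: the class name `AllowedCharsValidator` and its methods are hypothetical, the maximum length is passed as a parameter instead of being hard-coded, and the literal hyphen of the JavaScript pattern is left out, so hyphens are rejected here.

```java
import java.util.regex.Pattern;

public class AllowedCharsValidator {

    // English letters, digits, space, and the Arabic/Persian/Hebrew ranges used above.
    private static final String ALLOWED =
        "a-zA-Z0-9 \u0600-\u06FF\u0750-\u077F\u0590-\u05FF\uFE70-\uFEFF";

    /** Builds a pattern that accepts between 2 and maxLength allowed characters (maxLength must be at least 2). */
    public static Pattern buildPattern(int maxLength) {
        return Pattern.compile("^[" + ALLOWED + "]{2," + maxLength + "}$");
    }

    /** Trims the input, as the JavaScript version does, and checks it against the pattern. */
    public static boolean isValid(String str, int maxLength) {
        return buildPattern(maxLength).matcher(str.trim()).matches();
    }
}
```

Building the pattern from a parameter avoids the placeholder problem of a regex literal, at the cost of compiling the pattern on each call; caching the compiled `Pattern` would be a reasonable refinement.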

Ghasem Sadeghi