
Hello everyone. With the code below I want to test whether a string is in English or in Gujarati, but the program gives the wrong output. How can I solve this? If a character's code point is in the range U+0A80 to U+0AFF, the string should be considered Gujarati; otherwise it should be considered English.

Code:

if (!preg_match('/[^A-Za-z0-9]/', $Query)){
    echo 'English';
}
else{
    echo 'Gujarati';
}

Input:

A/4

Output:

Gujarati

Expected output:

English

1 Answer


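Your pattern treats every character outside A-Z, a-z and 0-9 as non-English, so the / in A/4 sends it to the Gujarati branch. In a case where you only have English and Gujarati, why don't you do it the other way around?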

if (preg_match('/[\x{0A80}-\x{0AFF}]/u', $Query)){
    echo 'Gujarati';
}
else{
    echo 'English';
}

Basically, if the string contains at least one character from the Gujarati block it will be detected as Gujarati; otherwise it will be treated as English. Note, however, that strings such as ありがとう or élève will also be considered English.
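If that last caveat matters, a rough sketch of a three-way check could look like the following. The function name classifyScript and the 'Other' label are made up for illustration, and "English" is assumed here to mean "ASCII only", which may be stricter or looser than what you actually need.

```php
<?php
// Hypothetical helper (not part of the original answer): three-way classification.
function classifyScript(string $query): string
{
    // Any code point in the Gujarati block U+0A80..U+0AFF marks the string as Gujarati.
    if (preg_match('/[\x{0A80}-\x{0AFF}]/u', $query) === 1) {
        return 'Gujarati';
    }
    // Assumption: "English" means every byte is plain ASCII
    // (letters, digits, punctuation, whitespace).
    if (preg_match('/^[\x00-\x7F]*$/', $query) === 1) {
        return 'English';
    }
    // Everything else (Japanese, accented Latin, ...) is neither.
    return 'Other';
}

echo classifyScript('A/4'), "\n";
echo classifyScript('ગુજરાતી'), "\n";
echo classifyScript('élève'), "\n";
```

With this sketch, A/4 falls into the English branch because / is ASCII, while élève and ありがとう end up as Other instead of English.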

Have a look at this Unicode chart: https://unicode.org/charts/PDF/U0A80.pdf to decide exactly which range should be taken into account.

Explanations:

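  1. [\x{0A80}-\x{0AFF}] is a character class that matches any single character between code points U+0A80 and U+0AFF; the square brackets are required for the range to work
  2. /u for Unicode (UTF-8) support in the regex, which the \x{...} code points need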
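To make the two pieces above easier to reuse, here is a minimal sketch that wraps the range check in a function. The names containsGujarati and containsGujaratiScript are hypothetical. The second function is an alternative I am adding, not something from the question: it uses PCRE's \p{Gujarati} script property instead of a hard-coded range, and should behave the same for ordinary Gujarati text.

```php
<?php
// Hypothetical helper names, for illustration only.
function containsGujarati(string $query): bool
{
    // [...] is a character class, \x{0A80}-\x{0AFF} is the code point range,
    // and /u makes PCRE read both pattern and subject as UTF-8.
    return preg_match('/[\x{0A80}-\x{0AFF}]/u', $query) === 1;
}

function containsGujaratiScript(string $query): bool
{
    // Alternative: match on the Unicode script property rather than a fixed range.
    return preg_match('/\p{Gujarati}/u', $query) === 1;
}

var_dump(containsGujarati('A/4'));
var_dump(containsGujarati('નમસ્તે'));
```

Comparing the result with === 1 also treats a preg_match failure (for example on invalid UTF-8 input, where it returns false) as "not Gujarati" rather than letting it slip through a loose comparison.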
Allan