Is there a built-in method or a library available in PHP to check which writing script(s) a given text uses (Latin, Arabic, Cyrillic, Devanagari, ...)?

Note: I do not want to detect the language(s) of the text.

I have searched and found only the following library:

https://github.com/LasseRafn/php-string-script-language

However, this library is useful when we already have some idea of the script. For example:

StringScript::isChinese('你好世界。')

What if the text uses multiple scripts, or we have no idea which script it is written in? (I cannot even recognize the writing scripts of all the world's languages.)

user934820
  • Does this answer your question? [Detect language from string in PHP](https://stackoverflow.com/questions/1441562/detect-language-from-string-in-php) – Sibidharan Nov 09 '22 at 02:01
  • This does not. There is a difference between a writing script and language. For example, Persian, Urdu, Punjabi, Arabic, Sindhi are written in Arabic script. Similarly English, French, German... are written in Latin script. – user934820 Nov 09 '22 at 02:06

1 Answer

The IntlChar class can be used for this purpose, e.g.:

IntlChar::getBlockCode("A") === IntlChar::BLOCK_CODE_BASIC_LATIN
IntlChar::getBlockCode("Φ") === IntlChar::BLOCK_CODE_GREEK
IntlChar::getBlockCode("\u{2603}") === IntlChar::BLOCK_CODE_MISCELLANEOUS_SYMBOLS

See the IntlChar class definition in the PHP manual for all the block code constants. You can iterate over each character in a string and test whether they all belong to the same block, or whether most of them do, optionally ignoring punctuation marks and the like. In the end you'll need to come up with your own definition of what "is Chinese" means and where the threshold to non-Chinese lies. For instance, this excerpt from the Chinese Wikipedia is clearly Chinese, but also contains non-Chinese characters:

漢語又稱中文、華語[3]、唐話[4],概指由上古汉语(先秦雅言)发展而来...
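The per-character iteration described above could be sketched roughly as follows. This is only a minimal illustration, assuming the intl extension is loaded; `countBlocks` is a hypothetical helper name (not part of IntlChar), and skipping everything that is not a letter is just one possible way to ignore punctuation and digits.

```php
<?php
// Hypothetical helper (not part of the intl extension): counts how many
// letters of $text fall into each Unicode block, most frequent block first.
function countBlocks(string $text): array
{
    $counts = [];
    // Split into code points; the "u" flag makes PCRE treat the input as UTF-8.
    // preg_split() returns false on invalid UTF-8, so fall back to an empty list.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    foreach ($chars as $char) {
        // Assumption: only letters count; punctuation, digits, whitespace
        // and symbols are ignored.
        if (!IntlChar::isalpha($char)) {
            continue;
        }
        $block = IntlChar::getBlockCode($char);
        if ($block === null) {
            continue;
        }
        $counts[$block] = ($counts[$block] ?? 0) + 1;
    }

    arsort($counts); // sort by count, descending, keeping block codes as keys
    return $counts;
}

$counts = countBlocks('漢語又稱中文、華語[3]、唐話[4]');
$isMostlyCjk = array_key_first($counts) === IntlChar::BLOCK_CODE_CJK_UNIFIED_IDEOGRAPHS;
```

Keep in mind that a block is not the same thing as a script: Latin letters, for example, are spread over several blocks (Basic Latin, Latin-1 Supplement, Latin Extended-A, ...), so you may want to group related block codes before applying whatever threshold you settle on.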

deceze