How to find out the languages of a given character?

Question

How can I find the language of a given character, offline? For example, the list mytext holds three characters: the first is English, the second is Hindi, and the third is Tamil. I tried to detect the language of each character with the langdetect package, but it produces results that look wrong. How can I get the correct result? (In my case only the third one, "ta" for Tamil, is correct. The other two are wrong.)

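from langdetect import detect

# One character each: Latin "B", Devanagari "उ", Tamil "பு"
mytext = ["B", "उ", "பு"]

# detect() returns a single best-guess language code per input
print(detect(mytext[0]))
print(detect(mytext[1]))
print(detect(mytext[2]))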

Result

tr
ne
ta

langdetect is an AI language detection algorithm. Letters don't have languages, they have scripts. How do you expect any computer to figure out that 'B' is English and not French, German, Spanish, Italian, Rhaeto-Romance or Croatian? — Krateng, Jun 13 '22 at 15:56
You can't really determine language from a single character, since that character may be used in several different languages. You need whole words or phrases. — Barmar, Jun 13 '22 at 15:57
You realize many alphabets are shared among many languages, right? There is no single answer to the language of `'B'` (it's as much Turkish as it is English), nor `"उ"` (it is in fact Nepali too). `"பு"` is Tamil, but the same script is used by Saurashtra, Badaga, Irula and Paniya as well, they're just small minority languages that don't have ISO 639-1 two letter language codes AFAICT, so you got lucky. — ShadowRanger, Jun 13 '22 at 15:57
They're not wrong; Turkish uses "B", and Nepalese uses "उ". — chepner, Jun 13 '22 at 15:57
Either way, look at the docs https://github.com/Mimino666/langdetect you can use `detect_langs` to get a confidence score for each lang returned and either take the best one or make your own choice some other way (e.g. if confidence < 0.5 return an error and ask for longer input) — Anentropic, Jun 13 '22 at 16:01
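
The two suggestions in the comments above can be sketched together: report the script of a character instead of a language, and only accept a langdetect guess when its confidence is high enough. This is a minimal sketch, not a definitive implementation. The helper names `script_of`, `choose` and `guess_language` are hypothetical, and taking the first word of the Unicode character name as the script is an assumption that works for names like `LATIN CAPITAL LETTER B` but is not a real script property lookup. The 0.5 threshold is the example value from the comment, not a recommended setting.

```python
import unicodedata


def script_of(char):
    """Return a rough script label for the first code point of `char`.

    Assumption: the first word of the Unicode character name names the
    script. This is a heuristic, not the Unicode Script property.
    """
    if not char:
        return "UNKNOWN"
    try:
        return unicodedata.name(char[0]).split()[0]
    except ValueError:  # code point has no name (e.g. control characters)
        return "UNKNOWN"


def choose(candidates, threshold=0.5):
    """Return the most probable language code, or None if it is below threshold.

    `candidates` is any sequence of objects with `.lang` and `.prob`
    attributes, such as the list returned by langdetect.detect_langs.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.prob)
    return best.lang if best.prob >= threshold else None


def guess_language(text, threshold=0.5):
    """Hypothetical wrapper: run langdetect, then apply the threshold."""
    # Third-party package, imported lazily so script_of works without it.
    from langdetect import detect_langs
    from langdetect.lang_detect_exception import LangDetectException
    try:
        return choose(detect_langs(text), threshold)
    except LangDetectException:  # raised when the text has no usable features
        return None


for ch in ["B", "उ", "பு"]:
    print(ch, script_of(ch))
# B LATIN
# उ DEVANAGARI
# பு TAMIL
```

With this split, a single character is answered by `script_of`, while `guess_language` is better kept for whole words or phrases, where a `None` result can be used to ask for longer input.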
