I am writing a dictionary app. If a user types an Unicode character I want to check which language the character is.
e.g.
字 - returns ['zh', 'ja', 'ko']
العربية - returns ['ar']
a - returns ['en', 'fr', 'de'] //and many more
й - returns ['ru', 'be', 'bg', 'uk']
I searched and found that it could be done with CLDR https://stackoverflow.com/a/6445024/41948
Or Google API Python - can I detect unicode string language code?
But in my case
- Looking up a large charmap db seems cost a lot of storage and memory
- Too slow to call an API, besides it requires a network connection
- don't need to be very accurate. just about 80% correct ratio is acceptable
- simple & fast is the main requirement
- it's OK to just cover UCS2 BMP characters.
Any tips?
I need to use this in Python and Javascript. Thanks!