I tried Python's unicodedata module, mentioned by nneonneo in his answer, and I think it probably works.
>>> import unicodedata
>>> unicodedata.name('你')
'CJK UNIFIED IDEOGRAPH-4F60'
>>> unicodedata.name('桜')
'CJK UNIFIED IDEOGRAPH-685C'
>>> unicodedata.name('あ')
'HIRAGANA LETTER A'
>>> unicodedata.name('ア')
'KATAKANA LETTER A'
>>> unicodedata.name('a')
'LATIN SMALL LETTER A'
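As you can see, both Chinese characters and the Chinese characters adopted by Japanese (kanji) are named CJK UNIFIED IDEOGRAPH, and hiragana and katakana are recognized correctly. Korean is different: Hangul characters are named HANGUL SYLLABLE ... (for example, unicodedata.name('한') gives 'HANGUL SYLLABLE HAN'), and only Hanja, the Chinese characters used in Korean, fall under CJK UNIFIED IDEOGRAPH.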
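The name lookup above can be wrapped in a small helper. This is only a minimal sketch: script_of is a hypothetical name of mine, not something unicodedata provides, and matching on name prefixes is a rough heuristic (halfwidth katakana, for instance, have names starting with HALFWIDTH and would end up as 'other').

```python
import unicodedata

def script_of(ch):
    """Rough script guess from the Unicode character name (hypothetical helper)."""
    # Passing a default avoids a ValueError for characters that have no name.
    name = unicodedata.name(ch, '')
    if name.startswith(('CJK UNIFIED IDEOGRAPH', 'CJK COMPATIBILITY IDEOGRAPH')):
        return 'han'
    if name.startswith('HIRAGANA'):
        return 'hiragana'
    if name.startswith('KATAKANA'):
        return 'katakana'
    if name.startswith('HANGUL'):
        return 'hangul'
    return 'other'

for ch in '你桜あア한a':
    print(ch, script_of(ch))
```

Returning a label instead of a boolean keeps the kanji/kana/Hangul distinction available if you need it later.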
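Also, if you only need to tell CJK characters apart from cased letters such as Latin ones, checking the general category seems simpler (note that letters from other caseless scripts, such as Hebrew or Thai, also report Lo):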
>>> import unicodedata
>>> unicodedata.category('你')
'Lo'
>>> unicodedata.category('桜')
'Lo'
>>> unicodedata.category('あ')
'Lo'
>>> unicodedata.category('ア')
'Lo'
>>> unicodedata.category('a')
'Ll'
>>> unicodedata.category('A')
'Lu'
According to the Unicode general category definitions, Ll is a lowercase letter, Lu is an uppercase letter, and Lo is an "other" letter, meaning one without case.
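One caveat with the category check is that Lo is not limited to CJK: Hebrew, Arabic, and Thai letters report it as well. If that matters, the category can be combined with the character name. The sketch below assumes a name-prefix match is good enough for your input; is_cjk_letter is a hypothetical helper, not part of unicodedata.

```python
import unicodedata

def is_cjk_letter(ch):
    """Hypothetical helper: category 'Lo' AND a CJK-looking character name."""
    if unicodedata.category(ch) != 'Lo':
        return False
    # Default of '' avoids a ValueError for unnamed characters.
    name = unicodedata.name(ch, '')
    return name.startswith(('CJK', 'HIRAGANA', 'KATAKANA', 'HANGUL'))

# Hebrew alef is also 'Lo', so the category alone would accept it.
for ch in '你あア한אa':
    print(ch, unicodedata.category(ch), is_cjk_letter(ch))
```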