
I'm wondering if there is any method in Python 3 to check whether a Chinese character is Simplified or Traditional Chinese.

一二三
  • http://cjklib.org/0.3/library/cjklib.characterlookup.html seems to hold some promise but I'm not competent to write a useful answer from that. – tripleee Sep 12 '15 at 18:16
  • related: [What's the complete range for Chinese characters in Unicode?](http://stackoverflow.com/q/1366068/4279) – jfs Sep 13 '15 at 16:14
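The linked question is about which code points count as Chinese characters at all, which is a separate step from telling Simplified and Traditional apart. Here is a minimal stdlib-only sketch of that first step. It assumes that checking the Unicode character name is good enough, so it does not cover radicals, strokes, or other CJK symbols. `is_cjk_ideograph` and `has_cjk` are hypothetical helper names, not part of any library mentioned here:

```python
import unicodedata

def is_cjk_ideograph(ch):
    """Return True if ch is a CJK unified or compatibility ideograph.

    Relies on the Unicode character name, so coverage depends on the
    Unicode version bundled with this Python's unicodedata module.
    """
    name = unicodedata.name(ch, "")  # "" for unnamed/unassigned code points
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH"))

def has_cjk(text):
    """Return True if any character in text is a CJK ideograph."""
    return any(is_cjk_ideograph(ch) for ch in text)

print(has_cjk('Hello my name is John.'))        # False
print(has_cjk('Country in Traditional: 國家.'))  # True
```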

2 Answers


cjklib does not support Python 3. In Python 3, you can use the hanzidentifier package instead:

```python
import hanzidentifier

print(hanzidentifier.has_chinese('Hello my name is John.'))
# False

print(hanzidentifier.has_chinese('Country in Simplified: 国家. Country in Traditional: 國家.'))
# True

print(hanzidentifier.is_simplified('John说:你好!'))
# True

print(hanzidentifier.is_traditional('John說:你好!'))
# True
```
Blckknght
Hong Zher Tan

You can use getCharacterVariants() in cjklib to query a character's simplified (S) and traditional (T) variants. As described in the Unihan database documentation, this variant data lets you determine whether a character is simplified, traditional, or shared by both.
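The same Unihan-based idea can be sketched without cjklib (which, per the other answer, does not support Python 3) by reading the kSimplifiedVariant and kTraditionalVariant fields directly. This is a minimal sketch under stated assumptions: `SAMPLE_VARIANTS` is a hand-written, illustrative excerpt in the tab-separated line format of Unihan_Variants.txt (a real program would load that file from the Unihan download). The classification rule is also an assumption: a character whose simplified variants are all other characters is not used in Simplified text, and likewise for Traditional. This reading of the Unihan documentation is not cjklib's own logic. `parse_variants` and `classify` are hypothetical helper names.

```python
# Illustrative excerpt in the Unihan_Variants.txt format (not the real file).
SAMPLE_VARIANTS = """\
U+570B\tkSimplifiedVariant\tU+56FD
U+56FD\tkTraditionalVariant\tU+570B
U+8AAA\tkSimplifiedVariant\tU+8BF4
U+8BF4\tkTraditionalVariant\tU+8AAA
"""

FIELDS = ("kSimplifiedVariant", "kTraditionalVariant")

def parse_variants(text):
    """Map each character to {field name: set of variant characters}."""
    table = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, field, values = line.split("\t", 2)
        if field not in FIELDS:
            continue
        char = chr(int(code[2:], 16))  # "U+570B" -> 國
        # Values are space-separated code points, e.g. "U+53F0 U+81FA"
        variants = {chr(int(v[2:], 16)) for v in values.split()}
        table.setdefault(char, {})[field] = variants
    return table

def classify(char, table):
    """Return 'simplified', 'traditional', 'both', or 'unknown'."""
    entry = table.get(char, {})
    simp = entry.get("kSimplifiedVariant", set())
    trad = entry.get("kTraditionalVariant", set())
    # Assumed rule: usable in a script unless every variant for that
    # script is a different character.
    usable_simplified = not simp or char in simp
    usable_traditional = not trad or char in trad
    if usable_simplified and usable_traditional:
        return "both"
    if usable_simplified:
        return "simplified"
    if usable_traditional:
        return "traditional"
    return "unknown"  # both variant sets point to other characters

table = parse_variants(SAMPLE_VARIANTS)
for ch in "国國说說你":
    print(ch, classify(ch, table))
```

Characters with no variant entries, such as 你 in this sample, are treated as usable in both scripts.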

一二三