Is there a programmatic way to identify the tones in Chinese text?
For an input string like 苹果, I need to extract the tones as 2 and 3, as they would be indicated in the pinyin transliteration píng guǒ or ping2 guo3.
I guess a possible workaround would be to convert the Chinese hanzi text to pinyin (e.g. with pinyin4j) and then extract the tones from the pinyin, but I assume there must be a better, more direct way to do it.
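To make the second half of that workaround concrete, here is a rough sketch of how the tones could be pulled out of a pinyin string once the conversion is done. It is written in Python with only the standard library and does not use pinyin4j. The function names `tone_of_syllable` and `tones_of_pinyin` are made up for illustration, and the sketch assumes the syllables are separated by spaces and that an unmarked syllable counts as neutral tone 5.

```python
import re
import unicodedata

# Combining diacritics used for pinyin tone marks, mapped to tone numbers.
_TONE_MARKS = {
    "\u0304": 1,  # macron, as in ā
    "\u0301": 2,  # acute, as in á
    "\u030c": 3,  # caron, as in ǎ
    "\u0300": 4,  # grave, as in à
}


def tone_of_syllable(syllable):
    """Return the tone (1-4, or 5 for neutral) of a single pinyin syllable."""
    # Numbered form such as "ping2": the tone is the trailing digit.
    match = re.search(r"([1-5])$", syllable)
    if match:
        return int(match.group(1))
    # Diacritic form such as "píng": decompose so the tone mark becomes
    # a separate combining character, then look it up.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in _TONE_MARKS:
            return _TONE_MARKS[ch]
    # Assumption: no digit and no tone mark means neutral tone.
    return 5


def tones_of_pinyin(pinyin):
    """Return the tones of a space-separated pinyin string."""
    return [tone_of_syllable(s) for s in pinyin.split()]


print(tones_of_pinyin("píng guǒ"))
print(tones_of_pinyin("ping2 guo3"))
```

Both calls should give the tones 2 and 3 for the example above. Pinyin written without spaces between syllables would need a syllable segmenter first, which this sketch does not attempt.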
Context
The question is whether there is some algorithmic way to identify the tones, or whether the only way is a map lookup against an authoritative source, e.g. the publicly available CEDICT database.
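For reference, the map-lookup option could look roughly like the sketch below. It assumes the CC-CEDICT line layout of `Traditional Simplified [pin1 yin1] /gloss/`. The two sample lines are written by hand for illustration rather than read from the real file, and `build_tone_map` and `lookup_tones` are hypothetical helper names. A word with several readings keeps only the first one seen here, which is a simplification.

```python
import re

# Hand-written sample lines in the assumed CC-CEDICT layout;
# a real run would iterate over the downloaded dictionary file instead.
SAMPLE_LINES = [
    "# comment lines start with a hash and are skipped",
    "蘋果 苹果 [ping2 guo3] /apple/",
    "中文 中文 [Zhong1 wen2] /Chinese language/",
]

# Captures the traditional form, the simplified form, and the bracketed pinyin.
_ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\]")


def build_tone_map(lines):
    """Map each headword (traditional and simplified) to its list of tones."""
    tone_map = {}
    for line in lines:
        if line.startswith("#"):
            continue
        match = _ENTRY.match(line)
        if not match:
            continue
        traditional, simplified, pinyin = match.groups()
        # The tone is the trailing digit of each numbered syllable.
        tones = [int(s[-1]) for s in pinyin.split() if s[-1].isdigit()]
        # setdefault keeps the first reading when a headword repeats.
        tone_map.setdefault(traditional, tones)
        tone_map.setdefault(simplified, tones)
    return tone_map


def lookup_tones(word, tone_map):
    """Return the tones for a word, or None if it is not in the map."""
    return tone_map.get(word)


tone_map = build_tone_map(SAMPLE_LINES)
print(lookup_tones("苹果", tone_map))
```

Text that is not a single dictionary headword would first have to be segmented into words, which is outside what this sketch covers.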