I have a small Python script that loops through a text file of traditional and simplified Chinese characters, along with their associated pinyin and English translations, and stores them in a JSON object.
Here's the script -
import json

# resultant dictionary (renamed from `dict` so it doesn't shadow the built-in)
entries = {}
# fields in the sample file
fields = ['traditional', 'simplified', 'pinyin', 'english']

with open('cedict.txt', encoding='utf8') as fh:
    # looping logic
    ...

# creating the json file
with open('cedict.json', 'w', encoding='utf8') as new_file:
    json.dump(entries, new_file, indent=4)
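The looping logic isn't shown above, so here is a minimal sketch of one way it could work, assuming the standard CC-CEDICT line format `traditional simplified [pinyin] /english/`; the `parse_line` helper, the sample line, and the `wordN` key scheme are my illustration, not the actual code:

import json

fields = ['traditional', 'simplified', 'pinyin', 'english']

def parse_line(line):
    # skip the comment lines in the CC-CEDICT header
    if line.startswith('#'):
        return None
    # split off the traditional form, the simplified form,
    # then the bracketed pinyin, leaving the english glosses
    traditional, _, rest = line.partition(' ')
    simplified, _, rest = rest.partition(' ')
    pinyin, _, english = rest.partition(']')
    return dict(zip(fields, [traditional, simplified,
                             pinyin + ']', english.strip()]))

entries = {}
with open('cedict.txt', encoding='utf8') as fh:
    for i, line in enumerate(fh):
        parsed = parse_line(line)
        if parsed is not None:
            entries['word' + str(i)] = parsed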
Here's a small snippet of the JSON object -
"word93428": {
"traditional": "\u86e7",
"simplified": "\u86e7",
"pinyin": "[wang3]",
"english": "/old"
}
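For context, those `\uXXXX` sequences are ordinary JSON escapes: `json.dump` escapes every non-ASCII character this way when its default `ensure_ascii=True` is in effect, and passing `ensure_ascii=False` writes the characters literally instead. A quick check (the one-entry dictionary is just illustrative):

import json

# default ensure_ascii=True escapes non-ASCII characters as \uXXXX
print(json.dumps({'traditional': '\u86e7'}))
# ensure_ascii=False emits the Chinese character itself
print(json.dumps({'traditional': '\u86e7'}, ensure_ascii=False))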
The text file is encoded in utf8, which seems to work fine for Latin-based characters but not for the Chinese ones.
I've played around with other character encodings, and they all yield different errors, so the easier solution seems to be to loop through the JSON object and decode the Chinese characters so they look the way they should.
This is what I'm stuck on.
I've tested the decode() function on one of the encoded Chinese characters, and it makes the character appear in its original form.
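The decode() call isn't shown, but assuming it was something along these lines (the `unicode_escape` codec is my guess at what was used), decoding a single escape sequence looks like:

# the six-character escape sequence, as literal text
escaped = '\\u86e7'
# interpret the \uXXXX escape to recover the original character
char = escaped.encode('ascii').decode('unicode_escape')
print(char)  # prints the Chinese character itself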
But I need to loop through an entire JSON object with thousands of translations and decode only the first two of the four key/value pairs in each entry (traditional and simplified).
How can I achieve this?