I have a JSON file whose strings are encoded with raw_unicode_escape (the file itself is UTF-8). How do I parse it so that the strings end up correctly decoded from UTF-8 in memory?
For individual properties, I could use the following code, but the JSON is very big and manually converting every string after parsing isn't an option.
import json

# Contents of file 'file.json' ('\u00c3\u00a8' is 'è')
# { "name": "\u00c3\u00a8" }
with open('file.json', 'r') as f:
    j = json.load(f)

# Re-encode the mis-decoded characters back to bytes, then decode them as UTF-8
j['name'] = j['name'].encode('raw_unicode_escape').decode('utf-8')
Since the JSON can be huge, the approach has to be "incremental": I cannot read the whole file ahead of time, store it in a string, and then post-process it.
Finally, I should note that the JSON is actually stored in a zip file, so instead of open() it's ZipFile.open().
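
A minimal sketch of the setup described above, to make the zip case concrete. It assumes the JSON is a member named file.json inside an archive; the helper names fix_mojibake and load_name are hypothetical, chosen only for illustration. It still fixes a single property by hand, which is exactly the part that does not scale.

```python
import json
import zipfile


def fix_mojibake(s: str) -> str:
    """Undo the double encoding for one string (same conversion as in the question)."""
    return s.encode('raw_unicode_escape').decode('utf-8')


def load_name(zip_path: str, member: str = 'file.json') -> str:
    """Read the JSON member from the archive and return the repaired 'name' value.

    zip_path and member are placeholder names for this sketch.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # ZipFile.open() returns a binary file object; json.load() accepts it
        with archive.open(member) as fp:
            j = json.load(fp)
    return fix_mojibake(j['name'])
```

With the example file contents shown earlier, the repaired value would be the single character 'è'.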