I have a JSON file that contains \u-escaped Unicode characters. However, when I read it in Python, the escaped characters seem to be decoded as Latin-1 rather than UTF-8. Calling .encode('latin-1').decode('utf-8')
on the affected strings fixes this, but why is it happening, and is there a way to tell json.load
that the escape sequences should be read as UTF-8 rather than Latin-1?
The JSON file message.json
should contain a message composed of a single "Grinning Face With Sweat" emoji:
{
"message": "\u00f0\u009f\u0098\u0085"
}
Python:
>>> import json
>>> with open('message.json') as infile:
... msg_json = json.load(infile)
...
>>> msg_json
{'message': 'ð\x9f\x98\x85'}
>>> msg_json['message']
'ð\x9f\x98\x85'
>>> msg_json['message'].encode('latin-1').decode('utf-8')
'😅'
Setting the encoding
parameter in open
or json.load
doesn't seem to change anything: the JSON file is plain ASCII, and the Unicode is escaped inside it.
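For reference, the whole round trip can be reproduced without the file. The raw string below mirrors the contents of message.json (an assumption based on the transcript above): the four UTF-8 bytes of the emoji (F0 9F 98 85) appear as four separate \uXXXX escapes, so json.loads quite correctly produces four code points, and the Latin-1 round trip turns them back into bytes:

```python
import json

# Raw JSON mirroring message.json: the emoji's UTF-8 bytes were
# escaped as individual \uXXXX code points when the file was written.
raw = '{"message": "\\u00f0\\u009f\\u0098\\u0085"}'

msg = json.loads(raw)["message"]
# json.loads faithfully yields four code points: U+00F0 U+009F U+0098 U+0085
print([hex(ord(c)) for c in msg])

# Latin-1 maps code points U+0000..U+00FF one-to-one onto bytes, so
# encoding recovers the original UTF-8 byte sequence, which then
# decodes to the intended single emoji.
fixed = msg.encode('latin-1').decode('utf-8')
print(fixed == '\U0001F605')  # True: Grinning Face With Sweat
```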