1

I am getting lost somewhere in converting unicode to UTF-8. I am trying to define some JSON containing unicode characters and write it to a file. When printing to the terminal, the character is represented as '\u2606'. When I look at the file, the character is encoded as '\\u2606'; note the double backslash. Could someone point me in the right direction regarding these encoding issues?

# encoding=utf8

import json

data = {"summary" : u"This is a unicode character: ☆"}
print data

decoded_data = unicode(data)
print decoded_data

with open('decoded_data.json', 'w') as outfile:
    json.dump(decoded_data, outfile)

I tried adding the following snippet to the head of my file, but this did not help either.

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
Wouter
  • Do yourself a favour and switch to Python 3 if possible. In Python 2 the separation of unicode and encoded data is not strict and this may hide programming errors. – phobie Feb 14 '16 at 09:25
  • @phobie: ^ **bad** advice... *never* try to "hide" programming errors, as there shouldn't be any. – l'L'l Feb 14 '16 at 09:38

2 Answers

1

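First, you are printing the representation of a dictionary, and in Python 2 that representation uses only ASCII characters and escapes any other character as \uxxxx. Calling unicode(data) turns that escaped representation into a plain string, so json.dump then writes a string rather than a dictionary and escapes its backslash again, which is where the double backslash in the file comes from.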
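As a rough illustration of how a double backslash can arise: once a structure has already been turned into an escaped string, serializing that string a second time escapes the backslash itself. The sketch below reproduces the effect by calling json.dumps twice, which behaves the same on Python 2 and 3; the question's code arrives there via unicode(data) instead, and the names once and twice are purely illustrative.

```python
import json

data = {"summary": u"This is a unicode character: \u2606"}

# First pass: the dict becomes JSON text, with the star escaped as \u2606.
once = json.dumps(data)

# Second pass: the JSON text is treated as an ordinary string, so its
# backslash (and its quotes) are escaped again.
twice = json.dumps(once)

print(once)
print(twice)

# Decoding twice undoes both layers and recovers the original dict.
recovered = json.loads(json.loads(twice))
```

The practical consequence is to pass the dictionary itself to json.dump, not a string built from it.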

The same goes for json.dump, which by default uses only ASCII characters. You can make it emit the unicode characters themselves with ensure_ascii=False:

json_data = json.dumps(data, ensure_ascii=False)
with open('decoded_data.json', 'w') as outfile:
    outfile.write(json_data.encode('utf8'))
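A variant of the same idea, offered as a sketch and not as the only way to do it: opening the file through io.open with an explicit encoding lets the file object do the encoding, so the manual .encode('utf8') is not needed. This is written for Python 3 and is intended to also work on Python 2.7 as long as the dictionary holds unicode strings; the temporary path is a hypothetical stand-in for decoded_data.json.

```python
# -*- coding: utf-8 -*-
import io
import json
import os
import tempfile

data = {u"summary": u"This is a unicode character: \u2606"}

# Hypothetical output location; a temp dir keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "decoded_data.json")

# Serialize the dict itself (not unicode(data)), keeping non-ASCII characters.
text = json.dumps(data, ensure_ascii=False)

# io.open encodes to UTF-8 on write.
with io.open(path, "w", encoding="utf-8") as outfile:
    outfile.write(text)

# Read the file back to check the round trip.
with io.open(path, "r", encoding="utf-8") as infile:
    raw = infile.read()
restored = json.loads(raw)
```

The file then contains the star character itself, encoded as UTF-8, instead of an escape sequence.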
Daniel
0

You can also refer to this link, which is really useful:

Set Default Encoding

Community
Hardik Sachdeva