I don't know the title that suits my situation.

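import json
import os

data = []

# titles and BASE_DIR are defined earlier in the script (not shown)
for title in titles:
    # strip all whitespace from the title text
    real_title = ''.join(str(title.text).split())
    print(real_title)
    data.append(real_title)

with open(os.path.join(BASE_DIR, 'result.json'), 'w+') as json_file:
    json.dump(data, json_file)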

When I run python ~~.py, the output looks fine. print(real_title) shows:

이스케이프룸

But when I open the JSON file, it contains:

"\uc774\uc2a4\ucf00\uc774\ud504\ub8f8"

What's the problem? Why are the characters saved as literal escape sequences instead of as UTF-8 text?

1 Answer

What you are seeing are Unicode escape sequences. For example, "\uc774" is the escape for the character whose Unicode code point is U+C774 (hexadecimal). By default, the json module escapes every character outside the ASCII range, 0 through 127 decimal.
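
To illustrate that the escaped form is only a different spelling of the same string, here is a minimal sketch that serializes the title, parses it back, and checks the code point of the first character. The variable names are illustrative and are not taken from the question.

```python
import json

original = "이스케이프룸"

# Default behaviour: the JSON text contains only ASCII, using \uXXXX escapes.
escaped = json.dumps(original)

# Parsing the JSON text turns the escapes back into the original characters.
restored = json.loads(escaped)

print(escaped)
print(restored == original)

# Code point of the first character, as a hexadecimal string.
first_code_point = hex(ord(original[0]))
print(first_code_point)
```

So the file written in the question is valid JSON and loses no data; any JSON parser will read the Korean text back. The escapes only matter if the file should be readable by a person.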

You can set the ensure_ascii parameter to False. The json module documentation describes it as follows:

If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.

Example:

>>> import json
>>> data = {"key": "이스케이프룸"}
>>> json.dumps(data)
'{"key": "\\uc774\\uc2a4\\ucf00\\uc774\\ud504\\ub8f8"}'
>>> json.dumps(data, ensure_ascii=False)
'{"key": "이스케이프룸"}'
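
Applied to the file-writing case from the question, this could look roughly like the sketch below. The temporary directory stands in for the question's BASE_DIR, and passing encoding="utf-8" to open() is an added assumption so that the result does not depend on the platform's default encoding.

```python
import json
import os
import tempfile

data = ["이스케이프룸"]

# Hypothetical output location; the question used os.path.join(BASE_DIR, 'result.json').
out_path = os.path.join(tempfile.mkdtemp(), "result.json")

# Write the characters as-is instead of \uXXXX escapes.
with open(out_path, "w", encoding="utf-8") as json_file:
    json.dump(data, json_file, ensure_ascii=False)

# Read the raw file contents back to inspect what was written.
with open(out_path, encoding="utf-8") as json_file:
    raw_text = json_file.read()

print(raw_text)
```

With ensure_ascii=False the file holds the Korean text directly, so it should be opened with the same encoding when it is read later.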
Brad Solomon
  • Wow, you are a genius... Thank you. `json.dump(data, json_file, ensure_ascii=False)` solved my problem! –  Mar 18 '19 at 14:37