
I have a web scraper application that scrapes a Japanese site. The site serves UTF-8 encoded Japanese text. For example,

2017-03-02 17:14:17,862 - __main__ - DEBUG - 出演者: 青山茉利奈
2017-03-02 17:14:17,862 - __main__ - DEBUG - 作者: ひつき
2017-03-02 17:14:17,862 - __main__ - DEBUG - 収録時間: 123分

As you can see, when I call logger.debug() in the code, the characters are printed to the screen correctly. But when I use json.dump() to write this data to a JSON text file, the strings are escaped to something like

"\u53ce\u9332\u6642\u9593": "123\u5206",

This is not what I want. What I want is exactly what I see in the debug log. How can I solve this problem?
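For reference, here is a minimal reproduction of the behavior described above (the dict contents are taken from the log lines; the variable names are illustrative):

```python
import json

data = {"収録時間": "123分"}

# By default json.dumps (and json.dump) escapes all non-ASCII
# characters, producing \uXXXX sequences in the output.
print(json.dumps(data))
# → {"\u53ce\u9332\u6642\u9593": "123\u5206"}
```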

fhcat
  • This has been answered on the site before, here: http://stackoverflow.com/a/18337754/1759987 – Aaron Mar 02 '17 at 22:51
  • The solution in this link works. I dump the object to a JSON string and then save it to a file. – fhcat Mar 02 '17 at 23:09
  • Possible duplicate of [Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence](http://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence) – Zero Piraeus Mar 02 '17 at 23:43

1 Answer

json.dumps(whatever, ensure_ascii=False)

Specify ensure_ascii=False to disable \u escaping. Note that if the presence of this escaping is actually causing you problems, whatever code needs to receive this JSON is broken.
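With that flag, the serialized string keeps the Japanese characters as-is (same illustrative data as in the question):

```python
import json

data = {"収録時間": "123分"}

# ensure_ascii=False leaves non-ASCII characters unescaped.
print(json.dumps(data, ensure_ascii=False))
# → {"収録時間": "123分"}
```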

user2357112
  • Thanks, now I get `UnicodeEncodeError: 'ascii' codec can't encode characters in position 4-26: ordinal not in range(128)` – fhcat Mar 02 '17 at 23:02
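The UnicodeEncodeError in the comment above occurs when the output file object cannot encode non-ASCII text (common on Python 2, where a plain open() writes bytes with an ASCII default). One way to avoid it, sketched here with a placeholder path, is to open the file with an explicit UTF-8 encoding (io.open works on both Python 2 and 3):

```python
# -*- coding: utf-8 -*-
import io
import json

data = {"収録時間": "123分"}

# io.open with an explicit encoding handles the non-ASCII output;
# "output.json" is a placeholder path.
with io.open("output.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(data, ensure_ascii=False))
```

Reading the file back with the same encoding recovers the original characters.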