
This feels like a very basic question, and I've already read through the docs and answers that suggest my code should be working. It may well be a duplicate and I missed something; if so, please drop a link, as I've already spent a couple of hours on this and it feels silly. Thank you in advance.

Python 3.6 code:

import json
print( json.loads( '{"text": \"\\u0444\\u044b\\u0432\\u0430\"}' ) )

Produces the following error:

Traceback (most recent call last):
  File "test2.py", line 28, in <module>
    print( json.loads( '{"txt": \"\\u0444\\u044b\\u0432\\u0430\"}' ) )
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-12: ordinal not in range(128)

I was under the impression that I could do

jsn = json.dumps( my_dict )
# and later call
json.loads( jsn )

and get back my original dict, but this doesn't seem to work for unicode characters.

Can I convert a string containing unicode characters back to a Python dict with json.loads? Please provide minimal working code for my example.

sr9yar

2 Answers


The problem is not your code but the encoding settings of your stdout.

The failure is equivalent to what the following code does:

import json
print(str(json.loads( '{"text": \"\\u0444\\u044b\\u0432\\u0430\"}' )).encode('ascii'))

print writes to stdout by default, so the error means the encoding of your stdout is not "utf-8". To check the encoding, run the following code:

import sys; print(sys.stdout)

You will get something like this:

<_io.TextIOWrapper name='' mode='w' encoding='ANSI_X3.4-1968'>

The encoding shown is not utf-8 (ANSI_X3.4-1968 is another name for ASCII). To change that, rewrap stdout with utf-8 encoding:

import sys
import io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="UTF-8")

Now the print should work.
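To see that the failure comes from the output stream rather than from json, the behaviour can be reproduced with in-memory streams. This is only a minimal sketch: the helper name try_print and the use of io.BytesIO as a stand-in for the terminal are illustrative assumptions, not part of the original code.

```python
import io
import json

# Parsing succeeds regardless of how stdout is configured.
data = json.loads('{"text": "\\u0444\\u044b\\u0432\\u0430"}')


def try_print(encoding):
    """Print `data` to an in-memory text stream that uses `encoding`.

    Hypothetical helper: the stream stands in for a terminal whose
    encoding is `encoding`.
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(data, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return "failed: " + exc.reason
    return stream.buffer.getvalue().decode(encoding)


ascii_result = try_print("ascii")   # mimics a stdout limited to ASCII
utf8_result = try_print("utf-8")    # mimics a stdout rewrapped as UTF-8
```

With the "ascii" stream the print raises the same kind of UnicodeEncodeError as in the question, while the "utf-8" stream accepts the Cyrillic text.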

But the deeper solution to this problem is to correct the environment settings of your system.

I have a solution for Linux. Type locale -a in your terminal to list the locales available on your system, then choose a "utf-8" encoded one:

export LANG=en_US.UTF-8 # for English users

or

export LANG=zh_CN.UTF-8 # for Chinese users

You may need to add it to your .bashrc to make sure it always takes effect.
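After changing the locale, it may help to confirm from Python which encodings are actually in effect. A small sketch using only the standard library; the variable names are illustrative:

```python
import locale
import sys

# Encoding Python uses when writing text to stdout (may be None if
# stdout has been replaced by an object without an encoding).
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Encoding derived from the current locale settings (LANG, LC_ALL, ...).
preferred = locale.getpreferredencoding(False)

print("stdout encoding:", stdout_encoding)
print("locale preferred encoding:", preferred)
```

If both lines report UTF-8 after the export, printing the decoded dict should no longer raise.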

DonYorkDon
  • Originally, I was looking for a place, where json file encoding issue happens in my code, so I thought I could quickly print some variables, and that's where it led me astray. Basically I was expecting this error, but from json library. And it came, but from print xD Your answer was very helpful, thank you. – sr9yar Dec 17 '18 at 10:10
  • setting sys.stdout with "TextIOWrapper" works like a charm! Thank you.. – Yiğit Feb 29 '20 at 21:07

The problem is with your sys.stdout stream encoding, not with json.loads() - your code snippet works ok for me (using python 3.6.6 and a properly configured env). FWIW you could have found out by yourself by splitting the print() from the json.loads().
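As a rough illustration of splitting the two steps, the sketch below parses first and then prints an ASCII-only representation, which should be safe even on a misconfigured stdout. The variable names are illustrative.

```python
import json

raw = '{"text": "\\u0444\\u044b\\u0432\\u0430"}'

# Step 1: parsing only. No output stream is involved here, so an
# exception at this point would really come from json.
parsed = json.loads(raw)

# Step 2: output. ascii() escapes non-ASCII characters, so this print
# does not depend on the encoding of stdout.
safe_repr = ascii(parsed)
print(safe_repr)

# The dumps/loads round trip returns the original dict.
round_trip = json.loads(json.dumps(parsed))
```

If step 1 passes and only a plain print(parsed) fails, the stream encoding is the culprit, not json.loads().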

TL;DR: check your OS doc for how to properly set the stdout encoding.

bruno desthuilliers