
This must be a trivial task, but I can't figure it out. I have JSON that looks like this:

    {
        'city': u'\\u0410\\u0431\\u0430\\u043a\\u0430\\u043d',
        'language': {
            u'\\u0410\\u043d\\u0433\\u043b\\u0438\\u0439\\u0441\\u043a\\u0438\\u0439': 5608,
            u'\\u0418\\u0442\\u0430\\u043b\\u044c\\u044f\\u043d\\u0441\\u043a\\u0438\\u0439': 98
        }
    },

I'm trying to convert the unicode strings into UTF-8.

string=u'\u0410\u0431\u0430\u043a\u0430\u043d'
string.encode('utf-8')

I've got

'\xd0\x90\xd0\xb1\xd0\xb0\xd0\xba\xd0\xb0\xd0\xbd'

Instead of:

u'Абакан'

What am I doing wrong?

mailman_73
  • You're seeing the `repr` of the byte string, which doesn't try to show the actual characters. Try `print`ing it. – Mark Ransom Mar 22 '16 at 04:43
  • Your sample JSON is not really JSON. If those are Python values, you have double-encoded `\u` unicode escapes in those unicode strings. Is that really what you have or did you type this out by hand? – Martijn Pieters Mar 22 '16 at 04:49
  • Your `string` sample value is a proper `unicode` object, which you could just directly print (`print string`), and the same applies to the encoded value (`print string.encode('utf8')`). You are getting confused by the *string representation* echoed by the Python interactive interpreter or used to show the contents of containers like a dictionary or a list. Representations are ASCII-safe debugging values. – Martijn Pieters Mar 22 '16 at 04:51

2 Answers


What am I doing wrong?

Not printing it.

When you just evaluate a string in the Python REPL, you get its repr, which here is '\xd0\x90\xd0\xb1\xd0\xb0\xd0\xba\xd0\xb0\xd0\xbd'. When you print it, you get Абакан.

print(string.encode('utf-8'))
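To make the difference concrete, here is a minimal sketch. It assumes Python 3 (the question uses Python 2, where the repr has no `b` prefix) and a terminal that can display UTF-8; the variable names `text` and `encoded` are mine, not from the question.

```python
# Python 3 sketch: repr() versus the readable form of the same data.
text = '\u0410\u0431\u0430\u043a\u0430\u043d'   # the city name from the question
encoded = text.encode('utf-8')                   # UTF-8 bytes

# repr() is an ASCII-safe debugging form; this is what the REPL echoes.
print(repr(encoded))
# b'\xd0\x90\xd0\xb1\xd0\xb0\xd0\xba\xd0\xb0\xd0\xbd'

# Printing the text itself shows the actual characters.
print(text)
# Абакан

# The bytes were never wrong: decoding them gives the original text back.
print(encoded.decode('utf-8') == text)
# True
```

Nothing is lost or corrupted by `encode`; only the way the value is displayed differs.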
Amadan
  • That was... silly. Thank you. However, I have another problem: when I try to save the JSON to a file, I get `u'\\u0410\\u043d\\u0433\\u043b\\u0438\\u0439\\u0441\\u043a\\u0438\\u0439'`. How do I convert it into Cyrillic letters? `with open('file_to_save.txt', 'w') as outfile: json.dump(json_var, outfile)` – mailman_73 Mar 22 '16 at 05:04
  • `u'\\u0410\\u043d\\u0433\\u043b\\u0438\\u0439\\u0441\\u043a\\u0438\\u0439'` is not valid JSON. (note the quotes.) `u'"\\u0410\\u043d\\u0433\\u043b\\u0438\\u0439\\u0441\\u043a\\u0438\\u0439"'` is valid JSON. With this latter value, `import json; print(json.loads(x))` prints `Английский`. – Amadan Mar 22 '16 at 05:33
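For the follow-up about saving to a file, a rough sketch of one possible approach, assuming Python 3 and assuming the strings really are double-escaped (literal backslashes, as shown in the comment). The names `double_escaped`, `language`, `data`, and the temporary path are illustrative only. The string is first decoded the way the comment above suggests, and then written with `ensure_ascii=False`, which tells `json.dump` to keep non-ASCII characters rather than emit `\uXXXX` escapes.

```python
import io
import json
import os
import tempfile

# Assumed input: every "\u" here is a literal backslash followed by "u".
double_escaped = '\\u0410\\u043d\\u0433\\u043b\\u0438\\u0439\\u0441\\u043a\\u0438\\u0439'

# Wrapping the value in double quotes turns it into a valid JSON string literal.
language = json.loads('"' + double_escaped + '"')

data = {'language': {language: 5608}}

# Default behaviour escapes non-ASCII; ensure_ascii=False keeps the characters.
as_escaped = json.dumps(data)
as_text = json.dumps(data, ensure_ascii=False)
print(as_text)
# {"language": {"Английский": 5608}}

# Write to a file opened explicitly as UTF-8 (a temporary path, for illustration).
path = os.path.join(tempfile.mkdtemp(), 'file_to_save.txt')
with io.open(path, 'w', encoding='utf-8') as outfile:
    json.dump(data, outfile, ensure_ascii=False)

with io.open(path, encoding='utf-8') as infile:
    saved = infile.read()
```

Note that the default escaped output is still valid JSON and loads back to the same text; `ensure_ascii=False` only matters if you want the file itself to be human-readable.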

As @Amadan said, you just need to print your string.

But why does printing the string resolve the problem?

If you type string and press Enter in the interactive interpreter, Python displays the repr() of the object, an unambiguous, ASCII-safe representation meant for debugging. If you run print string (or print(string) in Python 3.x), you get the human-readable str() form of the object instead.

>>> converted = string.encode('utf8')
>>> converted
'\xd0\x90\xd0\xb1\xd0\xb0\xd0\xba\xd0\xb0\xd0\xbd'
>>> print converted
Абакан
>>> print repr(converted)
'\xd0\x90\xd0\xb1\xd0\xb0\xd0\xba\xd0\xb0\xd0\xbd'
>>> print str(converted)
Абакан
>>> 

Further reading: Difference between __str__ and __repr__ in Python

Billal Begueradj
  • `str` of string gives the string itself; `print(str(x))` == `print(x)`, and `print(repr(x))` is pretty much the same as `str(x)` in REPL (as `str(x)` will be again displayed using `repr`). – Amadan Mar 22 '16 at 05:30