I'm just starting here, so I hope I'm following all the rules. I have a dictionary whose key-value pairs look like {..., 'CL': 'León', ...}, mapping the abbreviation of each Spanish region (e.g. 'CL') to one city in it (e.g. León). I included the line

# _*_ coding: utf-8 _*_

at the beginning so that I could use non-ASCII characters such as accented letters. When I print individual values, everything works and the output shows the accents correctly:

print cities['CL']

However when I print out the whole dictionary as:

print cities

I get escaped bytes instead, in this case \xc3\xb3 (the two UTF-8 bytes of ó).

Why is this? Thanks in advance.

Thomas Fritsch
Borja G.
  • Interesting issue, my friend. Check out https://stackoverflow.com/questions/8288551/how-do-i-display-non-english-characters-in-python – Ubdus Samad Aug 08 '17 at 16:15
  • Possible duplicate of [How do I display non-english characters in python?](https://stackoverflow.com/questions/8288551/how-do-i-display-non-english-characters-in-python) – DYZ Aug 08 '17 at 16:16
  • And this as well: https://stackoverflow.com/questions/27814363/how-to-write-dict-into-file-with-characters-other-than-english-letters-in-python – Ubdus Samad Aug 08 '17 at 16:16
  • Possible duplicate of [How to write dict into file with characters other than English letters in python 2.7.8?](https://stackoverflow.com/questions/27814363/how-to-write-dict-into-file-with-characters-other-than-english-letters-in-python) – Ubdus Samad Aug 08 '17 at 16:17
  • I'd say `repr(mydict).decode("unicode-escape")` is the way to go https://stackoverflow.com/a/5648769/1328439 – Dima Chubarov Aug 08 '17 at 16:22
  • Unrelated: people generally use `-*-`, not `_*_`. This convention comes from Emacs. Python doesn't care as long as it finds `coding:` or `coding=`, so it does not really make a difference; this is just FYI. – Andrea Corbellini Aug 08 '17 at 17:15

1 Answer

For byte strings, Python 2 has another Python-specific codec: string_escape.

>>> d = {'a' : 'тест', 'b': 'тост'}
>>> print repr(d).decode('string_escape')
{'a': 'тест', 'b': 'тост'}

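The Python 2 print statement converts its argument to a string with str(). For a dictionary, str() builds the result from the repr() of each key and value, and the repr() of a byte string escapes every non-ASCII byte. This is why print cities['CL'] shows the accent while print cities shows \xc3\xb3.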
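The escaping can be seen directly with a minimal sketch. It is written to run on both Python 2 and 3, and it assumes the question's setup: the value is built as UTF-8 bytes, which is what a plain 'León' literal holds in a Python 2 source file saved as UTF-8. The names `leon` and `whole` are only illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# UTF-8 bytes of u'León' (assumption: this mirrors the byte string
# stored in the question's dictionary under Python 2).
leon = u'Le\u00f3n'.encode('utf-8')
cities = {'CL': leon}

# str() of a dict is built from the repr() of its keys and values,
# so the non-ASCII bytes appear as \xc3\xb3 escapes.
whole = str(cities)
print(whole)                   # escaped form (Python 3 adds a b prefix to the value)
print(whole == repr(cities))   # str() and repr() agree for a dict
print(repr(leon) in whole)     # the value appears as its repr()
```

Both comparisons print True: the dictionary never uses the plain string form of its values, only their repr().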

We can get this representation from the repr() built-in and then undo the escaping, borrowing the approach from another SO post.

However, if the values are proper unicode strings, for example string constants prefixed with u, their representation is different and has to be decoded with a different codec, unicode-escape:

>>> d = {'a' : u'тест', 'b': u'тост'}
>>> print d
{'a': u'\u0442\u0435\u0441\u0442', 'b': u'\u0442\u043e\u0441\u0442'}
>>> print repr(d).decode("unicode-escape")
{'a': u'тест', 'b': u'тост'}
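If the values are unicode strings, as in the last example, an alternative to round-tripping through repr() is to format the items yourself. The sketch below is not from the linked posts: `format_dict` is a hypothetical helper, it assumes every key and value is unicode text, and it does not escape quotes embedded in the values.

```python
# -*- coding: utf-8 -*-

def format_dict(d):
    """Render a dict of unicode text as u"{'k': 'v', ...}" without escapes.

    Hypothetical helper: assumes unicode keys and values, and sorts the
    keys so the output order is stable.
    """
    pairs = [u"'%s': '%s'" % (key, d[key]) for key in sorted(d)]
    return u"{" + u", ".join(pairs) + u"}"

# u'Le\u00f3n' is u'León', written with an escape to keep this file ASCII.
cities = {u'CL': u'Le\u00f3n'}
text = format_dict(cities)
```

Printing `text` on a terminal that can encode the characters shows the accented letters as-is, because the string is built from the values themselves rather than from their repr().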
Dima Chubarov