
I have a question about assigning one dictionary value to another dictionary value in Python, where the value contains Chinese characters.

# -*- coding: utf-8 -*- 
import string
a = {}
a['1'] = '大' # chinese character
b = {}
b['1'] = a['1']
print a['1']
print a
print b

And the printout is

大
{'1': '\xe5\xa4\xa7'}
{'1': '\xe5\xa4\xa7'}

Why is there a difference between a and a['1']? How can I make print a show {'1': '大'}?

zhengyu
  • You created a UTF-8 byte string, which means that any `repr()` output will be shown using escape sequences for non-ASCII bytes. See the duplicate where this is explained in more detail. Don't print out a whole dictionary, print out individual string values. – Martijn Pieters Jan 16 '16 at 17:56
  • I think it would be useful to explain what `print` [actually does](http://stackoverflow.com/questions/1979234/what-does-python-print-function-actually-do) – Pynchia Jan 16 '16 at 17:58

1 Answer


Why is there a difference between a and a['1']?

The first (a) is your character wrapped in a dictionary. When you print a dictionary, Python shows the repr() of each key and value, and the repr() of a byte string escapes every non-ASCII byte, so you see \xe5\xa4\xa7, which is the character's UTF-8 encoding. When you print the string directly with print a['1'], Python writes those 3 bytes out unchanged, and your terminal decodes them as UTF-8 and displays the character.
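
The difference can be checked on the bytes themselves. This is only a minimal sketch, not part of the original answer: the names `raw`, `escaped` and `decoded` are illustrative, and the snippet is written so that it should run on both Python 2 and Python 3 (where `repr()` additionally shows a `b` prefix).

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# The UTF-8 encoding of the character is three bytes.
raw = u'大'.encode('utf-8')
print(len(raw))  # 3

# repr() escapes every non-ASCII byte; this is the form a dict uses
# when it displays its keys and values.
escaped = repr(raw)
print(escaped)

# Decoding the bytes as UTF-8 recovers the original character.
decoded = raw.decode('utf-8')
print(decoded == u'大')  # True
```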

How to make a['1'] and b['1'] be 大?

They are already the same. Just do

print a['1']
print b['1']
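
If the goal is to see every entry, one option is to print the pairs one at a time, so that each value is printed as a string and never goes through `repr()`. A small sketch, assuming the dictionary from the question; the `lines` list is just an illustrative helper:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

a = {'1': '大'}

# Format each pair as a plain string; %s uses str(), not repr(),
# so the non-ASCII bytes are not escaped.
lines = ['%s: %s' % (key, value) for key, value in a.items()]
for line in lines:
    print(line)
```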

To dump the whole dictionary the way you expect, add the following class. Note that this code assumes all your byte strings are UTF-8 encoded.

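import pprint

class MyPrettyPrinter(pprint.PrettyPrinter):
    def format(self, object, context, maxlevels, level):
        # Byte strings are decoded and returned as-is instead of as their escaped repr()
        if isinstance(object, str):
            return (object.decode('utf8'), True, False)
        # Everything else falls back to the default formatting
        return pprint.PrettyPrinter.format(self, object, context, maxlevels, level)

MyPrettyPrinter().pprint(a)  # {1: 大}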
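
A possible alternative, which is not from the original answer, is to serialise the dictionary with the standard `json` module and `ensure_ascii=False`. The result is JSON notation (double quotes) rather than Python's own dict display, so treat this as a sketch for cases where that is acceptable; the name `dumped` is illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import json

a = {'1': '大'}

# ensure_ascii=False keeps non-ASCII characters as they are
# instead of replacing them with \uXXXX escapes.
dumped = json.dumps(a, ensure_ascii=False)
print(dumped)
```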
Martin Konecny