The Unicode standard describes how characters are represented by code points and contains a lot of tables listing characters and their corresponding code points:
0061    'a'; LATIN SMALL LETTER A
0062    'b'; LATIN SMALL LETTER B
From https://docs.python.org/2/howto/unicode.html#definitions
In Python, a unicode string can be written in two forms, as literal characters or as \u escape sequences, and the two forms are equal:
u'中文' == u'\u4e2d\u6587'
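A minimal sketch of that equality, assuming a UTF-8 encoded source file (the variable names are only illustrative): both spellings yield the same two code points.

```python
# -*- coding: utf-8 -*-
# Illustrative names; the two literals are the ones from the question.
literal = u'中文'
escaped = u'\u4e2d\u6587'

# Both spellings describe the same sequence of code points.
same = (literal == escaped)
code_points = [hex(ord(ch)) for ch in literal]

print(same)         # True
print(code_points)  # ['0x4e2d', '0x6587']
```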
Obviously, people would rather read u'中文' than u'\u4e2d\u6587'. But in some situations in Python 2, unicode strings are displayed only as escaped code points:
>>> print(u'\u4e2d\u6587')
中文
>>> print({u'\u4e2d\u6587': 1})
{u'\u4e2d\u6587': 1}
>>> print([u'\u4e2d\u6587', 1])
[u'\u4e2d\u6587', 1]
But there is no such problem in Python 3:
>>> print({u'\u4e2d\u6587': 1})
{'中文': 1}
>>> print([u'\u4e2d\u6587', 1])
['中文', 1]
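The same contrast can be reproduced inside Python 3 alone, which may help narrow things down: repr() keeps the characters readable, while the built-in ascii() escapes them. A small sketch, assuming Python 3 (the variable names are only illustrative):

```python
# Python 3 only: ascii() is a built-in there.
items = [u'\u4e2d\u6587', 1]

readable = repr(items)   # the text print() shows for a list
escaped = ascii(items)   # like repr(), but non-ASCII characters are escaped

print(readable)  # ['中文', 1]
print(escaped)   # ['\u4e2d\u6587', 1]
```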
Here are my questions:
- Can I tell Python which display form of unicode I want?
- Why is there no problem in Python 3?
- Is there a simple solution for Python 2?
I haven't found a good solution in the following links: