All your outputs are normal. By the way, this:
reload(sys)
sys.setdefaultencoding('utf8')
is really a poor man's trick to set the Python default encoding. It is seldom useful (IMHO it is not needed in the code shown) and should only be used when no cleaner way is possible. I have been using Python 2 for decades with a non-ASCII charset (Latin1) and only used that trick in my very first scripts.
And the # -*- coding: utf-8 -*- line does not change your strings either, though it may be useful for your text editor. Python 2 needs it only to accept non-ASCII bytes in the source file (PEP 263); it affects the value of a string only for unicode literals, which you do not have.
Now what really happens:
You define row as a 2-tuple of (byte) strings containing Chinese characters encoded in UTF-8. Fine.
When you print a string, its bytes are passed directly to the output system (here a terminal or screen). As the terminal correctly processes UTF-8, it converts the UTF-8 byte representation into the proper characters. So print (row[0]) correctly displays the Chinese characters. In Python 2 it is executed as print row[0]: (row[0]) is not a tuple, whereas (row[0],) is a 1-tuple.
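The difference between grouping parentheses and a 1-tuple can be checked with a short sketch. The names item, grouped and one_tuple are only illustrative, and the snippet is written so that it should behave the same on Python 2 and Python 3:

```python
# UTF-8 bytes of one Chinese character (illustrative value).
item = b'\xe5\xb7\xb2'

grouped = (item)      # parentheses only group: this is still the byte string
one_tuple = (item,)   # the trailing comma is what creates a tuple

print(grouped is item)           # True
print(type(one_tuple) is tuple)  # True
```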
But when you print a tuple, Python actually prints the representation of the elements of the tuple (it would be the same for a list, set or dict). And in Python 2, the representation of a byte or unicode string escapes all non-ASCII characters in \x.. or \u.... forms.
In a Python interactive session, you should see:
>>> print rows[0]
已
>>> print repr(rows[0])
'\xe5\xb7\xb2'
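The same behaviour can be checked from a script. This is only a sketch: item, pair and manual are illustrative names, and the bytes are written with a b prefix so that the snippet should run on Python 2 and Python 3 alike (on Python 3 the escaped forms additionally carry a b prefix):

```python
# The UTF-8 bytes shown in the session above, as a byte string.
item = b'\xe5\xb7\xb2'
pair = (item, item)

# str() of a container is built from repr() of each element, not str().
manual = '(' + ', '.join(repr(e) for e in pair) + ')'
same = (str(pair) == manual)

print(repr(item))  # the escaped form of a single element
print(str(pair))   # the same escapes, once per element
print(same)        # True
```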
TL;DR: when you print a container, you actually print the representation of its elements. If you want to display the string values, use an explicit loop or a join:
print '(' + ', '.join(rows) + ')'
displays as expected:
(已, 经激活的区域语言)
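If this is needed in several places, the join can be wrapped in a small helper. The following is a minimal sketch, not a standard function: format_row is a hypothetical name, it assumes the byte strings are UTF-8 encoded, and it decodes them to text so that it should work on Python 2 and Python 3:

```python
# -*- coding: utf-8 -*-

def format_row(row, encoding='utf-8'):
    """Hypothetical helper: tuple-like display built from the string values."""
    # Decode byte strings (assumed UTF-8 by default); keep text items as they are.
    parts = [item.decode(encoding) if isinstance(item, bytes) else item
             for item in row]
    return u'(' + u', '.join(parts) + u')'

# Same content as in the question, stored as UTF-8 byte strings.
rows = (u'已'.encode('utf-8'), u'经激活的区域语言'.encode('utf-8'))
text = format_row(rows)
```

On a UTF-8 terminal, printing text should give the same line as the join above, without any \x escapes.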