I need to process some Excel files that contain many "−" ('\u2212') characters, among others. After many attempts, I still can't print this character to the screen or save it to a file:
a='−'
print(a.encode('utf-8')) # prints b'\xe2\x88\x92'
print(a) # raises UnicodeEncodeError: 'gbk' codec can't encode character '\u2212' in position 0: illegal multibyte sequence
with open('test.txt','w') as file:
    file.write(a) # raises UnicodeEncodeError: 'gbk' codec can't encode character '\u2212' in position 0: illegal multibyte sequence
This page, https://docs.python.org/3.4/howto/unicode.html, shows how to drop such characters or replace them with something else, but I need to print the character itself, or at least write it to a file properly:
>>> u = chr(40960) + 'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
...
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
b'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
b'&#40960;abcd&#1972;'
>>> u.encode('ascii', 'backslashreplace')
b'\\ua000abcd\\u07b4'
How can I do it?
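For reference, here is a minimal sketch of the direction that seems most likely to work, assuming UTF-8 output is acceptable. Both errors above mention the 'gbk' codec, which suggests that print() and open() are falling back to the locale's default encoding. The helper names (write_utf8, read_utf8, print_utf8) and the temporary file path are hypothetical, made up for illustration. Whether the console then displays the character correctly depends on the terminal's code page and font, which this sketch does not address.

```python
import os
import sys
import tempfile

MINUS = '\u2212'  # the MINUS SIGN character from the question


def write_utf8(path, text):
    """Write text with an explicit UTF-8 encoding instead of the locale default."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)


def read_utf8(path):
    """Read the file back using the same explicit encoding."""
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()


def print_utf8(text, stream=None):
    """Send UTF-8 bytes to a binary stream (stdout's buffer by default),
    bypassing the text layer that uses the console codec."""
    stream = sys.stdout.buffer if stream is None else stream
    stream.write((text + '\n').encode('utf-8'))
    stream.flush()


# Round trip through a temporary file (hypothetical path) to check nothing is lost.
tmp_path = os.path.join(tempfile.mkdtemp(), 'test.txt')
write_utf8(tmp_path, MINUS)
round_trip = read_utf8(tmp_path)
```

Here round_trip should compare equal to the original string, and calling print_utf8(MINUS) writes the three bytes shown in the first snippet plus a newline. Any program that later opens test.txt would also need to read it as UTF-8.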