In Python 3, how do I print the character '\u2212' (a minus sign "−") without UnicodeEncodeError?

Question

I need to process some Excel files that contain lots of "−" ('\u2212'), as well as other characters. After a lot of trying, I can't even print it on screen or save it to a file:

a='−'
print(a.encode('utf-8'))  # prints b'\xe2\x88\x92'
print(a)  # raises UnicodeEncodeError: 'gbk' codec can't encode character '\u2212' in position 0: illegal multibyte sequence
with open('test.txt','w') as file:
    file.write(a)  # raises UnicodeEncodeError: 'gbk' codec can't encode character '\u2212' in position 0: illegal multibyte sequence

On this page: https://docs.python.org/3.4/howto/unicode.html, the unencodable character is dropped or replaced with other characters, but I have to print it, or at least write it to a file properly:

>>> u = chr(40960) + 'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')  
Traceback (most recent call last):
    ...
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
  position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
b'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
b'&#40960;abcd&#1972;'
>>> u.encode('ascii', 'backslashreplace')
b'\\ua000abcd\\u07b4'

How can I do it?

Start with a console that can actually handle a wider range of Unicode characters. Your console is configured for GBK only. — Martijn Pieters, Jul 29 '15 at 20:57
To write to a file, specify a codec that can handle the specific Unicode codepoints. `open('test.txt', 'w', encoding='utf8')` for example. — Martijn Pieters, Jul 29 '15 at 20:57
@MartijnPieters I did use `-m idlelib -r` in the interpreter options to pop up an IDLE shell, which temporarily fixes this problem, but is it possible to print it in the PyCharm console? — liyuanhe211, Jul 29 '15 at 21:39
See https://www.jetbrains.com/pycharm/help/configuring-output-encoding.html for the PyCharm console configuration. — Martijn Pieters, Jul 30 '15 at 07:10
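
A minimal sketch of what the comments above suggest, assuming the failure comes from the locale default codec (GBK here) being used for both the file and the console. The helper names (`can_encode`, `write_text_utf8`, `read_text_utf8`, `make_utf8_stream`) are made up for illustration, and the in-memory buffer only stands in for a real console stream.

```python
import io
import os
import tempfile

MINUS = "\u2212"  # MINUS SIGN; the question shows GBK cannot encode it


def can_encode(text, codec):
    """Return True if `text` can be encoded with `codec` (hypothetical helper)."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def write_text_utf8(path, text):
    """Write text with an explicit codec instead of the locale default."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text_utf8(path):
    """Read the file back with the same explicit codec."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def make_utf8_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8.

    newline="\n" disables newline translation, so output is the same on
    every platform.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


# Writing to a file: pass encoding= explicitly.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
write_text_utf8(path, MINUS)
round_trip = read_text_utf8(path)

# Printing: send the text through a UTF-8 text stream. A BytesIO buffer
# stands in for the console here so the encoded bytes can be inspected.
buffer = io.BytesIO()
stream = make_utf8_stream(buffer)
print(MINUS, file=stream)
stream.flush()
encoded = buffer.getvalue()
```

For the real console, the same idea would be `io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")`, or on Python 3.7 and later `sys.stdout.reconfigure(encoding="utf-8")`. Either one only changes the bytes Python emits; whether the glyph actually renders still depends on the console or IDE being set to UTF-8, as the PyCharm link above describes.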
