I have:
- a file file.txt containing just one character, ♠, and UTF-8 encoded.
- a CP-1252 encoded Python script test.py containing:
import codecs
text = codecs.open('file.txt', 'r', 'UTF-8').read()
print('text: {0}'.format(text))
When I run it in Eclipse 4.7.2 on Windows 7 SP1 x64 Ultimate and with Python 3.5.2 x64, I get the error message:
Traceback (most recent call last):
  File "C:\eclipse-4-7-2-workspace\SEtest\test.py", line 3, in <module>
    print('text: {0}'.format(text))
  File "C:\programming\python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 6: character maps to <undefined>
My understanding is that the issue stems from the fact that on Microsoft Windows, the Python interpreter by default uses CP-1252 as its output encoding and therefore has issues with the character ♠, which CP-1252 cannot represent.
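To illustrate that understanding, here is a minimal sketch showing that U+2660 has no mapping in CP-1252 but encodes fine as UTF-8. The helper name can_encode is only illustrative, and the character is written as an escape so the snippet does not depend on how the script file itself is saved.

```python
# Sketch: check whether U+2660 (the spade) can be encoded with a given codec.
spade = '\u2660'


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


cp1252_ok = can_encode(spade, 'cp1252')   # CP-1252 has no spade character
utf8_bytes = spade.encode('utf-8')        # UTF-8 can encode any code point

print('cp1252:', cp1252_ok)        # cp1252: False
print('utf-8 bytes:', utf8_bytes)  # utf-8 bytes: b'\xe2\x99\xa0'
```

This only demonstrates the codec limitation; it says nothing yet about which encoding the interpreter actually picks for its output.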
I should also note that, at this point, I had kept the Eclipse default text file encoding, which can be seen in Preferences > General > Workspace.
When I change the Python script test.py to:
import codecs
print(u'♠') # <--- adding this line is the only modification
text = codecs.open('file.txt', 'r', 'UTF-8').read()
print('text: {0}'.format(text))
and then try to run it, I get an error message from Eclipse (note: Eclipse is configured to save the script whenever I run it).
After selecting the option Save as UTF-8, I get the same kind of error message as before:
Traceback (most recent call last):
  File "C:\Users\Francky\eclipse-4-7-2-workspace\SEtest\test.py", line 2, in <module>
    print(u'\u2660')
  File "C:\programming\python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 0: character maps to <undefined>
which I think is expected since the Python interpreter still uses CP-1252.
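One way to test that assumption would be to print the encodings the interpreter reports on each run and compare them. The following is only a diagnostic sketch: describe_encodings and info are names I made up, and whether Eclipse sets the PYTHONIOENCODING environment variable for the launched process is an assumption to be checked, not something I have confirmed.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encodings relevant to print() and to reading files."""
    return {
        # Encoding used by print() for this process's standard output.
        'stdout': sys.stdout.encoding,
        # Locale default, used by open() when no encoding is given.
        'preferred': locale.getpreferredencoding(False),
        # Default for str.encode()/bytes.decode(); 'utf-8' on Python 3.
        'default': sys.getdefaultencoding(),
        # Optional override of the stdio encoding; None if not set.
        'PYTHONIOENCODING': os.environ.get('PYTHONIOENCODING'),
    }


info = describe_encodings()
# Only ASCII is printed here, so this cannot raise UnicodeEncodeError itself.
for name in sorted(info):
    print('{0}: {1}'.format(name, info[name]))
```

If the 'stdout' value differs between the failing run and the working run, that would point at the launch configuration rather than at the script.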
But if I run the script again in Eclipse without any modification, it works. The output is:
♠
text: ♠
Why does it work?