Why does "Save as UTF-8" in Eclipse fix the Python UnicodeEncodeError?

Question

I have:

a file file.txt containing just one character: ♠, and UTF-8 encoded.

a CP-1252 encoded Python script test.py containing:

import codecs
text = codecs.open('file.txt', 'r', 'UTF-8').read()
print('text: {0}'.format(text))

When I run it in Eclipse 4.7.2 on Windows 7 SP1 x64 Ultimate and with Python 3.5.2 x64, I get the error message:

Traceback (most recent call last):
  File "C:\eclipse-4-7-2-workspace\SEtest\test.py", line 3, in <module>
    print('text: {0}'.format(text))
  File "C:\programming\python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 6: character maps to <undefined>

My understanding is that the issue stems from the fact that on Microsoft Windows, by default the Python interpreter uses CP-1252 as its encoding and therefore has issues with the character ♠.

I should also note that I kept Eclipse's default encoding, which can be seen in Preferences > General > Workspace.

When I change the Python script test.py to:

import codecs
print(u'♠') # <--- adding this line is the only modification
text = codecs.open('file.txt', 'r', 'UTF-8').read()
print('text: {0}'.format(text))

then try to run it, I get an error message (note: Eclipse is configured to save the script whenever I run it).

After selecting the option Save as UTF-8, I get the same error message:

Traceback (most recent call last):
  File "C:\Users\Francky\eclipse-4-7-2-workspace\SEtest\test.py", line 2, in <module>
    print(u'\u2660')
  File "C:\programming\python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 0: character maps to <undefined>

which I think is expected since the Python interpreter still uses CP-1252.

But if I run the script again in Eclipse without any modification, it works. The output is:

♠
text: ♠

Why does it work?

Where do you get the output? Do you get the same output in Eclipse in the Console view, and on the command line? — howlger, Apr 11 '18 at 06:04
@howlger All outputs were obtained in Eclipse in the Console view. – Franck Dernoncourt, Apr 11 '18 at 06:05
@howlger If I run the UTF-8 encoded Python script outside Eclipse, in cmd.exe, I get the same `UnicodeEncodeError` as when I was running the corresponding CP-1252 encoded script in Eclipse. – Franck Dernoncourt, Apr 11 '18 at 06:09
And what happens if you try the following on the command line? https://stackoverflow.com/a/18439832/6505250 — howlger, Apr 11 '18 at 07:04
@howlger Thanks. Using `cmd /K chcp 65001` does prevent the `UnicodeEncodeError` from happening, but the displayed text isn't pretty: https://i.stack.imgur.com/u8TgJ.png — Franck Dernoncourt, Apr 11 '18 at 17:45
If opening a new window via `start cmd /K chcp 65001` does not help, maybe you have also [set the source file's encoding via `# -*- coding: utf-8 -*-` (see this answer)](https://stackoverflow.com/a/6179672/6505250). Otherwise, I don't know what else to do. To answer your question of why it is working in Eclipse: the encoding for the Console view is set in the run configuration in the tab _Common_, which I guess is set to `UTF-8` in your case. – howlger, Apr 12 '18 at 11:49
@howlger Thanks, adding `# -*- coding: utf-8 -*-` didn't help. Where is the tab Common? – Franck Dernoncourt, Apr 12 '18 at 16:55
You will find the _Common_ tab in the run configuration: _Run > Run Configurations..._ http://www.pydev.org/manual_101_run.html — howlger, Apr 13 '18 at 12:06
@howlger Thank you, you're correct: the encoding indicated in the Common tab changed from CP-1252 to UTF-8. Why does that change the behavior of the python interpreter? — Franck Dernoncourt, Apr 16 '18 at 05:47

score 1 · Accepted Answer · answered Apr 16 '18 at 07:12

Python converts the text to be printed to the encoding of the console, which is the active code page on Windows (at least until version 3.6).
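As a rough illustration of that conversion step (a sketch, not the interpreter's actual code path), the snippet below encodes the spade character with both codecs. The helper name `try_encode` is made up for this example.

```python
# Sketch: the encoding step that printing performs before bytes reach the console.
text = '\u2660'  # the spade character from the question

def try_encode(s, encoding):
    """Hypothetical helper: return the encoded bytes, or None if the codec
    cannot represent the string."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError:
        return None

cp1252_result = try_encode(text, 'cp1252')  # None: U+2660 has no CP-1252 mapping
utf8_result = try_encode(text, 'utf-8')     # b'\xe2\x99\xa0'
print(cp1252_result, utf8_result)
```

With a CP-1252 console the encode step fails, which is the `UnicodeEncodeError` in the traceback. With a UTF-8 console the same character becomes three bytes and prints normally.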

To avoid the UnicodeEncodeError you have to change the console encoding to UTF-8. There are several ways to do this, e.g. on the Windows command line by executing cmd /K chcp 65001.

In Eclipse, the encoding of the console can be set to UTF-8 in the run configuration (Run > Run Configurations...), in the Common tab.
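To check which encoding a run actually uses, one option is to inspect `sys.stdout.encoding` from inside the script. The sketch below uses in-memory streams as stand-ins for a CP-1252 and a UTF-8 console. `stream_can_show` is a hypothetical helper, and it ignores the stream's error handler, so treat it as a diagnostic aid only.

```python
import io
import sys

def stream_can_show(text, stream=None):
    """Hypothetical helper: report whether `text` can be encoded with the
    encoding that the given stream (default: sys.stdout) advertises."""
    if stream is None:
        stream = sys.stdout
    # Fall back to ASCII if the stream does not advertise an encoding.
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# In-memory stand-ins for the two console encodings discussed above.
cp1252_console = io.TextIOWrapper(io.BytesIO(), encoding='cp1252')
utf8_console = io.TextIOWrapper(io.BytesIO(), encoding='utf-8')

print(stream_can_show('\u2660', cp1252_console))  # False
print(stream_can_show('\u2660', utf8_console))    # True
print(sys.stdout.encoding)  # whatever the real console reports for this run
```

Running the last line before and after changing the run configuration should show the encoding switch. Outside Eclipse, setting the documented `PYTHONIOENCODING` environment variable to `utf-8` before starting the interpreter is another way to change what `sys.stdout.encoding` reports.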

The text file encoding settings in Window > Preferences: General > Workspace and in Project > Properties: Resource are only used by text editors to decide how to display text files.

howlger

Thanks a lot for the explanation. – Franck Dernoncourt Apr 16 '18 at 07:21
From [UTF-8 in Windows 7 CMD](https://stackoverflow.com/a/22340018/395857): "you need to use Lucida console fonts in addition to executing chcp 65001 from cmd console." – Franck Dernoncourt Aug 09 '18 at 20:50
