I'm trying to print a smiley in Python: ☺

It works without any problems in the interactive shell (inside cmd.exe):

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("☺")
☺

But if I run the same thing from a file, I get this error:

Traceback (most recent call last):
  File "main.py", line 8, in <module>
    print("\u263a")
  File "C:\dev\lang\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u263a' in position 0: character maps to <undefined>

The Python file is UTF-8 encoded.
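
The traceback can be reproduced without a console by encoding the character with the code page named in it. This is only a minimal sketch: `encode_for_console` is a hypothetical helper, and `cp850` is assumed because it is the codec that appears in the traceback above. It shows that strict encoding fails, while an error handler such as `backslashreplace` escapes the character instead of raising.

```python
def encode_for_console(text, encoding="cp850", errors="backslashreplace"):
    """Encode text for a console code page (hypothetical helper).

    Characters the code page cannot represent are written as Python
    escape sequences instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding, errors=errors)


smiley = "\u263a"

# Strict encoding is what print() effectively does for this console,
# and it fails because cp850 has no entry for U+263A.
try:
    smiley.encode("cp850")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With an error handler, the character is escaped rather than lost.
escaped = encode_for_console(smiley)
```

This does not make the smiley appear; it only keeps the script from crashing when the active code page lacks the character.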


Update:

Even though there isn't a real answer to my problem yet, the comments under the question are worth reading. I also created a list of all characters that can be printed with the default raster font of cmd.exe (tested on Windows 10). To print a character, simply use the chr() function. For example, chr(14) gives you ♫.

0       [space]
1       ☺
2       ☻
3       ♥
4       ♦
5       ♣
6       ♠
7       [nothing]
8       [backspace, removes char before]
9       [tabulator]
10      [newline]
11      ♂
12      ♀
13      [takes part after chr(13) and replaces begin of string with it]
14      ♫
15      ☼
16      ►
17      ◄
18      ↕
19      ‼
20      ¶
21      §
22      ▬
23      ↨
24      ↑
25      ↓
26      →
27      ←
28      ∟
29      ↔
30      ▲
31      ▼
32      [space]
33      !
34      "
35      #
36      $
37      %
38      &
39      '
40      (
41      )
42      *
43      +
44      ,
45      -
46      .
47      /
48      0
49      1
50      2
51      3
52      4
53      5
54      6
55      7
56      8
57      9
58      :
59      ;
60      <
61      =
62      >
63      ?
64      @
65      A
66      B
67      C
68      D
69      E
70      F
71      G
72      H
73      I
74      J
75      K
76      L
77      M
78      N
79      O
80      P
81      Q
82      R
83      S
84      T
85      U
86      V
87      W
88      X
89      Y
90      Z
91      [
92      \
93      ]
94      ^
95      _
96      `
97      a
98      b
99      c
100     d
101     e
102     f
103     g
104     h
105     i
106     j
107     k
108     l
109     m
110     n
111     o
112     p
113     q
114     r
115     s
116     t
117     u
118     v
119     w
120     x
121     y
122     z
123     {
124     |
125     }
126     ~
127     ⌂
160     [space]
161     ¡
162     ¢
163     £
164     ¤
165     ¥
166     ¦
167     §
168     ¨
169     ©
170     ª
171     «
172     ¬
173     ­[shorter -, can't be displayed outside of console]
174     ®
175     ¯
176     °
177     ±
178     ²
179     ³
180     ´
181     µ
182     ¶
183     ·
184     ¸
185     ¹
186     º
187     »
188     ¼
189     ½
190     ¾
191     ¿
192     À
193     Á
194     Â
195     Ã
196     Ä
197     Å
198     Æ
199     Ç
200     È
201     É
202     Ê
203     Ë
204     Ì
205     Í
206     Î
207     Ï
208     Ð
209     Ñ
210     Ò
211     Ó
212     Ô
213     Õ
214     Ö
215     ×
216     Ø
217     Ù
218     Ú
219     Û
220     Ü
221     Ý
222     Þ
223     ß
224     à
225     á
226     â
227     ã
228     ä
229     å
230     æ
231     ç
232     è
233     é
234     ê
235     ë
236     ì
237     í
238     î
239     ï
240     ð
241     ñ
242     ò
243     ó
244     ô
245     õ
246     ö
247     ÷
248     ø
249     ù
250     ú
251     û
252     ü
253     ý
254     þ
255     ÿ
305     ı
402     ƒ
8215    ‗
9472    ─
9474    │
9484    ┌
9488    ┐
9492    └
9496    ┘
9500    ├
9508    ┤
9516    ┬
9524    ┴
9532    ┼
9552    ═
9553    ║
9556    ╔
9559    ╗
9562    ╚
9565    ╝
9568    ╠
9571    ╣
9574    ╦
9577    ╩
9580    ╬
9600    ▀
9604    ▄
9608    █
9617    ░
9618    ▒
9619    ▓
9632    ■
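
A list like the one above can be approximated programmatically by asking which code points a given codec can encode. The sketch below is only an illustration: `encodable_code_points` is a hypothetical helper, and `cp850` is assumed because it is the code page named in the traceback. It does not tell you how the raster font draws the control characters 0 to 31; it only checks what the codec accepts.

```python
def encodable_code_points(encoding, limit=0x10000):
    """Return the code points below `limit` that `encoding` can encode.

    Hypothetical helper: it simply tries to encode every character and
    keeps the ones that do not raise UnicodeEncodeError.
    """
    points = []
    for code_point in range(limit):
        try:
            chr(code_point).encode(encoding)
        except UnicodeEncodeError:
            continue
        points.append(code_point)
    return points


points = encodable_code_points("cp850")

# Print each code point next to its character, skipping control characters,
# which would move the cursor instead of showing a glyph.
for code_point in points:
    if code_point >= 32 and code_point != 127:
        print(code_point, chr(code_point))
```

Note that U+263A (9786) is not in the result, which is consistent with the error in the question.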
Daveman
  • Over the last couple weeks, I, too, have been struggling with `UnicodeEncodeError`, trying to print Unicode characters, Unicode characters accessible on Windows with a simple alt-code, and normal characters properly on the cmd console, when redirecting from cmd, on the Powershell console, and when redirecting from Powershell. In a nutshell, every program, input method, and output method uses a different encoding and a different code page and different line endings. It's a mess. I ended up with a bunch of `try..except`s and a docstring with a workaround. – TigerhawkT3 Aug 20 '15 at 19:24
  • However, I can verify that a native Unicode (UTF-8) terminal or terminal emulator, like the Linux KDE Konsole, processes Unicode characters with no issues. – TigerhawkT3 Aug 20 '15 at 20:09
  • I got your example to work using this [answer](http://stackoverflow.com/a/17177904/3900879). I changed my `cmd` font to `Lucida Console` and used `chcp 65001` (code page [65001](https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx) is UTF-8). – Rusty Shackleford Aug 20 '15 at 20:09
  • That's still a "docstring with a workaround" thing instead of a "just works" thing, but if the OP only intends this program for personal use, and if he's fine with changing cmd's font and then changing the code page each time he starts up a new prompt, that could be enough. OP, what is the intended audience for this program? – TigerhawkT3 Aug 20 '15 at 20:23
  • It's also very strange that the default raster font displays `☺` (copy-pasted; it really looks like `ôÿ║`) even though it clearly can display the ☺ icon. Does anybody know what the problem is there? And is there some kind of command to change the font from the shell? – Daveman Aug 20 '15 at 21:00
  • @Dounut: The raster fonts are not Unicode-enabled. Therefore, it is displaying the individual bytes from the UTF-8 encoded string. `\u263a` is encoded as `\xE2 \x98 \xBA` in UTF-8, which yields different characters depending on which code page you have active (code page 437 is the default on my en_US Windows 8 box). – Rusty Shackleford Aug 20 '15 at 21:09
  • @Dounut: Regarding changing the font out of the shell... Take a look at the `HKCU\Console\%SystemRoot%_system32_cmd.exe` key (there's probably an HKLM equivalent). The `FaceName` value is used to specify the active `cmd` font. Therefore, any mechanism you have available to modify the registry could be used to change the font. – Rusty Shackleford Aug 20 '15 at 21:24
  • You can also change cmd's font by clicking the icon at the left of the title bar (or right-clicking the title bar) and selecting Properties. However, even when changing the font and code page, I was only able to successfully print alt-code-friendly characters like ☺. Others, like ♞ (Unicode 265E, black knight chess piece), display only a box. – TigerhawkT3 Aug 20 '15 at 21:55
  • @TigerhawkT3: Confirmed using both `Lucida Console` and `Consolas`. Apparently, `Consolas` is not a complete Unicode font and doesn't support many code points (see [here](http://www.fileformat.info/info/unicode/font/consolas/grid.htm)). One would probably have to install a third-party monospace font that supports the entire set. For example, [Wikipedia](https://en.wikipedia.org/wiki/Unicode_font#2580-2DFF) reports only one font (`Everson Mono`) that supports the full _Miscellaneous Symbols_ range (`2600–267F`). – Rusty Shackleford Aug 20 '15 at 22:12
  • Overall, I think I'd recommend using `:)` instead of `☺` if you're okay with doing so. – TigerhawkT3 Aug 20 '15 at 23:46
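
The byte-level explanation in the comments can be checked with a few lines of Python. This is a rough sketch: `misread` is a hypothetical helper name, and code pages 437 and 850 are used because they are the ones mentioned in this thread. It encodes the smiley as UTF-8 and then decodes those bytes as if they were single-byte code page text, which is roughly what a console without Unicode support ends up showing.

```python
smiley = "\u263a"

# UTF-8 turns the single character into three bytes.
utf8_bytes = smiley.encode("utf-8")


def misread(raw, code_page):
    """Decode UTF-8 bytes as a single-byte code page (hypothetical helper)."""
    return raw.decode(code_page)


# Each byte becomes its own character, so one smiley turns into three glyphs.
as_cp437 = misread(utf8_bytes, "cp437")
as_cp850 = misread(utf8_bytes, "cp850")

print(utf8_bytes)
print(as_cp437)
print(as_cp850)
```

The two code pages agree on the last two bytes but map the first byte to different characters, which is why the garbled output depends on the active code page.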

1 Answer

When you redirect to a file, Python doesn't know what encoding to use. Redirecting to a file is a shell operation, and Python understands an environment variable that indicates the encoding to use. Set the following environment variable before redirecting to a file:

PYTHONIOENCODING=utf8
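
The effect of the variable can be checked from Python itself by starting a child interpreter with `PYTHONIOENCODING` set and capturing what it writes. This is a sketch under assumptions: `run_with_io_encoding` is a hypothetical helper, and capturing through a pipe stands in for redirecting to a file, since in both cases the output does not go to the console.

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, encoding="utf8"):
    """Run `code` in a child Python with PYTHONIOENCODING set (hypothetical helper).

    Returns the raw bytes the child wrote to its standard output.
    """
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return completed.stdout


# The child prints the smiley; its stdout is a pipe, not the console.
raw = run_with_io_encoding(r"print('\u263a')")

# Ask the child which encoding it actually used for stdout.
reported = run_with_io_encoding("import sys; print(sys.stdout.encoding)")
```

On the command line, the equivalent is to set the variable in the same cmd session that runs the script and performs the redirection.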
Mark Tolonen
  • So I entered `SET PYTHONIOENCODING=utf8` on the command line (right?). Unfortunately, it didn't change anything... Isn't UTF-8 the default encoding? – Daveman Aug 22 '15 at 07:55
  • Did you then redirect the output of your script to a file? The default console encoding is not UTF-8. As you can see from your own error message, Python uses the current console code page which on your system was `cp850`. – Mark Tolonen Aug 25 '15 at 05:13