Here is a little tmp.py with a non ASCII character:

if __name__ == "__main__":
    s = 'ß'
    print(s)

Running it I get the following error:

Traceback (most recent call last):
  File ".\tmp.py", line 3, in <module>
    print(s)
  File "C:\Python32\lib\encodings\cp866.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>

The Python docs say:

By default, Python source files are treated as encoded in UTF-8...

My way of checking the encoding is to use Firefox (maybe someone can suggest something more obvious). I open tmp.py in Firefox, and if I select View->Character Encoding->Unicode (UTF-8) it looks OK, that is, the way it looks above in this question (with the ß symbol).

If I put:

# -*- encoding: utf-8 -*-

as the first line in tmp.py, it does not change anything; the error persists.

Could someone help me figure out what I am doing wrong?

dreftymac
Anton Daneyko
  • @Blender: `u` doesn't do anything in Python 3 (and in earlier versions of Python 3 was an error until it was added back for backwards compatibility) – Wooble Jan 11 '13 at 18:22
  • More likely you have a problem setting the encoding in your editor. – LtWorf Jan 11 '13 at 18:25
  • Also, it says encoding error, not decoding error. Since cp866 is an MS-DOS code page, I think you are trying to print it to the console, which requires encoding. – Esailija Jan 11 '13 at 18:25
  • @Wooble Can you explain to me how one can determine that? If I do my Firefox trick and choose ISO 8859-1 I see `s = 'ÃŸ'` instead of `s = 'ß'`. – Anton Daneyko Jan 11 '13 at 18:27
  • @mezhaka: I'm wrong, Martijn has the correct explanation. – Wooble Jan 11 '13 at 18:29
  • related: [Python, Unicode, and the Windows console](http://stackoverflow.com/q/5419/4279) – jfs Mar 01 '16 at 12:57

1 Answer

The encoding your terminal is using doesn't support that character:

>>> '\xdf'.encode('cp866')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/encodings/cp866.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\xdf' in position 0: character maps to <undefined>

Python is handling the character just fine; it's your output encoding (the console's code page) that cannot represent it.
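
To make the distinction concrete, here is a minimal sketch (not part of the original answer) that encodes the same string with two codecs. The variable names are only illustrative, and the `errors` handlers shown are the standard ones accepted by `str.encode`:

```python
# The string itself is valid in Python; only the target encoding matters.
s = '\xdf'  # 'ß', U+00DF

# UTF-8 can represent every Unicode code point.
utf8_bytes = s.encode('utf-8')  # b'\xc3\x9f'

# cp866 (a DOS code page) has no slot for 'ß', so a strict encode fails.
try:
    s.encode('cp866')
    cp866_ok = True
except UnicodeEncodeError:
    cp866_ok = False

# An error handler lets the encode succeed with a substitute instead.
replaced = s.encode('cp866', errors='replace')           # b'?'
escaped = s.encode('cp866', errors='backslashreplace')   # b'\\xdf'

print(utf8_bytes, cp866_ok, replaced, escaped)
```

The failing strict encode is the same step `print` performs implicitly when it writes to a cp866 console.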

You can try running chcp 65001 in the Windows console to switch your code page; chcp is a Windows command-line command that changes the active code page, and 65001 is the code page for UTF-8.
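
One way to see whether the switch took effect is to ask Python which encoding it detected for the console. The following is only a sketch: `console_can_encode` is a hypothetical helper written for this illustration, not a standard library function.

```python
import sys

def console_can_encode(text, encoding=None):
    """Hypothetical helper: can `text` be encoded for the console?"""
    # Fall back to ASCII if the stream does not report an encoding.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# What Python believes the console uses right now
# (for the asker this would have been cp866 before running chcp).
print('stdout encoding:', getattr(sys.stdout, 'encoding', None))

# Compare the asker's code page with UTF-8 (code page 65001).
print(console_can_encode('\xdf', 'cp866'))
print(console_can_encode('\xdf', 'utf-8'))
```

Calling `console_can_encode(s)` without an explicit encoding checks against whatever `sys.stdout` currently reports, which makes it usable as a guard before printing.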

Mine, on OS X (using UTF-8) can handle it just fine:

>>> print('\xdf')
ß
Martijn Pieters
  • He should be fine in Windows if he does `chcp 65001` before he runs the program, assuming Python detects that – Esailija Jan 11 '13 at 18:29
  • @Esailija: I've had feedback that that doesn't always work. I think fonts need switching too. – Martijn Pieters Jan 11 '13 at 18:30
  • For a `ß`, probably not. But maybe for more exotic characters the default windows cmd prompt font probably won't do :P – Esailija Jan 11 '13 at 18:31
  • You're right: it is the terminal thing. If I do `with open('tmp.txt', 'w', encoding='utf-8') as f: f.write(s)` it works fine. Can you elaborate on "try using chcp 65001"? That does not say anything to me. – Anton Daneyko Jan 11 '13 at 18:38
  • @mezhaka you can fix the terminal too, I just installed python 3 and tested that `chcp 65001` works. Run `chcp 65001` in your terminal before running the python file. – Esailija Jan 11 '13 at 18:41
  • @mezhaka: `chcp 65001` is a Windows command to change the code page (encoding) being used in the command-line window. If you issue it before starting Python 3 it will carry over to the Python console. Doing this with Python 2.7.3 will result in an error. – martineau Jan 11 '13 at 20:05
  • @mezhaka: I expanded the sentence a little, it was indeed not very clear. – Martijn Pieters Jan 11 '13 at 20:58
  • @Esailija I tried to run chcp 65001 before running the script. It now gives me no error, but the non-ASCII characters are still either not printed or printed as the wrong symbols. But I am fine with that, I'll just write what I need directly to a file. (Btw., if I redirect the output via > I get that encoding error again.) – Anton Daneyko Jan 12 '13 at 17:17
  • @mezhaka: yes, redirecting to a file means there is no encoding set for printing (writing to `sys.stdout`). Encode manually in that case. And your terminal font doesn't support the characters you are trying to print, so they are not displayed correctly. – Martijn Pieters Jan 12 '13 at 17:18
  • @MartijnPieters Indeed, I've changed the font to Lucida Console and I see the ß! – Anton Daneyko Jan 12 '13 at 17:23
  • the correct solution is to [leave `chcp` alone and use Unicode API on Windows](http://stackoverflow.com/a/32176732/4279) – jfs Mar 01 '16 at 12:58
  • @J.F.Sebastian: I agree; the number of questions about Windows console printing I've duped to that post is rather long now. – Martijn Pieters Mar 01 '16 at 12:59
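
For the redirection case raised in the comments above, "encode manually" could look roughly like the sketch below. `write_utf8` is a hypothetical helper name, and choosing UTF-8 for the output file is an assumption; the only standard pieces used are `str.encode` and the binary stream `sys.stdout.buffer`.

```python
import io
import sys

def write_utf8(text, binary_stream=None):
    """Hypothetical helper: encode to UTF-8 and write the raw bytes."""
    # sys.stdout.buffer is the binary stream underneath the text-mode stdout.
    if binary_stream is None:
        binary_stream = sys.stdout.buffer
    data = (text + '\n').encode('utf-8')
    binary_stream.write(data)
    binary_stream.flush()
    return len(data)

# Demonstrate against an in-memory buffer instead of the real stdout.
buf = io.BytesIO()
write_utf8('\xdf', buf)
print(buf.getvalue())
```

In a script, `write_utf8(s)` would send UTF-8 bytes to stdout regardless of the console code page, so `python tmp.py > out.txt` should produce a UTF-8 file. Setting the `PYTHONIOENCODING` environment variable to `utf-8` before running the script is another option that avoids changing the code.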