Mojibake indicates that text encoded in one encoding is being displayed using another, incompatible encoding:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
print(u"╔╤╤╦╤╤╦╤╤╗".encode('utf-8').decode('cp1252')) #XXX: DON'T DO IT
# -> mojibake starting with â•”â•¤â•¤â•¦ instead of the box-drawing characters
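Here is a minimal sketch of the same mechanism, assuming Python 3 (the snippet above uses Python 2 style literals). The helper names make_mojibake and undo_mojibake are hypothetical, made up for illustration. The reversal only works because every UTF-8 byte involved here happens to be defined in cp1252, so treat it as a demonstration, not a general repair tool.

```python
# Sketch (Python 3). make_mojibake / undo_mojibake are hypothetical helper names.

def make_mojibake(text: str, right: str = "utf-8", wrong: str = "cp1252") -> str:
    """Encode text with the right codec, then decode the bytes with the wrong one."""
    return text.encode(right).decode(wrong)


def undo_mojibake(garbled: str, right: str = "utf-8", wrong: str = "cp1252") -> str:
    """Recover the original bytes with the wrong codec, then decode them correctly."""
    return garbled.encode(wrong).decode(right)


box = "\u2554\u2557"  # the two top corners of a double-line box, written as ASCII escapes
garbled = make_mojibake(box)
print(ascii(garbled))                 # '\xe2\u2022\u201d\xe2\u2022\u2014'
print(undo_mojibake(garbled) == box)  # True
```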
There are several places where the wrong encoding could be used.
The # coding: utf-8 encoding declaration says how non-ASCII characters in your source code (e.g., inside string literals) should be interpreted. If print u"╔╤╤╦╤╤╦╤╤╗" works in your case, then the source code itself is decoded to Unicode correctly. For debugging, you could write the string using only ASCII characters: u'\u2554\u2557' == u'╔╗'.
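A small debugging aid along those lines, assuming Python 3: writing the expected string with ASCII-only escapes makes it independent of the encoding declaration, and listing code points with unicodedata.name shows exactly which characters a literal actually decoded to. The describe helper is hypothetical.

```python
import unicodedata
from typing import List

# ASCII-only escapes: unaffected by how the source file is decoded.
expected = "\u2554\u2557"

# A non-ASCII literal: correct only if the file's real encoding matches its declaration.
typed = "╔╗"


def describe(text: str) -> List[str]:
    """Return 'U+XXXX NAME' for each character (hypothetical debugging helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


print(typed == expected)  # False would point at a source-encoding problem
for line in describe(typed):
    print(line)
```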
The statement print "╔╤╤╦╤╤╦╤╤╗" (DON'T DO IT) prints bytes (text encoded using UTF-8 in this case) as is. IDLE itself works with Unicode text (BMP), so the bytes must be decoded into Unicode text before IDLE can show them. On Windows, IDLE appears to decode the output bytes using the ANSI code page, such as cp1252 (locale.getpreferredencoding(False)). Don't print text as bytes: it will fail in any environment whose character encoding differs from the encoding of your source code. For example, you would get ΓòöΓòù... mojibake if you ran the code from the question in a Windows console that uses the cp437 OEM code page.
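To illustrate why printing bytes is fragile, here is a sketch (Python 3) that hands the same UTF-8 bytes to two different decoders, roughly what happens when IDLE assumes cp1252 or a console assumes cp437. The outputs are shown with ascii() so they survive any display.

```python
import locale

# The bytes that printing a UTF-8 encoded byte string would send to the display.
data = "\u2554\u2557".encode("utf-8")

# The same bytes become different mojibake depending on the codec the display assumes.
for codec in ("cp1252", "cp437"):
    print(codec, ascii(data.decode(codec)))
# cp1252 '\xe2\u2022\u201d\xe2\u2022\u2014'
# cp437 '\u0393\xf2\xf6\u0393\xf2\xf9'

# Decoding with the codec that actually produced the bytes gives the text back.
print(data.decode("utf-8") == "\u2554\u2557")  # True

# The locale's preferred encoding; the value differs between systems.
print(locale.getpreferredencoding(False))
```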
You should use Unicode for all text in your program. Python 3 even forbids non-ASCII characters inside a bytes literal: you would get a SyntaxError there.
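A quick way to confirm that Python 3 restriction, plus the usual alternative of decoding bytes once at the input boundary. The compiles helper is hypothetical; compile() is the standard built-in.

```python
def compiles(source: str) -> bool:
    """Return True if source is a valid Python expression (hypothetical helper)."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


print(compiles('b"\u2554"'))           # False: non-ASCII character inside a bytes literal
print(compiles('b"\\xe2\\x95\\x94"'))  # True: the same UTF-8 bytes written as ASCII escapes
print(compiles('"\u2554"'))            # True: non-ASCII text belongs in a str literal

# Keep text as str; decode incoming bytes once, with the encoding you know they use.
raw = b"\xe2\x95\x94\xe2\x95\x97"  # e.g. bytes read from a UTF-8 file
text = raw.decode("utf-8")
print(text == "\u2554\u2557")  # True
```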
print(u'\u2554\u2557') might fail with UnicodeEncodeError if you run the code in the Windows console and the console's OEM code page cannot represent the characters being printed. To print arbitrary Unicode characters in the Windows console, use the win-unicode-console package. You don't need it if you use IDLE.
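Whether such a print can succeed depends on whether the target encoding can represent every character. A sketch, assuming Python 3, with a hypothetical can_encode helper. Note that cp437 does contain these particular box-drawing characters, whereas cp1252 does not, and cp437 has no euro sign.

```python
import sys


def can_encode(text: str, encoding: str) -> bool:
    """Return True if text can be encoded with the codec (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


box = "\u2554\u2557"
print(can_encode(box, "cp437"))       # True
print(can_encode(box, "cp1252"))      # False
print(can_encode("\u20ac", "cp437"))  # False: no euro sign in cp437

# A lossy fallback: escape what the codec cannot represent instead of raising.
print(box.encode("cp1252", errors="backslashreplace"))  # b'\\u2554\\u2557'

# The encoding print() uses for the current stdout (may be None or missing if stdout is replaced).
print(getattr(sys.stdout, "encoding", None))
```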