How to make python 3 print() utf8

Question

How can I make python 3 (3.1) print("Some text") to stdout in UTF-8, or how to output raw bytes?

Test.py

import sys

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is UTF-8
TestText2 = b"Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd" # just bytes
print(sys.getdefaultencoding())
print(sys.stdout.encoding)
print(TestText)
print(TestText.encode("utf8"))
print(TestText.encode("cp1252","replace"))
print(TestText2)

Output (in CP1257; I replaced the non-ASCII characters with their byte values, written as [x00]):

utf-8
cp1257
Test - [xE2][xC2][xE7][xC7][xE8][xC8]..[xF0][xD0][xFB][xDB][xFE][xDE]
b'Test - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
b'Test - ??????..\x9a\x8a??\x9e\x8e'
b'Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'

print is just too smart... :D There's no point passing encoded text to print (it always shows only the representation of the bytes, not the real bytes), and it's impossible to output bytes at all, because print always encodes its output in sys.stdout.encoding.

For example: print(chr(255)) throws an error:

Traceback (most recent call last):
  File "Test.py", line 1, in <module>
    print(chr(255));
  File "H:\Python31\lib\encodings\cp1257.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xff' in position 0: character maps to <undefined>

By the way print( TestText == TestText2.decode("utf8")) returns False, although print output is the same.

How does Python 3 determine sys.stdout.encoding and how can I change it?

I made a printRAW() function which works fine (actually it encodes output to UTF-8, so really it's not raw...):

 def printRAW(*Text):
     RAWOut = open(1, 'w', encoding='utf8', closefd=False)
     print(*Text, file=RAWOut)
     RAWOut.flush()
     RAWOut.close()

 printRAW("Cool", TestText)

Output (now it prints in UTF-8):

Cool Test - āĀēĒčČ..šŠūŪžŽ

printRAW(chr(252)) also nicely prints ü (in UTF-8, [xC3][xBC]) and without errors :)

Now I'm looking for a better solution, if there is one...

check [this](http://stackoverflow.com/q/39528462/5284370) out too. — Soorena, Sep 18 '16 at 22:38
TestText starts with "Test" and TestText2 starts with "Test2" so they wouldn't compare equal :D — Philippe Carphin, Jun 03 '22 at 18:26

Mark Tolonen · Accepted Answer · 2020-01-25T22:25:47.803

68

Clarification:

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is not UTF-8...it is a Unicode string in Python 3.X.
TestText2 = TestText.encode('utf8') # this is a UTF-8-encoded byte string.

To send UTF-8 to stdout regardless of the console's encoding, use its buffer interface, which accepts bytes:

import sys
sys.stdout.buffer.write(TestText2)
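
The same approach works for ordinary `str` values if they are encoded first. The following is a minimal sketch, not part of the original answer: `write_utf8` is a hypothetical helper name, and it assumes the stream exposes a `.buffer` attribute the way the standard `sys.stdout` does (an IDE's replacement stdout may not). An in-memory stream stands in for a CP1257 console so the bytes can be inspected.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes, bypassing the stream's own encoding.

    write_utf8 is a hypothetical helper, not a standard library function.
    """
    if stream is None:
        stream = sys.stdout
    # Flush any pending text first so the output stays in order.
    stream.flush()
    stream.buffer.write((text + "\n").encode("utf-8"))
    stream.buffer.flush()


# Demonstration on an in-memory stream that mimics a CP1257 console.
fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp1257")
write_utf8("Test - āĀ", fake_console)
captured = fake_console.buffer.getvalue()
```

In a script, `write_utf8("Test - āĀ")` with no second argument would write to the real `sys.stdout.buffer`.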

edited Jan 25 '20 at 22:25

answered Aug 30 '10 at 18:31

Mark Tolonen


thanks :) by the way when I said: "Test - āĀēĒčČ..šŠūŪžŽ" # this is UTF-8 I mean that string is written in UTF-8 with IDE, py file is encoded UTF-8 and when python parses file it converts string to Python unicode... – davispuh Aug 31 '10 at 13:15

i get: Traceback (most recent call last): File "", line 1, in AttributeError: '_ReplOutput' object has no attribute 'buffer' – o17t H1H' S'k Nov 17 '12 at 13:41
Python 3? Were you using an IDE? _ReplOutput sounds like stdout was replaced with an (incorrect) file-like object. – Mark Tolonen Nov 17 '12 at 15:21
(ok, despite struggling I can't post multiline error msg here) Hmm... >>> sys.stdout.buffer().write(chr(255)) Traceback (most recent call last): File "", line 1, in TypeError: '_io.BufferedWriter' object is not callable >>> sys.stdout.buffer.write(chr(252)) Traceback (most recent call last): File "", line 1, in TypeError: 'str' does not support the buffer interface Python 3.2.2 – Van Jone Jan 16 '13 at 16:13
@VanJone, post a new question. – Mark Tolonen Jan 16 '13 at 16:21
@Mark, Probably I didn't make it clear enough in my last comment that the answer doesn't always work, like in my case, so posting this error message. I'm not asking any questions at all there. – Van Jone Jan 17 '13 at 14:32
@VanJone, you have errors in both statements. `buffer` is an attribute not a function so don't call it, and `chr()` returns a Unicode string and `buffer.write` takes byte strings. – Mark Tolonen Jan 17 '13 at 15:46
Oh my... I posted wrong error message... Of course I tried sys.stdout.buffer.write first but it failed too. The error was >>> import sys >>> sys.stdout.buffer.write(chr(252)) Traceback (most recent call last): File "", line 1, in TypeError: 'str' does not support the buffer interface – Van Jone Jan 18 '13 at 11:18
And yes, it works for byte strings, so your second reason is 100% true – Van Jone Jan 18 '13 at 11:19

score 16 · Answer 2 · answered Aug 30 '10 at 04:20


This is the best I can dope out from the manual, and it's a bit of a dirty hack:

utf8stdout = open(1, 'w', encoding='utf-8', closefd=False) # fd 1 is stdout
print(whatever, file=utf8stdout)

It seems like file objects should have a method to change their encoding, but AFAICT there isn't one.

If you write to utf8stdout and then write to sys.stdout without calling utf8stdout.flush() first, or vice versa, bad things may happen.
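
The flushing caveat can be illustrated with a rough sketch. It assumes a pipe in place of the real stdout so the result can be read back, and `write_alternating` is a hypothetical name used only for this illustration. Two text wrappers share one file descriptor, and each is flushed before the other one writes.

```python
import os


def write_alternating(fd):
    """Write through two text wrappers that share one file descriptor."""
    first = open(fd, "w", encoding="utf-8", closefd=False)
    second = open(fd, "w", encoding="utf-8", closefd=False)
    for stream, line in ((first, "one"), (second, "two"), (first, "three")):
        stream.write(line + "\n")
        # Each wrapper has its own buffer; flushing before the other
        # wrapper writes keeps the lines in the intended order.
        stream.flush()
    # closefd=False means closing the wrappers leaves fd itself open.
    first.close()
    second.close()


# A pipe stands in for stdout so the written data can be inspected.
read_fd, write_fd = os.pipe()
write_alternating(write_fd)
os.close(write_fd)
with open(read_fd, encoding="utf-8") as reader:
    lines = reader.read().split()
```

If the `flush()` call is dropped, each wrapper holds its text until it is closed, so the lines written through `first` would likely arrive together rather than interleaved with the one from `second`.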

answered Aug 30 '10 at 04:20

zwol



Had issue on windows, where `cp1257` was used for printing (and failed), while I wanted `utf-8`. Following snippet worked: `import sys; sys.stdout = open(1, 'w', encoding='utf-8', closefd=False); print("vadsэавфыаЭХÜÜÄ"); print(bytes("аЭХÜ", "utf-8"))` – iljau Oct 21 '15 at 18:40
@zwol and all: what is the rationale that the Python 3 `print` function was defined and designed not to handle Unicode? – Old Geezer Oct 18 '17 at 03:13
@OldGeezer That's not correct. It *was* defined and designed to handle Unicode. But the interpreter thinks, for some reason that we'll probably never know, that `sys.stdout` is feeding to a terminal emulator that *doesn't* handle Unicode, only CP1257, and therefore `print` (actually `sys.stdout.write`) must convert *from* Unicode *to* CP1257 before printing, and any character not in the CP1257 repertoire can't be printed at all (unless it is escaped first, which `print` won't do for you). – zwol Oct 18 '17 at 17:10

score 12 · Answer 3 · answered May 20 '21 at 19:27


As per this answer

You can manually reconfigure the encoding of stdout as of Python 3.7:

import sys
sys.stdout.reconfigure(encoding='utf-8')
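
Since `reconfigure()` only exists on Python 3.7+ text streams, and an IDE may replace `sys.stdout` with an object that lacks it, a guarded call may be safer. This is a minimal sketch under those assumptions; `force_utf8` is a hypothetical helper name, and an in-memory stream stands in for a CP1257 console.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure().

    force_utf8 is a hypothetical helper name. Returns True on success.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Python older than 3.7, or stdout replaced by another object.
        return False
    reconfigure(encoding="utf-8")
    return True


# Demonstration on an in-memory stream that mimics a CP1257 console.
fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp1257")
changed = force_utf8(fake_console)
fake_console.write("ü")
fake_console.flush()
encoded = fake_console.buffer.getvalue()
```

In a script the call would be `force_utf8(sys.stdout)` after `import sys`, with one of the other answers' approaches as a fallback when it returns `False`.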

answered May 20 '21 at 19:27

CervEd


Andreas Haferburg · Answer 4 · 2021-06-16T17:53:18.970

1

I tried zwol's solution in Python 3.6, but it didn't work for me. With some strings there was no output printed to the console.

But iljau's solution worked: Reopen stdout with a different encoding.

import sys
sys.stdout = open(1, 'w', encoding='utf-8', closefd=False)
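
If the replacement should only be temporary, the same idea can be wrapped in a context manager. This is a sketch rather than part of the answer: `utf8_stdout` is a hypothetical name, it relies on the standard `contextlib.redirect_stdout`, and the demonstration writes to a pipe instead of fd 1 so the bytes can be checked.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def utf8_stdout(fd=1):
    """Temporarily route print() through a UTF-8 wrapper around fd."""
    sys.stdout.flush()  # empty the old wrapper's buffer first
    stream = open(fd, "w", encoding="utf-8", closefd=False)
    try:
        with contextlib.redirect_stdout(stream):
            yield stream
    finally:
        # close() flushes; closefd=False leaves the descriptor open.
        stream.close()


# Demonstration on a pipe, so the written bytes can be inspected.
read_fd, write_fd = os.pipe()
with utf8_stdout(write_fd):
    print("ü")
os.close(write_fd)
data = os.read(read_fd, 100)
os.close(read_fd)
```

With the default `fd=1`, `with utf8_stdout(): print(...)` would send UTF-8 to the real stdout and restore the original `sys.stdout` afterwards.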

edited Jun 16 '21 at 17:53

answered Jun 16 '21 at 17:47

Andreas Haferburg


score 0 · Answer 5 · answered Feb 25 '22 at 23:35


You can set the encoding of stdout to UTF-8 with:

import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)

answered Feb 25 '22 at 23:35

jumorap

