1

I am trying to print the byte d0 to stdout in Python 3.7.2. I have the following code:

print("\xd0", end = "")

However, when I execute this, it outputs the bytes c390.

$ python -c 'print("\xd0", end = "")' | xxd
00000000: c3 90

Why is it not outputting the byte \xd0?

Aaron Esau

2 Answers

2

"\xd0" is a str object, which in Python 3 is a Unicode string (a sequence of Unicode code points). Here it contains the single code point U+00D0 (decimal 208, the character Ð). When you write it with print, Python has to convert it from text (str) to bytes (bytes), so it has to use an encoding, i.e. a mapping from abstract code points to bytes.

Here, as is common, that encoding is UTF-8, in which the code point U+00D0 is encoded as the two-byte sequence c3 90.
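
To make the conversion visible, here is a minimal sketch that encodes the string by hand. It assumes UTF-8, as in the question's output; the variable names are only illustrative.

```python
# One code point in, two bytes out: encoding "\xd0" explicitly as UTF-8.
text = "\xd0"                    # a str holding the single code point U+00D0
encoded = text.encode("utf-8")   # the bytes print() ends up writing under UTF-8

print(len(text))                 # number of code points: 1
print(f"U+{ord(text):04X}")      # U+00D0
print(encoded.hex())             # c390
print(len(encoded))              # number of bytes: 2
```

The encoding print actually uses depends on the environment; sys.stdout.encoding reports it.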

If you want to output the literal byte 0xd0, you have to use a bytes object and write directly to the binary stream underlying sys.stdout:

import sys
sys.stdout.buffer.write(b'\xd0')
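
If raw bytes need to be mixed with ordinary print output, something like the following sketch may help. write_raw is a hypothetical helper, not part of the standard library, and it assumes the stream exposes a .buffer attribute (as the default sys.stdout does). It flushes the text layer first so the bytes come out in order.

```python
import sys


def write_raw(data: bytes, stream=None) -> int:
    """Hypothetical helper: write bytes to the binary buffer under a text stream."""
    stream = sys.stdout if stream is None else stream
    stream.flush()                        # push out any pending text first
    written = stream.buffer.write(data)   # bypasses the str-to-bytes encoding step
    stream.buffer.flush()
    return written


if __name__ == "__main__":
    write_raw(b"\xd0")                    # emits the single byte d0
```

Piping the script through xxd should then show d0 rather than c3 90.
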
Matteo Italia
  • Oh man, this makes so much sense. This issue has cost me several hours of confusion on other occasions when I didn't narrow it down to Python. Thank you very much. Can you think of a more general title for this question, since the issue doesn't apply only to the `d0` byte? – Aaron Esau Mar 30 '19 at 00:38
  • @Arin: glad it helped! :-) As for the question title, I wouldn't know, maybe "single character (`\xd0`) gets printed as sequence of two characters (`\xc3\x90`)" or something like this? – Matteo Italia Mar 30 '19 at 01:08
1

It’s printing the character U+00D0 (“Д) as UTF-8. If you want to output a byte string to stdout, use sys.stdout.buffer:

import sys
sys.stdout.buffer.write(b"\xd0")

"\xd0" is a string of Unicode codepoints, but b"\xd0" is a string of bytes.
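
A short sketch of that distinction, with illustrative variable names. The Latin-1 comparison is an addition here: it relies on Latin-1 mapping code points 0 to 255 onto the same byte values, which is why it reproduces the single byte while UTF-8 does not.

```python
# The same escape means different things in a str literal and a bytes literal.
text = "\xd0"    # str: one Unicode code point
raw = b"\xd0"    # bytes: one byte

print(type(text).__name__, type(raw).__name__)  # str bytes
print(ord(text), raw[0])                        # 208 208

# Whether encoding the str yields that byte depends on the encoding chosen.
print(text.encode("latin-1") == raw)            # True
print(text.encode("utf-8") == raw)              # False
```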

Ry-