Why does printing a png file in Python3 result in extra characters in IDAT chunk?

Question

So I've been testing out reading uncompressed png files with the following code in Python3:

with open(r'img1.png', 'rb') as f:
    pixel = f.read()
print(pixel)

However, the output contains some strange additional characters besides the hex pairs I would expect in the IDAT chunk:

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x02\x00\x00\x00\x02\x08\x02\x00\x00\x00\xfd\xd4\x9as\x00\x00\x00\x19IDAT\x08\x1d\x01\x0e\x00\xf1\xff\x00\x00\x00\x00\x00\xff\xff\x01\x00\xff\xff\x13\x90\x90\x1b\xe4\x0510O\xffC\x00\x00\x00\x00IEND\xaeB`\x82'
[Finished in 0.1s]

Any idea what this is? I was under the assumption that everything in IDAT when the data was uncompressed was pixel data in hex pairs. I've searched both StackOverflow/Online as well as looked through the documentation for PNG without any luck.

Here is a link to the image I am using to test (it's only 4 pixels): img1.png

FYI I'm running tests via ArchLinux if that helps.

The image might be malformed. Have you seen similar patterns in other image files? — Simeon Visser, Dec 05 '14 at 19:05
Just double-checking - you're aware that `\x0510O` is the sequence of bytes `0x05 0x31 0x30 0x4f` rather than some weird multibyte sequence? — senshin, Dec 05 '14 at 19:06
Don't forget that every scanline in a PNG also starts with an extra byte to identify the pre-filter used for that line. Just one extra byte per scanline, regardless of how many bytes per pixel. — Lee Daniel Crocker, Dec 05 '14 at 19:24
Yep, @senshin that's exactly the piece of information I was missing. Thanks! — Chris Gill, Dec 05 '14 at 20:24

score 1 · Accepted Answer · edited May 23 '17 at 11:50

I suspect senshin's comment is spot on.

When printing binary data, Python prints the printable ASCII characters as such, and every other byte as \xHH, where each H is a hexadecimal digit.

Hence the subsequence \xe4\x0510O\xffC\x00 is just the sequence of the following eight bytes:

\xe4
\x05
1     ----> ASCII character '1', equivalent to \x31
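0     ----> ASCII character '0', equivalent to \x30
O     ----> ASCII character 'O', equivalent to \x4f
\xff
C     ----> ASCII character 'C', equivalent to \x43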
\x00
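
A quick way to convince yourself of this breakdown is to paste the subsequence back into Python as a bytes literal and inspect it. This is only a small sketch; the variable name `data` is my own and not something from the question.

```python
# The subsequence exactly as Python printed it.
# "\x05" consumes only two hex digits, so "10O" are three separate ASCII bytes.
data = b'\xe4\x0510O\xffC\x00'

print(len(data))    # how many bytes the literal really contains
print(list(data))   # each byte as an integer (0-255)

# The same bytes written out one by one, all in hexadecimal.
explicit = bytes([0xe4, 0x05, 0x31, 0x30, 0x4f, 0xff, 0x43, 0x00])
print(data == explicit)
```

Indexing or iterating a bytes object in Python 3 yields integers, so no text decoding is involved at any point; the mixed output is purely a display choice.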

Of course, in this case we'd prefer all the bytes to be printed as hexadecimal, because they are meant to be read as binary data, not as text characters. But Python cannot guess that.

If you want to print all the bytes as hexadecimal, you can see some recipes here. Be aware, however, that in some cases the ASCII output is what you prefer (the 'PNG' sequence at the start, the chunk identifiers...), so there is no silver bullet.
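
As one possible compromise, here is a rough sketch that walks the PNG chunk layout (a 4-byte big-endian length, a 4-byte type, the data, then a 4-byte CRC) and prints the chunk type as ASCII but the data and CRC as hex. The helper names `to_hex` and `iter_chunks` are made up for this illustration, and the bytes are the ones from the question, assuming the output contains no stray spaces.

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def to_hex(data):
    """Render every byte as two hex digits, separated by spaces."""
    return ' '.join(f'{b:02x}' for b in data)

def iter_chunks(png_bytes):
    """Yield (type, data, crc) for each chunk; hypothetical helper, no CRC check."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError('not a PNG file')
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        # 4-byte big-endian length followed by the 4-byte chunk type
        length, ctype = struct.unpack('>I4s', png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        crc = png_bytes[pos + 8 + length:pos + 12 + length]
        yield ctype.decode('ascii'), data, crc
        pos += 12 + length

# The file contents as printed in the question.
png = (b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x02\x00\x00\x00\x02'
       b'\x08\x02\x00\x00\x00\xfd\xd4\x9as\x00\x00\x00\x19IDAT\x08\x1d\x01\x0e'
       b'\x00\xf1\xff\x00\x00\x00\x00\x00\xff\xff\x01\x00\xff\xff\x13\x90\x90'
       b'\x1b\xe4\x0510O\xffC\x00\x00\x00\x00IEND\xaeB`\x82')

for ctype, data, crc in iter_chunks(png):
    # Chunk type stays readable; everything else is shown as hex.
    print(ctype, len(data), to_hex(data), '| crc', to_hex(crc))
```

Seen this way, the puzzling `10O\xffC` run splits across a chunk boundary: the `1` is the last byte of the IDAT data, and `0O\xffC` is that chunk's CRC.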

Thank you @leonbloy for the explanation and links; that helped quite a bit. Now I can figure out how to work with this from there. — Chris Gill, Dec 05 '14 at 20:25
