file.write() and sys.stdout.write() are giving me two different outputs - Python

Question

The code below takes a JPEG image and converts it to a string, which is stored in the image variable. The string is then written to a.jpg using file I/O, and also written to b.jpg by redirecting stdout to that file.

import thumb
import sys

x = thumb.Thumbnail('test.jpg')
x.generate(56, 56)

image = str(x)

with open('a.jpg', 'wb') as f:
    # saving to a.jpg
    f.write(image)

# saving to b.jpg
sys.stdout.write(image)

Usage:

python blah.py > b.jpg

This results in two image files (a.jpg and b.jpg). These images should be identical... But they aren't.

Looking at each image in Notepad, I can see that line breaks are somehow being added to b.jpg, resulting in a corrupted image.

Why is a.jpg different to b.jpg?

sys.stdout.mode is 'w', I think. See, e.g., http://stackoverflow.com/questions/2374427/python-2-x-write-binary-output-to-stdout — DSM, Jan 25 '11 at 04:44
Your shell is probably interpreting your output when you redirect it through standard out. Are you on Linux? Using bash? — Falmarri, Jan 25 '11 at 04:45
The Unix tendency to use stdout to communicate between programs is a BAD idea for binary data. Please do not do it. Running your software without redirecting to a file will mess up the terminal, etc. Do **not** do it. Please! — Lennart Regebro, Jan 25 '11 at 06:30
@Lennart: It's actually writing to my web browser. It's a CGI script. I used stdout as an example as it had the same problem and was simpler to describe :} — dave, Jan 25 '11 at 07:34
Ah, I see. Yes, CGI is indeed an example of this very bad pattern in Unix. — Lennart Regebro, Jan 25 '11 at 08:16

miku · Answer 1 · 2011-01-25T05:13:57.720

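You write your data to a.jpg in binary mode, while b.jpg gets written in text mode. In binary mode, otherwise-special characters (such as newlines or the EOF marker) are written unchanged, while in text mode they are translated. On Windows, for example, text mode writes each newline as a carriage return followed by a line feed, which is where the extra line-break bytes in b.jpg come from.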
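To make the difference concrete, here is a minimal Python 3 sketch that writes the same bytes once in binary mode and once in text mode. The payload, the file names and the write_both_ways helper are made up for illustration. Passing newline="\r\n" is only there to force Windows-style translation on any platform, and latin-1 is used because it maps each byte to exactly one character, so the newline handling is the only thing that differs.

```python
import os
import tempfile

# Hypothetical payload: a few bytes that include 0x0A (newline),
# as real JPEG data often does.
payload = b"\xff\xd8\x0a\x00\x0a\xff\xd9"


def write_both_ways(data, directory):
    """Write data in binary mode and in text mode; return what landed on disk."""
    binary_path = os.path.join(directory, "a.bin")
    text_path = os.path.join(directory, "b.bin")

    # Binary mode: bytes go to disk untouched.
    with open(binary_path, "wb") as f:
        f.write(data)

    # Text mode: every "\n" is rewritten as "\r\n" on the way out.
    with open(text_path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(data.decode("latin-1"))

    with open(binary_path, "rb") as f:
        binary_result = f.read()
    with open(text_path, "rb") as f:
        text_result = f.read()
    return binary_result, text_result


with tempfile.TemporaryDirectory() as tmp:
    binary_result, text_result = write_both_ways(payload, tmp)

print(binary_result == payload)         # binary copy is identical
print(len(text_result) - len(payload))  # text copy grew by one byte per newline
```

The text-mode copy ends up two bytes longer because the payload contains two newline bytes, which is the same kind of corruption seen in b.jpg.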

In Python 3 you can switch modes:

http://docs.python.org/py3k/library/sys.html#sys.stdin

The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').
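As a rough sketch of what that looks like in practice (Python 3 only), the helper below writes bytes through a text stream's .buffer attribute. The name write_bytes_to is my own, and the in-memory fake_stdout is just a stand-in so the example does not dump binary data on a real terminal. It is configured with newline="\r\n" to show that the translation is bypassed.

```python
import io


def write_bytes_to(stream, data):
    """Write raw bytes to a text stream via its underlying binary buffer."""
    stream.flush()             # flush pending text first so output order is kept
    stream.buffer.write(data)  # bypasses encoding and newline translation
    stream.buffer.flush()


# In-memory stand-in for stdout, set up to translate "\n" to "\r\n" in text mode.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")

write_bytes_to(fake_stdout, b"\xff\xd8\n\xff\xd9")
print(raw.getvalue() == b"\xff\xd8\n\xff\xd9")  # newline byte was not translated
```

In a real script the call would be write_bytes_to(sys.stdout, image_bytes), assuming image_bytes is a bytes object and sys.stdout has not been replaced by an object without a .buffer attribute.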

Untested (Python 2):

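import sys, os

# Reopen stdout's file descriptor in binary mode so bytes pass through untranslated
binout = os.fdopen(sys.stdout.fileno(), 'wb')
binout.write(b'Binary#Data...')
# Flush explicitly so the data is not left sitting in binout's buffer
binout.flush()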
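Another approach that is often suggested for Windows is to switch the existing stdout file descriptor to binary mode with msvcrt.setmode. This is a sketch rather than something tested against the asker's setup. The set_stdout_binary wrapper is a name I made up, and it is a no-op on other platforms, where the C runtime's text/binary distinction does not apply.

```python
import os
import sys


def set_stdout_binary():
    """Switch stdout's file descriptor to binary mode on Windows.

    Returns True if the mode was changed, False on platforms where
    nothing needs to be done.
    """
    if sys.platform != "win32":
        return False
    import msvcrt  # Windows-only module, so import it lazily
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    return True
```

Call set_stdout_binary() once, before the first write to stdout.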

Under PEP 8, imports go on separate lines. — Jakob Bowyer, Jan 25 '11 at 10:00
