35

Is there any way to write binary output to sys.stdout in Python 2.x? In Python 3.x, you can just use sys.stdout.buffer (or detach stdout, etc...), but I haven't been able to find any solutions for Python 2.5/2.6.

EDIT: I'm trying to push a PDF file (in binary form) to stdout so that a web server can serve it. When I write the file using sys.stdout.write, carriage returns get added to the binary stream, which corrupts the PDF.

EDIT 2: For this project, I need to run on a Windows Server, unfortunately, so Linux solutions are out.

Simple dummy example (reading from a file on disk instead of generating on the fly, so we know the generation code isn't the issue):

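import sys

pdf_file = open('C:\\test.pdf', 'rb')  # 'rb' reads the file without newline translation
pdf_data = pdf_file.read()
pdf_file.close()
sys.stdout.write(pdf_data)  # on Windows, stdout is still in text mode here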
vvvvv
Eavesdown
  • When you did `sys.stdout.write()` what didn't work? – S.Lott Mar 03 '10 at 19:48
  • See above for explanation, but the issue is basically that python adds carriage returns when it tries to convert the binary stream to a string for writing. – Eavesdown Mar 03 '10 at 19:54
  • Does `sys.stdout = os.fdopen(1, "wb")` work for you to eliminate text-mode conversions? (You'll still need to use sys.stdout.write if you don't want the NLs from print statements.) (http://docs.python.org/library/os.html#os.fdopen) –  Mar 03 '10 at 20:15
  • Thanks for the great question. I learned something new today. – Jason R. Coombs Mar 03 '10 at 20:31
  • @Roger, surprisingly `os.fdopen` doesn't solve it, although running python with the `-u` works. `-u` does bring extra overhead though – John La Rooy Mar 03 '10 at 20:54
  • Maybe you want to check out the [link](http://code.activestate.com/recipes/65443-sending-binary-data-to-stdout-under-windows/) again, I added another answer. A wrapper for the stdout using `os.write()` and `os.read()` seems to be working fine in my test cases. – Niklas R Nov 26 '14 at 13:37
  • Good question; I had the same issue when I wanted to serve a PNG file from a Python script under Windows Apache. – Dan H Nov 26 '14 at 19:01

5 Answers

29

Which platform are you on?

You could try this recipe if you're on Windows (the link suggests it's Windows specific anyway).

import sys

if sys.platform == "win32":
    import os, msvcrt
    # Switch the stdout file descriptor from text mode to binary mode
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

Some references on the web suggest that Python 3.1 would or should get a function to reopen sys.stdout in binary mode, but I don't really know if there's a better alternative than the above for Python 2.x.
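A minimal sketch of how the recipe could be wrapped for reuse. The helper names `set_binary_mode` and `copy_stream` are hypothetical (they are not part of the linked recipe). The first switches a stream to binary mode on Windows only, and the second copies data in chunks so that a large PDF does not have to be held in memory at once.

```python
import io
import os
import sys


def set_binary_mode(stream):
    """Put a file object's descriptor in binary mode on Windows.

    Hypothetical helper: elsewhere it changes nothing, since POSIX
    systems do not translate newlines.
    """
    if sys.platform == "win32":
        import msvcrt
        msvcrt.setmode(stream.fileno(), os.O_BINARY)
    # Python 3 exposes the byte stream as .buffer; Python 2 has no such
    # attribute, so the stream itself is returned.
    return getattr(stream, "buffer", stream)


def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy bytes from src to dst in chunks and return the byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory demonstration. A real script would pass a file opened with
# mode 'rb' as src and set_binary_mode(sys.stdout) as dst.
source = io.BytesIO(b"%PDF-1.4\n\x00\x01\r\n")
sink = io.BytesIO()
copied = copy_stream(source, sink)
```

Copying in chunks is a design choice for large files; for small payloads a single `write` after the mode switch behaves the same.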

Iain Samuel McLean Elder
ChristopheD
  • I did a test just reading the PDF in from a file and writing it straight back out, the carriage returns are still added. – Eavesdown Mar 03 '10 at 20:07
  • The windows solution link you give is the perfect solution. I can't thank you enough; this was driving me absolutely up the wall. – Eavesdown Mar 03 '10 at 20:30
  • Great! The same works for [`stdin`](http://stackoverflow.com/a/28673339/321973) as well, and both are required to make, e.g., a functional `cat` clone that can handle binary files – Tobias Kienzler Feb 23 '15 at 12:03
9

You can use unbuffered mode: python -u script.py.

-u     Force  stdin,  stdout  and stderr to be totally unbuffered.
       On systems where it matters, also put stdin, stdout and stderr
       in binary mode.
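
One way to check the effect of `-u` is to launch a child interpreter from a parent process, as sketched below. The child program is a hypothetical one-liner passed with `-c`. The flag should only make a difference under Python 2 on Windows; on other systems the check passes with or without it.

```python
import subprocess
import sys

# Hypothetical child program: writes every byte value 0..255 once.
# getattr() picks sys.stdout.buffer on Python 3 and sys.stdout on Python 2.
CHILD = (
    "import sys\n"
    "out = getattr(sys.stdout, 'buffer', sys.stdout)\n"
    "out.write(bytes(bytearray(range(256))))\n"
)

# Run the child with -u and collect everything it writes to stdout.
proc = subprocess.Popen([sys.executable, "-u", "-c", CHILD],
                        stdout=subprocess.PIPE)
received = proc.communicate()[0]

# A length of 256 means no carriage return was inserted before the \n byte.
print(len(received))
```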
Tim Delaney
8

You can use argopen.argopen(). It treats a dash as stdin/stdout and fixes binary mode on Windows.

import argopen
stdout = argopen.argopen('-', 'wb')
stdout.write(some_binary_data)
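
For readers without `argopen`, the behaviour described above can be approximated by hand. The sketch below is not `argopen`'s implementation; `open_dash` is a hypothetical helper that treats `-` as standard input or output and applies the Windows binary-mode fix.

```python
import os
import sys


def open_dash(path, mode="rb"):
    """Open path in binary mode, treating '-' as stdin or stdout.

    Hypothetical stand-in for illustration only; it is not taken
    from argopen and supports just the modes 'rb' and 'wb'.
    """
    if mode not in ("rb", "wb"):
        raise ValueError("mode must be 'rb' or 'wb'")
    if path != "-":
        return open(path, mode)
    stream = sys.stdin if mode == "rb" else sys.stdout
    if sys.platform == "win32":
        import msvcrt
        msvcrt.setmode(stream.fileno(), os.O_BINARY)
    # Python 3 keeps the byte-oriented stream on .buffer.
    return getattr(stream, "buffer", stream)
```

With this helper, `open_dash('-', 'wb').write(some_binary_data)` would play the role of the `argopen` call above.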
Iain Samuel McLean Elder
inv
7

In Python 2.x, all strings are binary character arrays by default, so I believe you should be able to just

>>> sys.stdout.write(data)

EDIT: I've confirmed your experience.

I created one file, gen_bytes.py

import sys
for char in range(256):
    sys.stdout.write(chr(char))

And another read_bytes.py

import subprocess
import sys

proc = subprocess.Popen([sys.executable, 'gen_bytes.py'], stdout=subprocess.PIPE)
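# communicate() reads all output and then waits for the child to exit;
# calling wait() before reading the pipe can deadlock on large outputs.
bytes = proc.communicate()[0]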
if not len(bytes) == 256:
    print 'Received incorrect number of bytes: {0}'.format(len(bytes))
    raise SystemExit(1)
if not map(ord, bytes) == range(256):
    print 'Received incorrect bytes: {0}'.format(map(ord, bytes))
    raise SystemExit(2)
print "Everything checks out"

Put them in the same directory and run read_bytes.py. Sure enough, it appears as if Python is in fact converting newlines on output. I suspect this only happens on a Windows OS.

> .\read_bytes.py
Received incorrect number of bytes: 257
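
The count of 257 is consistent with text-mode newline translation: of the 256 byte values, exactly one is `\n` (byte 10), and in text mode it is written as `\r\n`. The snippet below only simulates that translation with `replace`, so it runs on any platform; it is an illustration, not what Python executes internally.

```python
# Every byte value once, as in gen_bytes.py.
data = bytes(bytearray(range(256)))

# Simulate text-mode output: each \n becomes \r\n.
translated = data.replace(b"\n", b"\r\n")

print("%d -> %d" % (len(data), len(translated)))  # 256 -> 257
```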

Following the lead by ChristopheD, and changing gen_bytes to the following corrects the issue.

import sys

if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

for char in range(256):
    sys.stdout.write(chr(char))

I include this for completeness. ChristopheD deserves the credit.

Jason R. Coombs
  • This works if you're only trying to add string data, but python tries to stringify binary data when just calling write, corrupting the data. – Eavesdown Mar 03 '10 at 19:54
  • I ran your `gen_bytes.py` and `read_bytes.py` on Mac OS X (Python 2.5, with minor modifications for the missing `format` method) and it printed "Everything checks out" – Doug Harris Mar 03 '10 at 20:16
  • It looks like it's a Windows-only issue. – Eavesdown Mar 03 '10 at 20:20
  • On windows, I found that just running `gen_bytes.py > bytes.bin` I could see that the file was 257 bytes simply by doing a `dir` – John La Rooy Mar 03 '10 at 21:11
  • Unless you're using powershell, in which case `gen_bytes.py > bytes.bin` generates a unicode-encoded file of 522 bytes. – Jason R. Coombs Mar 04 '10 at 13:16
  • If I reverse the two processes, such that the parent writes and child reads, then I have to set sys.stdin to be binary, *on the child*. Perhaps the PIPEs that subprocess sets up are always binary, but stdin/stdout are not? – Devin Lane May 15 '16 at 19:47
0

I solved this using a wrapper for a file-descriptor. (Tested in Python 3.2.5 on Cygwin)

import io
import os


class BinaryFile(object):
    ''' Wraps a file-descriptor to binary read/write. The wrapped
    file can not be closed by an instance of this class, it must
    happen through the original file.

    :param fd: A file-descriptor (integer) or file-object that
        supports the ``fileno()`` method. '''

    def __init__(self, fd):
        super(BinaryFile, self).__init__()
        fp = None
        if not isinstance(fd, int):
            fp = fd
            fd = fp.fileno()
        self.fd = fd
        self.fp = fp

    def fileno(self):
        return self.fd

    def tell(self):
        if self.fp and hasattr(self.fp, 'tell'):
            return self.fp.tell()
        else:
            raise io.UnsupportedOperation(
                'can not tell position from file-descriptor')

    def seek(self, pos, how=os.SEEK_SET):
        try:
            return os.lseek(self.fd, pos, how)
        except OSError as exc:
            raise io.UnsupportedOperation('file-descriptor is not seekable')

    def write(self, data):
        if not isinstance(data, bytes):
            raise TypeError('must be bytes, got %s' % type(data).__name__)
        return os.write(self.fd, data)

    def read(self, length=None):
        if length is not None:
            return os.read(self.fd, length)
        else:
            result = b''
            while True:
                data = self.read(1024)
                if not data:
                    break
                result += data
            return result
Niklas R
  • The code in this answer doesn't solve the problem in Python 2.7: the `\r` bytes still appear on standard output on Windows. By adding `msvcrt.setmode(self.fd, os.O_BINARY)` (as indicated in other answers), the `\r` bytes disappear. – pts Nov 20 '19 at 10:33
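
Combining the wrapper's `os.write()` approach with the `msvcrt.setmode()` call mentioned in the comment might look like the sketch below. `write_all` is a hypothetical helper, not part of the answer; it also loops, because `os.write()` may write fewer bytes than requested.

```python
import os
import sys


def write_all(fd, data):
    """Write all of data to file descriptor fd without newline translation.

    Hypothetical helper: sets binary mode on Windows, then repeats
    os.write() until every byte has been written.
    """
    if sys.platform == "win32":
        import msvcrt
        msvcrt.setmode(fd, os.O_BINARY)
    written = 0
    while written < len(data):
        # os.write() returns the number of bytes it actually wrote.
        written += os.write(fd, data[written:])
    return written


# Demonstration on a pipe rather than the real stdout (descriptor 1).
read_fd, write_fd = os.pipe()
count = write_all(write_fd, b"line1\nline2\n")
os.close(write_fd)
echoed = os.read(read_fd, 1024)
os.close(read_fd)
```

For the original use case the call would be `write_all(sys.stdout.fileno(), pdf_data)`, assuming nothing is still buffered in `sys.stdout` at that point.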