
There is an existing question, "How to write binary data to stdout in python 3?", but all of the answers suggest sys.stdout.buffer or variants thereof (e.g., manually rewrapping the file descriptor). These have a problem: they don't respect the buffering of sys.stdout, so text and binary writes come out in the wrong order:

MacBook-Pro-116:~ ezyang$ cat test.py
import sys
sys.stdout.write("A")
sys.stdout.buffer.write(b"B")
MacBook-Pro-116:~ ezyang$ python3 test.py | cat
BA

Is there a way to write binary data to stdout that respects buffering with respect to sys.stdout and plain print calls? (The actual use case: I have "text-like" data of unknown encoding and want to pass it straight to stdout without committing to a particular encoding.)

Edward Z. Yang
  • If your data is in `bytes` you can just use `sys.stdout.buffer.write`. But if you want to use the `print` function, Python 3 will assume your input is `str` and convert it into `bytes` before writing it to stdout -- and that requires your specifying an encoding so it can do the conversion. I'm not sure I'm understanding your question -- are you trying to use both `print` and `sys.stdout.buffer.write` to write the same type of data (`bytes`)? or are you trying to mix and match? – kchan Apr 15 '19 at 05:38
  • Look at the code example above: although I write A and then write B, when I run the script they come out of order. I want a version of `sys.stdout.buffer.write` which gets ordered correctly with respect to `sys.stdout.write`. – Edward Z. Yang Apr 15 '19 at 14:19
  • See the answer below. Is that what you're looking for? – kchan Apr 15 '19 at 16:18

2 Answers


Can't you interleave the calls to write with calls to flush?

sys.stdout.write("A")

sys.stdout.buffer.write(b"B")

Results in:

BA


sys.stdout.write("A")
sys.stdout.flush()

sys.stdout.buffer.write(b"B")
sys.stdout.flush()

Results in:

AB
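
If a flush after every write is too costly, another option is to stop the text layer from holding data back. The sketch below is only an illustration: it assumes Python 3.7 or later, where `io.TextIOWrapper.reconfigure(write_through=True)` is available, and `enable_write_through` is a hypothetical helper name. An in-memory stream stands in for `sys.stdout` so the ordering can be checked.

```python
import io
import sys

def enable_write_through(stream=None):
    """Make text writes go straight to stream.buffer (Python 3.7+)."""
    stream = sys.stdout if stream is None else stream
    stream.reconfigure(write_through=True)
    return stream

# Demonstration on an in-memory stand-in for sys.stdout.
raw = io.BytesIO()
demo = io.TextIOWrapper(io.BufferedWriter(raw), encoding="utf-8")
enable_write_through(demo)

demo.write("A")           # handed to demo.buffer right away
demo.buffer.write(b"B")   # queued after "A" in the same binary buffer
demo.write("C")
demo.flush()              # one flush at the end pushes everything out
ordered = raw.getvalue()
```

With the real stdout, the equivalent call would be `enable_write_through()` once at startup, after which `sys.stdout.write` and `sys.stdout.buffer.write` share the binary layer's buffer.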

sleblanc

You can define a function called _print (or even shadow the built-in print function by naming it print) as follows:

import sys

def _print(data):
    """
    If data is bytes, write to stdout using sys.stdout.buffer.write,
    otherwise, assume it's str and convert to bytes with utf-8
    encoding before writing.
    """
    if not isinstance(data, bytes):
        data = data.encode('utf-8')
    sys.stdout.buffer.write(data)

_print('A')
_print(b'B')

The output should be AB.

Note: the built-in print function normally appends a newline to the output. The _print above writes only the data (either bytes, or str encoded as UTF-8) with no trailing newline.
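
If the trailing newline is wanted, a print-like variant could take an `end` argument. This is a minimal sketch rather than part of the answer above: `to_bytes`, `print_bytes`, and the `out` parameter are hypothetical names, and an in-memory sink replaces stdout so the result can be inspected.

```python
import io
import sys

def to_bytes(data, end=b"", encoding="utf-8"):
    """Return data followed by end as bytes; str is encoded, bytes pass through."""
    if isinstance(data, str):
        data = data.encode(encoding)
    if isinstance(end, str):
        end = end.encode(encoding)
    return data + end

def print_bytes(data, end="\n", encoding="utf-8", out=None):
    """Print-like wrapper that writes data and end to a binary stream."""
    out = sys.stdout.buffer if out is None else out
    out.write(to_bytes(data, end, encoding))

# Usage with an in-memory sink instead of sys.stdout.buffer.
sink = io.BytesIO()
print_bytes("A", out=sink)           # str, newline appended
print_bytes(b"B", end="", out=sink)  # bytes, no newline
```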

Buffered implementation

If you want buffered I/O, you can manage the buffer yourself with the tools in the io module.

Simple example:

import io
import sys

output_buffer = None
text_wrapper = None

def init_buffer():
    global output_buffer, text_wrapper
    if output_buffer is None:
        output_buffer = io.BytesIO()
        text_wrapper = io.TextIOWrapper(
            output_buffer,
            encoding='utf-8',
            write_through=True)

def write(data):
    if isinstance(data, bytes):
        output_buffer.write(data)
    else:
        text_wrapper.write(data)

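def flush():
    # Copy everything collected so far to stdout, then empty the
    # in-memory buffer so a second flush() does not repeat the output.
    sys.stdout.buffer.write(output_buffer.getvalue())
    sys.stdout.buffer.flush()
    output_buffer.seek(0)
    output_buffer.truncate()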

# initialize buffer, write some data, and then flush to stdout
init_buffer()
write("A")
write(b"B")
write("foo")
write(b"bar")
flush()

If you perform all of the output writes in one place, for example inside a single function, you can use contextlib.contextmanager to create a context manager that lets you use the with statement:

# This uses the vars and functions in the example above.

import contextlib

@contextlib.contextmanager
def buffered_stdout():
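    """
    Context manager that collects writes in the output buffer and
    copies them to stdout when the `with` block exits.
    """
    init_buffer()
    fh = sys.stdout.buffer
    try:
        yield fh
    finally:
        # Emit everything collected inside the block, then reset the
        # in-memory buffer so the next block starts empty.
        fh.write(output_buffer.getvalue())
        fh.flush()
        output_buffer.seek(0)
        output_buffer.truncate()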


# open the buffered output stream and write some data to it
with buffered_stdout():
    write("A")
    write(b"B")
    write("foo")
    write(b"bar")
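
The same idea can be packaged without module-level globals. The following is a sketch under stated assumptions, not the code from the answer: `MixedBuffer` and its `target` parameter are hypothetical names, str input is assumed to be UTF-8 encodable, and an in-memory sink replaces `sys.stdout.buffer` in the usage lines.

```python
import io
import sys

class MixedBuffer:
    """Collect str and bytes in order; emit them when the with block ends."""

    def __init__(self, target=None, encoding="utf-8"):
        # target is any binary stream; None means sys.stdout.buffer.
        self._target = target
        self._encoding = encoding
        self._chunks = io.BytesIO()

    def write(self, data):
        if isinstance(data, str):
            data = data.encode(self._encoding)
        self._chunks.write(data)

    def flush(self):
        target = sys.stdout.buffer if self._target is None else self._target
        target.write(self._chunks.getvalue())
        target.flush()
        # Reset so the same data is never emitted twice.
        self._chunks.seek(0)
        self._chunks.truncate()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.flush()
        return False  # do not swallow exceptions

# Usage with an in-memory sink standing in for sys.stdout.buffer.
sink = io.BytesIO()
with MixedBuffer(sink) as out:
    out.write("A")
    out.write(b"B")
    out.write("foo")
    out.write(b"bar")
result = sink.getvalue()
```

Keeping the state on an instance means two buffers can coexist, and nothing needs to be initialized before the first write.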


kchan
  • This is not a good answer. One fundamental problem with interacting with `sys.stdout.buffer` directly is that it disables buffering entirely; this is bad for IO performance. – Edward Z. Yang Apr 15 '19 at 19:25
  • If you edit your question to include more specific details and context on what you're looking for and the kind and scope of the data you're working with, you might get better answers. I've added a couple of "buffered" implementations to the example. See if that helps. – kchan Apr 15 '19 at 22:11