
How can I get a bare \n (LF without CR) printed to stdout on Windows? This code works in Python 2, but not in Python 3:

# set sys.stdout to binary mode on Windows
import sys, os, msvcrt
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

# the length of testfile created with
#     python test_py3k_lf_print.py > testfile
# below should be exactly 4 symbols (23 0A 23 0A)
print("#\n#")
anatoly techtonik

1 Answer


Python 3 already sets the standard I/O file descriptors to binary mode, but its own I/O implementation does newline translation in the text layer. Instead of using print, which requires a text-mode file, you could call sys.stdout.buffer.write directly to use the underlying binary-mode BufferedWriter. If you need to use print, then you'll need a new text I/O wrapper that doesn't use universal newlines, so that '\n' is written untranslated. For example:

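import sys

# Reopen the stdout file descriptor as a line-buffered text file
# that writes '\n' as-is instead of translating it to os.linesep.
stdout = open(sys.__stdout__.fileno(),
              mode=sys.__stdout__.mode,
              buffering=1,
              encoding=sys.__stdout__.encoding,
              errors=sys.__stdout__.errors,
              newline='\n',
              closefd=False)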

Since closefd is false, closing this file won't close the original sys.stdout file descriptor. You can use this file explicitly via print("#\n#", file=stdout), or replace sys.stdout with it by assigning sys.stdout = stdout. The original remains available as sys.__stdout__.
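
To see the effect of the newline argument without depending on the platform, here is a minimal sketch that writes through a TextIOWrapper into an in-memory buffer. The helper name encode_with_newline is made up for this illustration, and newline='\r\n' is used only to reproduce what the default newline=None does on Windows, where os.linesep is '\r\n'.

```python
import io


def encode_with_newline(text, newline):
    """Write text through a TextIOWrapper and return the bytes that
    reach the binary layer underneath it (hypothetical helper)."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # leave raw open when the wrapper is discarded
    return data


# newline='\r\n' mimics the translation done by default on Windows.
translated = encode_with_newline("#\n#\n", "\r\n")
# newline='\n' disables translation, as in the wrapper shown above.
untranslated = encode_with_newline("#\n#\n", "\n")

print(translated.hex(" "))
print(untranslated.hex(" "))
```

If print isn't required, writing bytes to the binary layer directly, e.g. sys.stdout.buffer.write(b"#\n#\n"), skips the text layer and its translation entirely.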

Background

Python 3's io module was designed to provide a cross-platform and cross-implementation (CPython, PyPy, IronPython, Jython) specification for all file-like objects in terms of the abstract base classes RawIOBase, BufferedIOBase, and TextIOBase. It includes a reference pure Python implementation in the _pyio module. The common denominator for the raw io.FileIO implementation is the set of low-level POSIX system calls such as read and write, which eliminates the problem of CRT stdio inconsistencies. On Windows, the POSIX layer is just the low I/O layer of the CRT, but at least that's limited to the quirks of a single platform.
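
As a rough illustration of these layers, the sketch below opens a temporary text file and inspects the object stack that open() builds. The class names shown are what CPython produces for a file opened with mode "w"; other implementations may use different concrete classes, and the file name layers.txt is arbitrary.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "layers.txt")
    with open(path, "w", encoding="ascii", newline="\n") as text_file:
        buffered = text_file.buffer   # binary, buffered layer
        raw = buffered.raw            # unbuffered layer over the file descriptor
        layer_names = [type(text_file).__name__,
                       type(buffered).__name__,
                       type(raw).__name__]
        # Each layer is an instance of the matching abstract base class.
        abc_checks = [isinstance(text_file, io.TextIOBase),
                      isinstance(buffered, io.BufferedIOBase),
                      isinstance(raw, io.RawIOBase)]
        text_file.write("#\n#\n")
    # Read the file back in binary mode to see the bytes actually written.
    with open(path, "rb") as f:
        on_disk = f.read()

print(layer_names)
print(on_disk)
```

sys.stdout is normally built the same way, which is why sys.stdout.buffer gives access to the binary layer.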

One of the Windows quirks is having non-standard text and binary modes in its POSIX I/O layer. Python addresses this by always using binary mode and calling setmode on the stdio file descriptors [1].

Python could avoid using the Windows CRT for I/O by implementing a WinFileIO class as a registered subclass of RawIOBase. There's a proposed patch for this in issue 12939. Another example is the win_unicode_console module, which implements WindowsConsoleRawReader and WindowsConsoleRawWriter classes.
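
To give a feel for how a replacement raw layer plugs into the rest of the io stack, here is a minimal sketch of a RawIOBase subclass. ListRawWriter is a made-up class that only collects bytes in memory; it is not the WinFileIO patch or the win_unicode_console code, and it inherits from RawIOBase directly rather than being registered with RawIOBase.register().

```python
import io


class ListRawWriter(io.RawIOBase):
    """Hypothetical raw writer that stores each write in a list
    instead of passing it to the operating system."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, b):
        data = bytes(b)  # b may be a memoryview handed down by BufferedWriter
        self.chunks.append(data)
        return len(data)


raw = ListRawWriter()
# Stack the standard buffered and text layers on top of the custom raw layer.
stream = io.TextIOWrapper(io.BufferedWriter(raw), encoding="utf-8", newline="\n")
print("#\n#", file=stream)
stream.flush()

written = b"".join(raw.chunks)
print(written)
```

A real Windows implementation would call the Windows API in write (and readinto for a reader) where this sketch appends to a list.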


1. This has caused problems for programs that embed Python and expect stdio to use the default text mode. In binary mode, printing wide-character strings no longer casts to char as it does in ANSI text mode, and it certainly doesn't print using WriteConsoleW as it would in UTF-16 text mode. For example:

Python 2.7.10 (default, May 23 2015, 09:44:00) 
[MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os, msvcrt, ctypes 
>>> ctypes.cdll.msvcr90.wprintf(b'w\x00i\x00d\x00e\x00\n\x00') 
wide
5
>>> msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY) 
16384
>>> ctypes.cdll.msvcr90.wprintf(b'w\x00i\x00d\x00e\x00\n\x00')
w i d e
 5

Eryk Sun
  • `grief for programs that embed Python and expect stdio to use the default text mode` - who is expecting that and why? For me it is a major headache on Windows, because it corrupts redirected binary streams. – anatoly techtonik Jan 27 '16 at 05:32
  • People printing `wchar_t` strings to the console expect text mode since the CRT at least casts them to `char` (good enough for ASCII), whereas in binary mode it just writes the raw wide characters, which `p r i n t s l i k e t h i s`. – Eryk Sun Jan 27 '16 at 06:04
  • I added a ctypes example that demonstrates the problem with calling the CRT's `wprintf` function in binary mode. – Eryk Sun Jan 27 '16 at 06:35
  • OMG. A whole can of worms. =) – anatoly techtonik Jan 27 '16 at 08:06
  • I'd like to see something like Drekin's `win_unicode_console` implemented in C and incorporated into Python 3.6 or at least 3.7. This would be used instead of `io.FileIO`, but only for console I/O. Standard I/O for a pipe, disk file, or non-console character device such as `\\.\NUL` would still use `io.FileIO`. Also, I'd like to see Python 4 divorce itself completely from the CRT I/O by integrating the patch from issue 12939. It's too limiting to force Windows into a POSIX box. They have different strengths and weaknesses. – Eryk Sun Jan 27 '16 at 08:16
  • I wish Python 4 were modular, with the ability to replace modules like console access with your own. For that it needs a user-level API to system functions that is not based strictly on the POSIX layer. But that needs engineering that is hard to do in a distributed fashion unless everybody has very good visualization skills and/or a lot of time. – anatoly techtonik Jan 28 '16 at 09:06
  • The automatic conversion of newlines can be a difficult beast to track down. I knew exactly what was going on when my `print`s seemed to double newlines when the output was viewed inside of Atom's [process-palette](https://atom.io/packages/process-palette), but I didn't know *how* to disable universal newline conversion -- of course, my attempts at directly calling `sys.stdout.write()` also failed -- one step closer to the problem but still on the wrong end. Your code to redefine `sys.stdout` worked perfectly, thank you. – jedwards Apr 15 '16 at 08:44
  • Windows has its strengths but the inability to write a byte (`b'\n'`) to a pipe can't be described with polite words. You are not safe even if your *binary* data has no newline in it ( [`b'\r\n'` may appear out of thin air](http://stackoverflow.com/a/33959798/4279)). – jfs Jun 11 '16 at 11:43
  • @J.F.Sebastian, that's due to PowerShell's object pipeline. When run by PowerShell, the two instances of python.exe don't run at the same time and stdout of the first instance is not the same pipe as stdin of the second instance. PowerShell sits between them, in both space and time and does a funky text-mode transcode, and even appends a newline. The object pipeline is a fine idea in principle, but its default behavior for piping between native processes is a disaster. Just set up the pipeline directly using Python or even (buggy and archaic, but not completely insane on this point) cmd.exe. – Eryk Sun Jun 12 '16 at 09:05
  • @eryksun I understand: there is explicit "piped vs. no pipe" example in the link where "pipe" refers to the PowerShell pipe otherwise *both* cases use ordinary pipes (implicitly via `subprocess.check_output()`). – jfs Jun 12 '16 at 11:31
  • @J.F.Sebastian, then why generalize the behavior of PowerShell to all of Windows? PS doesn't implement anything like a traditional pipeline. It uses Windows pipes (from the NT NamedPipe filesystem, i.e. `\Device\NamedPipe` & `\FileSystem\Npfs`), but it sticks itself in between each channel as a man in the middle and corrupts binary data. AFAIK, the only text-mode processing that's implemented in the Windows API is that, when reading from the console, `ReadFile` handles Ctrl+Z at the start of a buffer as EOF (i.e. 0 bytes read). Otherwise, text mode is implemented in the CRT. – Eryk Sun Jun 12 '16 at 12:48
  • *"why generalize"*: the shell is how we interact with the system (the place where you type `a | b`) e.g., the shell command language is specified by POSIX (the keyboard is still the most efficient general-purpose interface for a power user). PowerShell is supposed to be a non-lobotomized version of the command-line. The command prompt is not the strong part in Windows. – jfs Jun 12 '16 at 13:28