2

In an attempt to create a simple cat clone in Python,

sys.stdout.write(sys.stdin.read())

I noticed that this fails horribly for binary files (i.e. python cat.py < binaryfile > supposed_copy) containing the CTRL+Z EOF / substitute character 0x1a, since that seems to cause read() to consider its work done. I cannot simply loop over the code forever to circumvent this, since at some point stdin.read() will block waiting for new input, which never arrives once the true end of input has been reached.

So, how can this be fixed, i.e.

  • How to know when the file redirected to stdin is fully read, or
  • how to properly handle this?
Tobias Kienzler
  • Sounds like you are opening the file as text instead of binary. Can you show the `open` statement please? – cdarke Feb 23 '15 at 11:26
  • @cdarke: Python opens stdin and stdout for you... – PM 2Ring Feb 23 '15 at 11:27
  • This probably gives you the answer: http://stackoverflow.com/questions/2850893/reading-binary-data-from-stdin – Tom Dalton Feb 23 '15 at 11:29
  • OK, I misunderstood. You probably want to reopen stdin as binary then, but really stdin and stdout are not suitable for binary files. – cdarke Feb 23 '15 at 11:30
  • @cdarke: At least, stdin and stdout are not suitable for binary files on Windows. :) – PM 2Ring Feb 23 '15 at 11:31
  • I’d just like to chime in that this is a very non-Python-specific problem. For instance, there is **no** standards compliant, platform independent way to achieve this in C++. – Konrad Rudolph Feb 23 '15 at 12:19
  • @KonradRudolph Good point, though doesn't C++ have a similar `msvcrt.setmode`? – Tobias Kienzler Feb 23 '15 at 12:21
  • As a side-note, I didn't really want to reimplement `cat`, but rather a `git` clean/smudge filter for a binary format, where git relies on `stdin/stdout` – Tobias Kienzler Feb 23 '15 at 12:22
  • @Tobias No. *Windows* libraries have `_setmode`, which you can call on the associated C file handle (and, as an implementation detail, the C++ streams on Windows will use that). However, that’s then of course Windows specific, and relies on non-standardised behaviour of the streams. In practice this means it can be achieved, but requires platform- and compiler-specific code. – Konrad Rudolph Feb 23 '15 at 13:02
  • @KonradRudolph I keep forgetting how complicated non-Python is :P – Tobias Kienzler Feb 23 '15 at 13:05

3 Answers

2

You will need to tell Python to open stdin and stdout in binary mode. In Python 2, you can do this with the -u option, e.g.:

python -u cat.py < binaryfile > supposed_copy

Note that this will make stdin and stdout unbuffered.
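On the question of how to know when the redirected input is fully read: once stdin is in binary mode, `read()` returns an empty string at the true end of input, so a chunked loop can stop there instead of looping forever. A minimal sketch, assuming binary-mode file objects; `copy_stream` and `chunk_size` are names made up for illustration:

```python
import io

def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy src to dst in chunks; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty result means end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# In-memory demonstration with a 0x1a byte in the middle of the data.
source = io.BytesIO(b"before\x1aafter\r\n")
target = io.BytesIO()
copied = copy_stream(source, target)
```

In a real script the two in-memory buffers would be replaced by the binary-mode stdin and stdout, which also avoids holding the whole file in memory.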

PM 2Ring
  • That's working great, can the effect of `-u` be achieved from within Python code? – Tobias Kienzler Feb 23 '15 at 11:49
  • @TobiasKienzler: Probably not, since those files are already opened when they get passed to your script. There _might_ be some Windows magic that can alter the mode of a stream that's already open, but if so, accessing it from Python would require using something like `ctypes` (which lets you call functions in DLLs). But anyway, stdin and stdout are special, and usually connected to a terminal, so even if your OS lets you change modes of normal files it might be touchy about stdin & stdout. :) (FWIW, I don't know much about Windows). – PM 2Ring Feb 23 '15 at 11:58
  • Some Windows magic indeed: http://stackoverflow.com/a/28673339/321973 So much for transparent portability :/ – Tobias Kienzler Feb 23 '15 at 12:02
  • Fortunately. This btw also fixes any CRLF <-> LF issues that binary files would encounter. Oh Microsoft, why, why, why... – Tobias Kienzler Feb 23 '15 at 12:08
2

Expanding on this answer:

import os
import sys

if sys.platform == "win32":
    import msvcrt
    # Switch both streams to binary mode: no CRLF translation, no Ctrl+Z EOF
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
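If the same script should also run unchanged on other platforms, the snippet above could be wrapped in a small helper. This is only a sketch: `make_binary` is a hypothetical name, and it assumes the stream is backed by a real file descriptor.

```python
import os
import sys

def make_binary(stream):
    """Switch the stream's file descriptor to binary mode on Windows.

    Returns True if the mode was changed, False on other platforms.
    """
    if sys.platform != "win32":
        return False  # nothing to switch off outside Windows
    import msvcrt  # Windows-only module, so import it lazily
    msvcrt.setmode(stream.fileno(), os.O_BINARY)
    return True

# Intended use at the top of cat.py:
# make_binary(sys.stdin)
# make_binary(sys.stdout)
```

The return value is there only so that callers can log or assert whether anything was changed.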
Community
Tobias Kienzler
  • Well, that's a bit simpler than I was expecting. :) OTOH, I guess there's a big demand for this kind of thing, so it makes sense that Microsoft have made it fairly easy. – PM 2Ring Feb 23 '15 at 12:05
  • @PM2Ring Indeed, though I wonder why Microsoft made this difference to start with. Just that automated CR/LF conversion makes me curse a lot... – Tobias Kienzler Feb 23 '15 at 12:23
  • Well, Bill Gates wanted to make MS-DOS different to Unix. And I guess he thought the CRLF and Ctrl-Z things were a good idea at the time. Similar remarks apply to using backslash as a path separator. But discussing these topics can lead to [Religious Wars](http://forums.xkcd.com/viewforum.php?f=40), so I guess I better shut up. :) – PM 2Ring Feb 23 '15 at 12:29
1

See Reading binary data from stdin (http://stackoverflow.com/questions/2850893/reading-binary-data-from-stdin) for an explanation of how to make sure stdin/stdout are opened as binary.
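For Python 3 in particular, the text streams `sys.stdin` and `sys.stdout` expose their underlying binary streams as `.buffer`. A minimal sketch, assuming Python 3; `as_binary` is a hypothetical helper name, not something taken from the linked question:

```python
import io
import sys

def as_binary(stream):
    """Return the underlying binary stream of a text stream, if it has one."""
    return getattr(stream, "buffer", stream)

# Intended use in a Python 3 cat.py:
# as_binary(sys.stdout).write(as_binary(sys.stdin).read())

# In-memory demonstration: a text wrapper around bytes containing 0x1a.
wrapped = io.TextIOWrapper(io.BytesIO(b"a\x1ab\r\n"))
data = as_binary(wrapped).read()
```

Streams that are already binary have no `.buffer` attribute and are returned unchanged.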

Community
Tom Dalton
  • Thanks, indeed [one answer there](http://stackoverflow.com/a/4160894/321973) solves this, though it wouldn't harm to mention that here ;) Anyway, I found [a similar answer](http://stackoverflow.com/a/28673339/321973) – Tobias Kienzler Feb 23 '15 at 12:01