I'm trying to read data from stdin, in C++ on Windows, efficiently, which means preferably in large chunks. This can be done with:

    ReadFile(GetStdHandle(STD_INPUT_HANDLE), buf, bytestoread, &bytesread, 0);

or

    read(0, buf, bytestoread);

But in both cases, the call only works if bytestoread is set to a very small number, e.g. 50. If it is set to a larger number, e.g. one megabyte, the call fails with a 'not enough space' error, as though the data were not going directly to the buffer I provide but were instead being copied via some internal buffer of fixed size. This happens whether the input is piped or typed on the console.

Is Windows just limited in how large a chunk a process can read from stdin at a time? If so, what's the maximum chunk size that is guaranteed to work?

A complete program that shows the problem:

    #include <errno.h>
    #include <io.h>
    #include <stdio.h>
    #include <string.h>

    char buf[1000000];

    int main(int argc, char **argv) {
      auto r = read(0, buf, sizeof buf);
      if (r < 0)
        perror("read");
      return 0;
    }
rwallace
  • *Windows* is not limited, but this is specific to the device that `STD_INPUT_HANDLE` points to. The result can differ for a console, a pipe, or a filesystem file. – RbMm Aug 21 '19 at 10:16
  • @RbMm Okay, so what value is guaranteed to work in all cases? – rwallace Aug 21 '19 at 10:19
  • That's an unusual way to do console input, surely we all like to know what the point of that might be. But sure, console input is buffered independently, nothing happens until you press the Enter key. 64KB is a popular limit for the console. Surely you'll have to tinker with Get/SetConsoleMode(), with the expectation that you have to at least turn off the ENABLE_LINE_INPUT option. – Hans Passant Aug 21 '19 at 10:40
  • @HansPassant Well it's not actually about console input; the point is, my program needs to accept potentially large chunks of data, which might be read from a file, or might be piped in; it's the latter case I'm concerned about. – rwallace Aug 21 '19 at 10:51
  • Really, I doubt that you can get an error with a big buffer. What is the device? What exactly is the error code? In general, because you allocate the buffer, its size is not limited. – RbMm Aug 21 '19 at 11:13
  • @RbMm What you say is what I would've expected, but turns out not to be the case. Error happens with both console and piped input. – rwallace Aug 21 '19 at 11:20
  • So show your exact code, with the exact NTSTATUS error code. – RbMm Aug 21 '19 at 11:22
  • How do you allocate your buffer? On the stack or on the heap? I did a little test, and if I put a large buffer on the stack my program fails (even before the call to ReadFile); but if I put the buffer on the heap, everything works OK. Please post a minimal program that compiles and shows the problem. – Roel Schroeven Aug 21 '19 at 11:40
  • @RoelSchroeven Done. As shown, produces 'read: Not enough space', but works okay if buf is reduced to 50 bytes. – rwallace Aug 21 '19 at 11:56
  • @rwallace Hate to say it because it's not really helpful, but that code works for me (on Windows 7). – Roel Schroeven Aug 21 '19 at 12:01
  • @RoelSchroeven *blink* You compile and run it at the console, and it accepts a line of input with no error, even with the one megabyte buffer? I'm on Windows 7 too, and I've tried with both Microsoft C++ and Clang, 32 and 64 bit, compiling as C and C++, and I get the error message in all cases. – rwallace Aug 21 '19 at 12:12
  • @rwallace Indeed: one megabyte buffer, accepts a line of input without error. When I pipe a lot of data in, I get a different error because Windows still tries to write to the pipe even after it's closed. That's because we should actually be reading in a loop until we have read all data -- but that's a totally different issue. I compiled with Embarcadero C++Builder which uses a different C runtime. I'll see if I can try with Microsoft C++. – Roel Schroeven Aug 21 '19 at 12:16
  • @RoelSchroeven Right, my real code does read in a loop, but I stripped that out for the minimal example. I had not expected a different runtime to make a difference given that I get the same problem with ReadFile, but I would definitely be interested in your results if you can try with Microsoft C++. Just to check, you are using the standard Windows terminal, not Powershell or one of the UNIX front ends? – rwallace Aug 21 '19 at 12:20
  • @rwallace I compiled with the compiler from Visual Studio 2008 now. "echo hello | foo" works, but just "foo" now says "read: Not enough space"! Interesting. (I'm using the standard Windows terminal). Actually the same happens with my original program: just the program, without piped input, says "read: Not enough memory". Apparently I hadn't tried that before. – Roel Schroeven Aug 21 '19 at 12:24
  • Summarized a bit: when using a pipe on the command line (like "echo hello | foo" or "type bigfile | foo") everything works. But when just starting foo with the intention of typing in something, I have the same problem as you, both with C++Builder and Microsoft C++. Same thing on Linux works flawlessly. Windows kernel bug?? – Roel Schroeven Aug 21 '19 at 12:33
  • First, use `ReadFile` rather than `read`. Second, show the exact error code from `RtlGetLastNtStatus()`. – RbMm Aug 21 '19 at 12:33
  • @RoelSchroeven Right, I will probably just use isatty to bail if input is not piped. Thanks! – rwallace Aug 21 '19 at 20:49
  • @RoelSchroeven Hmm. The documentation does say, “The ReadFile function may fail with ERROR_INVALID_USER_BUFFER or ERROR_NOT_ENOUGH_MEMORY whenever there are too many outstanding asynchronous I/O requests,” but why would that be happening here? – Davislor Aug 22 '19 at 04:22
  • @RoelSchroeven Could it be: “The ReadFile function may fail with ERROR_NOT_ENOUGH_QUOTA, which means the calling process's buffer could not be page-locked.”? Does this still occur if the code uses a buffer aligned to the page size, static or on the heap? – Davislor Aug 22 '19 at 04:27
  • @Davislor: You should probably address your comments to rwallace: he/she is the one who asked the question. I am merely a passer-by trying to help. – Roel Schroeven Aug 22 '19 at 08:20
  • The OP probably does not want to understand the source of the problem, given that they used `read` instead of `ReadFile` or `NtReadFile` and still have not posted the NTSTATUS error code. – RbMm Aug 22 '19 at 08:47

3 Answers

If you are reading binary data from stdin, you need to:

  1. switch stdin to binary mode with _setmode(_fileno(stdin), _O_BINARY);
  2. read the data with fread(buf, 1, bufSize, stdin).
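
The two steps can be sketched roughly as follows. This is only a minimal sketch, not a tested fix for the error in the question: `read_all` and `set_stdin_binary` are hypothetical helper names, and the 64 KiB default chunk size is an arbitrary assumption rather than a documented limit.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>
#ifdef _WIN32
#include <fcntl.h>  // _O_BINARY
#include <io.h>     // _setmode
#endif

// Hypothetical helper: switch stdin to binary mode on Windows so the C runtime
// does not translate CR/LF or stop at Ctrl+Z. Does nothing on other platforms.
bool set_stdin_binary() {
#ifdef _WIN32
    return _setmode(_fileno(stdin), _O_BINARY) != -1;
#else
    return true;
#endif
}

// Hypothetical helper: read everything from `in` in chunks of `chunk_size`
// bytes. The default chunk size is an assumption, not a documented limit.
std::vector<char> read_all(std::FILE* in, std::size_t chunk_size = 64 * 1024) {
    if (chunk_size == 0) chunk_size = 1;  // avoid looping forever on a zero-size read
    std::vector<char> data;
    std::vector<char> chunk(chunk_size);
    for (;;) {
        std::size_t n = std::fread(chunk.data(), 1, chunk.size(), in);
        data.insert(data.end(), chunk.data(), chunk.data() + n);
        if (n < chunk.size()) break;  // short count: end of input or error (see std::ferror)
    }
    return data;
}
```

A caller would run `set_stdin_binary()` once at startup and then `read_all(stdin)`. Checking `std::ferror(stdin)` afterwards distinguishes a read error from a normal end of input.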

Also take a look at my similar solution, in which an app takes binary data via stdin, throttles the speed, and writes it to stdout:

bin_pipe_throttle

stepger

You don’t say which version of the runtime and OS you use, but I cannot reproduce this problem with MSVC 19.16.27031.1 on Windows 10. There are a few documented reasons it might fail. From the MSDN documentation of ReadFile:

Characters can be read from the console input buffer by using ReadFile with a handle to console input. The console mode determines the exact behavior of the ReadFile function. By default, the console mode is ENABLE_LINE_INPUT, which indicates that ReadFile should read until it reaches a carriage return. If you press Ctrl+C, the call succeeds, but GetLastError returns ERROR_OPERATION_ABORTED. For more information, see CreateFile.

There’s another way you could be getting this error, relating to asynchronous I/O, but that does not seem to be the problem here. You probably want to turn off the ENABLE_LINE_INPUT flag with SetConsoleMode. The documentation also says the call could fail with ERROR_NOT_ENOUGH_QUOTA if the memory pages of the buffer cannot be locked. However, you use a static buffer that should not have this problem.
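
Turning off line input might look roughly like the sketch below. `disable_line_input` is a hypothetical helper name, and whether this actually avoids the error has not been confirmed here. ENABLE_ECHO_INPUT is cleared as well because the console documentation only allows it together with ENABLE_LINE_INPUT.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: clears ENABLE_LINE_INPUT (and ENABLE_ECHO_INPUT, which
// depends on it) for stdin. Returns true only if stdin is a console and the
// mode was changed. Returns false for pipes, files, and non-Windows builds.
bool disable_line_input() {
#ifdef _WIN32
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    DWORD mode = 0;
    if (in == INVALID_HANDLE_VALUE || in == nullptr) return false;
    if (!GetConsoleMode(in, &mode)) return false;  // fails when stdin is not a console
    mode &= ~static_cast<DWORD>(ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT);
    return SetConsoleMode(in, mode) != 0;
#else
    return false;
#endif
}
```

Because GetConsoleMode fails for pipes and files, the helper leaves redirected input untouched and only affects the interactive case.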

If you’re reading a file on disk, and not a console stream, you might map it to memory, which eliminates any intermediate buffering and loads the sections of files as needed, by the same mechanism as virtual memory.
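
For the file-on-disk case, a memory-mapped read could be sketched as below. `count_byte` is a hypothetical helper that only demonstrates the mapping calls (CreateFileMapping and MapViewOfFile on Windows, with an mmap branch for other systems). It does not apply to stdin when the input is a pipe or the console, and it assumes the whole file fits in the address space.

```cpp
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Hypothetical helper: counts occurrences of `needle` in the file at `path`
// through a read-only mapping of the whole file. Returns -1 on failure.
long long count_byte(const char* path, char needle) {
    long long count = 0;
#ifdef _WIN32
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return -1;
    LARGE_INTEGER size;
    if (!GetFileSizeEx(file, &size)) { CloseHandle(file); return -1; }
    if (size.QuadPart == 0) { CloseHandle(file); return 0; }  // empty files cannot be mapped
    HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (mapping == nullptr) { CloseHandle(file); return -1; }
    const char* view = static_cast<const char*>(
        MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
    if (view == nullptr) {
        count = -1;
    } else {
        for (long long i = 0; i < size.QuadPart; ++i)
            if (view[i] == needle) ++count;
        UnmapViewOfFile(view);
    }
    CloseHandle(mapping);
    CloseHandle(file);
#else
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  // zero-length mappings are invalid
    std::size_t len = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return -1;
    const char* view = static_cast<const char*>(p);
    for (std::size_t i = 0; i < len; ++i)
        if (view[i] == needle) ++count;
    munmap(p, len);
#endif
    return count;
}
```

For example, `count_byte("input.bin", '\n')` would count the line feeds in a hypothetical file named input.bin without any explicit read calls.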

Davislor
  • How is `ENABLE_LINE_INPUT` related to the problem at all? – RbMm Aug 22 '19 at 00:43
  • @RbMm From the documentation: “By default, the console mode is ENABLE_LINE_INPUT, which indicates that ReadFile should read until it reaches a carriage return.” This might be causing a larger read to fail. – Davislor Aug 22 '19 at 04:20
  • No, I don't see how this can be *causing a larger read to fail*, or how any of this is related to the problem. – RbMm Aug 22 '19 at 07:40
  • *The documentation also says the call could fail with ERROR_NOT_ENOUGH_QUOTA if the memory pages of the buffer cannot be locked. However, you use a static buffer that should not have this problem.* This is also absolutely wrong. Whether the buffer is "static" or not is completely unrelated here. All user-mode buffers are pageable. The problem can arise if the device uses direct I/O. – RbMm Aug 22 '19 at 07:44
  • *Does page-aligning the buffer help?* - No. – RbMm Aug 22 '19 at 07:44
As far as I know, stdin is not limited. It works as an endless stream and should provide as much storage as you need. The only explanation I can see is that the kernel you are using blocks at some point.

sxeros