-2

While trying to understand the internals of C I/O handling, I stumbled upon some strange behavior of the stdin buffer.

If I set the stdin buffer size to 0 and enter more than one character in response to the first getchar call, then not only does the first getchar return the first character entered, but every following getchar call also returns the next one (provided enough characters were entered). Also, if I print the value stored in stdin after every getchar call, then after every other call the stdin buffer appears to hold the value given to the previous getchar (for example, for the input 12345 the output is stdin: 2 stdin: stdin: 4 stdin:).

setvbuf(stdin, NULL, _IONBF, 0);
char inpChar1 = getchar();
printf("stdin: %s \n", *stdin);
char inpChar2 = getchar();
printf("stdin: %s \n", *stdin);
char inpChar3 = getchar();
printf("stdin: %s \n", *stdin);
char inpChar4 = getchar();
printf("stdin: %s \n", *stdin);
char inpChar5 = getchar();
printf("%c %c %c %c %c", inpChar1, inpChar2, inpChar3, inpChar4, inpChar5);

Why does the code above behave the way it does? My wild guess is that there is another buffer that stores those values, but I currently have no idea how to find it, or whether it even exists.

SunGrow
  • You can look at the libc source code and I suspect you will find a low-level read buffer of `BUFSIZ` bytes is provided. As for the effect of trying to set it to zero, I haven't tried. – David C. Rankin Aug 04 '19 at 18:13
  • What is `printf("stdin: %s \n", *stdin)` intended to achieve? It is undefined behavior, as `*stdin` is not a null-terminated string. – user4815162342 Aug 04 '19 at 18:14
  • The operating system will have another buffer. – Antti Haapala -- Слава Україні Aug 04 '19 at 18:18
  • The C runtime reads only the first character. The remaining characters are still in the operating system. – Raymond Chen Aug 04 '19 at 18:18
  • Contrary to common belief, `stdin` is a pointer to a structure, not to the standard input stream. – S.S. Anne Aug 04 '19 at 18:18
  • I tried your code even though I know it's wrong, just to see what you might have seen. All I get is "Segmentation fault (core dumped)", which is what I would expect. – Zan Lynx Aug 04 '19 at 18:23
  • @ZanLynx I use Visual Studio, which might be the reason – SunGrow Aug 04 '19 at 18:32
  • @RaymondChen Then what we have is a stdin buffer that is used mainly for strings, because we can't read more than one char from the OS's I/O stream at a time? – SunGrow Aug 04 '19 at 18:32
  • @JL2210 Well, natural language is imprecise: `stdin` is often called a "stream", e.g. in the IEEE Open standard, http://pubs.opengroup.org/onlinepubs/9699919799/ says "standard I/O streams". *Conceptually* they are sequences of bytes, hence "streams"; it's just that all access is through a pointer to the opaque `FILE` structure. – Peter - Reinstate Monica Aug 04 '19 at 18:58
  • You can read multiple characters from the operating system. But if you set the stream to unbuffered, then it will read only one character at a time and leave the rest in the operating system. – Raymond Chen Aug 04 '19 at 20:07
  • See also explanation at [this answer](https://stackoverflow.com/questions/9180001/what-is-the-difference-between-getch-and-getchar/51173273#51173273). – Steve Summit Aug 05 '19 at 01:54

4 Answers

2

Here is an example of working code that shows how this behaves on a Linux system. You can see from the read and write calls in the trace that no stdio buffer is in play: the program reads and writes one byte at a time.

$ cat c-read-buffer-test.c

#include <stdio.h>

int main() {
  char input[8] = {0};
  const size_t input_len = sizeof input;
  size_t i;
  int inC;

  setvbuf(stdin, NULL, _IONBF, 0);
  setvbuf(stdout, NULL, _IONBF, 0);

  for (i = 0; i < input_len; ++i) {
    inC = getchar();
    input[i] = inC;
  }

  for (i = 0; i < input_len; ++i) {
    if (i > 0)
      printf(" ");
    printf("%c", input[i]);
  }
  printf("\n");
  return 0;
}

$ echo abcdefgh | strace ./c-read-buffer-test 

execve("./c-read-buffer-test", ["./c-read-buffer-test"], 0x7fffd97152e0 /* 61 vars */) = 0
brk(NULL)                               = 0x48a20000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=197658, ...}) = 0
mmap(NULL, 197658, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fffab5b0000
close(3)                                = 0
openat(AT_FDCWD, "/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\25\0\1\0\0\0\0n\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=882496, ...}) = 0
mmap(NULL, 279840, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fffab560000
mmap(0x7fffab590000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0x7fffab590000
close(3)                                = 0
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\25\0\1\0\0\0\220P\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=6723976, ...}) = 0
mmap(NULL, 2118520, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fffab350000
mmap(0x7fffab540000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e0000) = 0x7fffab540000
close(3)                                = 0
mprotect(0x7fffab540000, 65536, PROT_READ) = 0
mprotect(0x7fffab590000, 65536, PROT_READ) = 0
mprotect(0x10010000, 65536, PROT_READ)  = 0
mprotect(0x7fffab640000, 65536, PROT_READ) = 0
munmap(0x7fffab5b0000, 197658)          = 0
set_tid_address(0x7fffab653110)         = 58832
set_robust_list(0x7fffab653120, 24)     = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x7fffab566630, sa_mask=[], sa_flags=SA_SIGINFO}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7fffab566740, sa_mask=[], sa_flags=SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
read(0, "a", 1)                         = 1
read(0, "b", 1)                         = 1
read(0, "c", 1)                         = 1
read(0, "d", 1)                         = 1
read(0, "e", 1)                         = 1
read(0, "f", 1)                         = 1
read(0, "g", 1)                         = 1
read(0, "h", 1)                         = 1
write(1, "a", 1a)                        = 1
write(1, " ", 1 )                        = 1
write(1, "b", 1b)                        = 1
write(1, " ", 1 )                        = 1
write(1, "c", 1c)                        = 1
write(1, " ", 1 )                        = 1
write(1, "d", 1d)                        = 1
write(1, " ", 1 )                        = 1
write(1, "e", 1e)                        = 1
write(1, " ", 1 )                        = 1
write(1, "f", 1f)                        = 1
write(1, " ", 1 )                        = 1
write(1, "g", 1g)                        = 1
write(1, " ", 1 )                        = 1
write(1, "h", 1h)                        = 1
write(1, "\n", 1
)                       = 1
exit_group(0)                           = ?
+++ exited with 0 +++
Zan Lynx
1

C isn't storing it. If stdin is a regular file, the underlying file position just stays wherever the logical stdio position of stdin is.

If stdin is a terminal, the "underlying file position" (it is not seekable, so it is not really a position, but the same concept applies in some sense) is a position in the operating system's input buffer (or, on bare metal, in the hardware FIFO of a 16550 UART or a similar device). Each time you call fgetc, it reads another byte from there, directly into the return value passed back to your program, with no buffering by the C library implementation.

R.. GitHub STOP HELPING ICE
0

stdin is a pointer to a FILE structure whose layout is implementation dependent. You cannot use it, or the object it points to, as a character string, and Visual C should warn about the type mismatch at warning level -W4 or so. Using it this way is undefined behavior, and the results are meaningless.

It is conceivable that the structure contains a buffer or the last character read somewhere, but one cannot know without looking at the library sources (if available) or reading the documentation. Even then, one should cast to a character pointer before using it that way, and only when one knows that the first element is a character (or character array). In any case, such code is not portable.

Peter - Reinstate Monica
  • Even with unbuffered input, there must be space somewhere for 1 character for `ungetc()` — or for `scanf()` et al to put the character that didn't match the input. There doesn't have to be any more than one character for put-back operations. – Jonathan Leffler Aug 04 '19 at 20:06
  • @JonathanLeffler Yeah, that's what I figured (see my other comment). But then this character is not necessarily a member of the `FILE` struct proper. – Peter - Reinstate Monica Aug 04 '19 at 20:39
-4

You should read the documentation for your C runtime. For example, some runtimes, such as the IBM runtime, clearly state:

_IONBF No buffer is used.

So it is clear that read operations return one character at a time.

From the above documentation:

The setvbuf() function has no effect on stdout, stdin, or stderr.

Your function call has no effect with this runtime.

Update per request

The OP said that he uses a Microsoft product. Here is the relevant documentation from Microsoft:

_IONBF No buffer is used, regardless of arguments in call to setvbuf.

The answer to your specific situation is:

Read operations read one character at a time. Your setvbuf function call does not affect the stdin buffer size.

user14063792468
  • I don't think the documentation of an IBM product is relevant here: The OP said they are using a Microsoft product, and likely an Intel architecture. Produce the relevant MS documentation and I take back my downvote. – Peter - Reinstate Monica Aug 04 '19 at 18:34
  • This varies by C implementation. For example, the GNU C Library just says, "The setvbuf() function may be used only after opening a stream and before any other operations have been performed on it." – Zan Lynx Aug 04 '19 at 18:35
  • @PeterA.Schneider I was writing the answer and didn't see a comment by the OP. – user14063792468 Aug 04 '19 at 18:37
  • @ZanLynx This is what the first line of my answer says. It depends on the `C` runtime library. – user14063792468 Aug 04 '19 at 18:42
  • @PeterA.Schneider `Microsoft` can change the documentation for their product at any time. That could leave a dead link in my post or, worse, wrong information. So you are being overly smart with your request. – user14063792468 Aug 04 '19 at 19:00
  • I'm not sure why you are quoting IBM's documentation then. Because IBM is even *more inert* than Microsoft so that the documentation and link stay valid longer? ;-) – Peter - Reinstate Monica Aug 04 '19 at 19:11
  • @PeterA.Schneider And do you think that `Visual Studio` users update as often as `IBM` compiler users do? – user14063792468 Aug 04 '19 at 19:15
  • The C standard does not impose the limitation of `setvbuf()` not affecting `stdin`, `stdout` or `stderr` (see C11 [§7.21.5.6 The `setvbuf` function](https://port70.net/~nsz/c/c11/n1570.html#7.21.5.6)). That is an unusual and non-conformant 'extension' ('deviation' might be a better term) from the C standard (although I quoted C11, there was no limitation in C90, C99 or C18 either). – Jonathan Leffler Aug 04 '19 at 20:12