
I am confused about a few things when it comes to the issue of stdout and stderr being buffered/unbuffered:

1)

Is the statement "stdout/err is buffered/unbuffered" decided by my operating system or by the programming-language library functions (particularly the write() or print() functions) that I am working with?

While programming in C, I have always gone by the rule that stdout is buffered while stderr is unbuffered. I have seen this in action by calling usleep() after fputc() calls inside a while loop: the individual characters were placed on stderr one by one, while only complete lines appeared on stdout. When I tried to replicate this program in Python, stdout and stderr behaved the same way: both produced complete lines. So I looked this up and found a post that said:

sys.stderr is line-buffered by default since Python 3.9.

Hence the question: I was under the impression that the behaviour of stderr being buffered/unbuffered was decided and fixed by the OS, but apparently code libraries are free to implement their own behaviour? Can I hypothetically write a routine that writes to stdout without a buffer?

The relevant code snippets for reference:

/* C */
int c;  /* int, not char, so EOF can be distinguished from valid bytes */
while ((c = fgetc(file)) != EOF) {
    fputc(c, stdout);  /* or stderr */
    usleep(800);       /* microseconds */
}

# Python
import sys
import time

for line in file:
    for ch in line:
        print(ch, end='', file=sys.stdout)  # or sys.stderr
        time.sleep(0.08)

2)

Secondly, my understanding of the need for buffering is this: since disk access is slower than RAM access, writing individual bytes would be inefficient, so bytes are written in blocks. But is writing to a device file like /dev/stdout or /dev/stdin the same as writing to disk? (Isn't disk supposed to be permanent? Stuff written to stdout or stderr only appears in the terminal, if one is connected, and is then lost, right?)

3)

Finally, is there really a need for stderr to be unbuffered in C if it is less efficient?

First User

2 Answers


Is the statement "stdout/err is buffered/unbuffered" decided by my operating system or by the programming-language library functions (particularly the write() or print() functions) that I am working with?

Mostly it is decided by the programming-language implementation, and language specifications standardize it. For example, the C language specification says:

At program startup, three text streams are predefined and need not be opened explicitly — standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output). As initially opened, the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.

(C2017, paragraph 7.21.3/7)
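In case it helps to make the "interactive device" determination concrete: on POSIX systems that determination typically comes down to something like isatty(). Here's a minimal sketch (POSIX-specific; isatty() is not part of ISO C):

/* Check whether stdout refers to an interactive device -- the condition
 * the C standard's buffering rule hinges on. POSIX-specific sketch. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    if (isatty(STDOUT_FILENO))
        fprintf(stderr, "stdout is a terminal: typically line-buffered\n");
    else
        fprintf(stderr, "stdout is not a terminal: typically fully buffered\n");
    return 0;
}

Run it directly and then with stdout redirected (./a.out vs. ./a.out > out.txt) and you can watch the determination change.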

Similarly, the Python docs for sys.stdin, sys.stdout, and sys.stderr say:

When interactive, the stdout stream is line-buffered. Otherwise, it is block-buffered like regular text files. The stderr stream is line-buffered in both cases. You can make both streams unbuffered by passing the -u command-line option or setting the PYTHONUNBUFFERED environment variable.

Be aware, however, that both of those particular languages provide mechanisms to change the buffering of the standard streams (or in the Python case, at least stdout and stderr).
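In C, that mechanism is setvbuf() (or the simpler setbuf()). A sketch; note that setvbuf() must be called before any other operation is performed on the stream:

#include <stdio.h>

int main(void) {
    /* Make stdout unbuffered. Must happen before the stream is used. */
    setvbuf(stdout, NULL, _IONBF, 0);
    /* Alternatives:
     *   setvbuf(stdout, NULL, _IOLBF, BUFSIZ);  line-buffered
     *   setvbuf(stdout, NULL, _IOFBF, BUFSIZ);  fully buffered
     */
    fputs("this reaches the device immediately", stdout);
    return 0;
}

In Python, the analogous knobs are the -u option and the PYTHONUNBUFFERED variable mentioned in the quote above.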

Moreover, the above is relevant only if you are using streams (C) or file objects (Python). In C, this is what all of the stdio functions use -- printf(), fgets(), fwrite(), etc. -- but it is not what (say) the POSIX raw I/O functions such as read() and write() use. If you use raw I/O interfaces such as those, then there is only whatever buffering you perform manually.
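For example, this sketch produces output on standard output with no stdio stream involved at all, so the only buffering is whatever you add yourself (POSIX-specific):

/* POSIX raw I/O: every write() is a direct system call; the C library
 * performs no buffering on its behalf. */
#include <stddef.h>
#include <unistd.h>

int main(void) {
    const char msg[] = "written one byte per syscall\n";
    for (size_t i = 0; i < sizeof msg - 1; i++)
        (void) write(STDOUT_FILENO, &msg[i], 1);  /* one syscall per byte */
    return 0;
}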

Hence the question: I was under the impression that the behaviour of stderr being buffered/unbuffered was decided and fixed by the OS

No. The OS (at least on Unixes, including macOS, and on Windows) does not perform this kind of I/O buffering on behalf of programs. Programming-language implementations do, under some circumstances, and they are then in control of the details.

but apparently code libraries are free to implement their own behaviour?

It's a bit more nuanced than that, but basically yes.

Can I hypothetically write a routine that writes to stdout without a buffer?

Maybe. In C or Python, at least, you can exert some control over the buffering mode of the stdout stream. In C you can adjust it dynamically at runtime (as in the setvbuf() sketch above), but in Python I think the buffering mode is decided when Python starts.

You may also be able to bypass the buffer of a buffered stream by performing (raw) I/O on the underlying file descriptor, but this is extremely poor form, and depending on the details, it may produce undefined behavior.
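To see why it is poor form, consider this sketch: the raw write() jumps ahead of data still sitting in stdout's stdio buffer, so what appears on the device is not in program order (assuming stdout is buffered, e.g. redirected to a file):

#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("first (into the stdio buffer)");            /* no newline, no flush */
    (void) write(fileno(stdout), "second (raw)\n", 13); /* bypasses the buffer  */
    /* "second (raw)" typically appears before "first ...", because the
     * stdio buffer is not written out until the stream is flushed or the
     * program exits. */
    return 0;
}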

Secondly, my understanding of the need for buffering is this: since disk access is slower than RAM access, writing individual bytes would be inefficient, so bytes are written in blocks.

All I/O is slow, even I/O to a terminal. Disk I/O tends to be especially slow, but program performance generally benefits from buffering I/O to all devices.

But is writing to a device file like /dev/stdout or /dev/stdin the same as writing to disk?

Sometimes it is exactly writing to disk (look up I/O redirection). Different devices do have different performance characteristics, so buffering may improve performance more with some than with others, but again, all I/O is slow.
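You can ask the kernel what standard output currently is from C (a POSIX-specific sketch): a regular file on disk after ./prog > out.txt, a pipe in ./prog | cat, a character device at an interactive terminal:

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    struct stat sb;
    if (fstat(STDOUT_FILENO, &sb) != 0) {
        perror("fstat");
        return 1;
    }
    if (S_ISREG(sb.st_mode))
        fprintf(stderr, "stdout is a regular file (a write goes to disk)\n");
    else if (S_ISFIFO(sb.st_mode))
        fprintf(stderr, "stdout is a pipe\n");
    else if (S_ISCHR(sb.st_mode))
        fprintf(stderr, "stdout is a character device (e.g. a terminal)\n");
    else
        fprintf(stderr, "stdout is something else\n");
    return 0;
}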

Finally, is there really a need for stderr to be unbuffered in C if it is less efficient?

The point of stderr being unbuffered (by default) in C is so that messages directed there are written to the underlying device (often a terminal) as soon as possible. Efficiency is not really a concern for the kinds of messages that this policy is most intended to serve.
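A quick way to see the policy in action (run it at a terminal; the stdout message has no newline, so it sits in the buffer):

#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("progress message");                  /* buffered: not yet visible */
    fprintf(stderr, "error: visible at once\n"); /* unbuffered: appears immediately */
    sleep(5);  /* for these 5 seconds, only the stderr line is on screen */
    return 0;  /* exit flushes stdout; the progress message finally appears */
}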

John Bollinger
  • There is OS buffering, but it's a different sort of buffering. Every socket connection has buffers, for example, so the socket can receive data before the client program reads it, and the client program can send data even if the comms are temporarily saturated. And `write` to a disk file returns before the data is actually committed to persistent storage. So `flush()` and `fsync()` both flush buffers, but they're not the same buffers (and they work independently, so you have to do both, in order, if you're using stdio.) – rici Aug 04 '21 at 00:36

https://linux.die.net/man/3/stderr, https://linux.die.net/man/3/setbuf, and https://linux.die.net/man/2/write are helpful resources here.

  • If you use the raw syscall write(), there won't be any library-level buffering. I'd imagine the same is true for WinAPI, but I don't know.
  • Python and C want to make it easier to write things, so they wrap the raw syscalls with a file pointer (in C) / file object (in Python). This wrapper, in addition to storing the raw file descriptor used to make the syscalls, can optionally do things like buffering to reduce the number of syscalls you're making.
  • You can change the buffering settings of a file or stream. (In C that's setbuf; I'm not sure for Python.)
  • C and Python just happen to have different default configurations of stderr's wrapper (see the sketch after this list).
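To underline that last point, here's a sketch that gives C's stderr the same line-buffered default that Python (3.9+) uses:

#include <stdio.h>

int main(void) {
    /* Reconfigure stderr from unbuffered (the C default) to line-buffered
     * (Python's default). Must be done before writing to the stream. */
    setvbuf(stderr, NULL, _IOLBF, BUFSIZ);

    fputs("no newline yet, so this may sit in the buffer... ", stderr);
    fputs("flushed now\n", stderr);  /* the newline triggers the flush */
    return 0;
}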

For 2), writing to a pipe is usually much faster than writing to disk, but it's still a relatively slow operation compared to a memcpy or the like, which is essentially what buffering is: the processor has to jump into kernel mode and back on every syscall.
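That memcpy analogy is essentially all a stdio buffer is. A hand-rolled sketch (POSIX write(), error handling omitted): bytes are memcpy'd into an array, and the kernel is only involved when the array fills up:

#include <string.h>
#include <unistd.h>

#define BUFCAP 4096
static char buf[BUFCAP];
static size_t used;

/* Hand the accumulated bytes to the kernel in a single syscall. */
static void flush_buf(void) {
    if (used > 0) {
        (void) write(STDOUT_FILENO, buf, used);
        used = 0;
    }
}

/* Buffered "write": usually just a memcpy, occasionally a syscall. */
static void buffered_write(const char *data, size_t len) {
    if (len >= BUFCAP) {      /* oversized request: don't buffer it */
        flush_buf();
        (void) write(STDOUT_FILENO, data, len);
        return;
    }
    if (used + len > BUFCAP)  /* no room: flush first */
        flush_buf();
    memcpy(buf + used, data, len);
    used += len;
}

int main(void) {
    for (int i = 0; i < 100000; i++)
        buffered_write("x", 1);  /* 100000 calls, only ~25 syscalls */
    flush_buf();                 /* don't forget the tail */
    return 0;
}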

For 3), I'd guess that C developers decided it was more important to get errors out on time than to get performance. In general, if your program is spitting out lots of data to stderr, you have bigger problems than performance.

Kaia
  • This was really helpful. Thanks. But about the 2nd point - I thought /dev/... files were device files? or are they pipes? – First User Aug 03 '21 at 18:01
  • my memory (and take this with a big grain of salt) is that when you run `./foo` from your shell, the shell process forks off a child, sets up appropriate pipes with https://man7.org/linux/man-pages/man2/pipe.2.html between its own process and the child's FDs 0, 1, and 2. /dev/stdout is typically a symlink to /proc/self/fd/1 or /dev/fd/1 or similar. The /dev/fd/1 device files handle redirecting the output to whatever is on file descriptor 1 for the current process, which might be a pipe, a file on disk, or another device file like /dev/null – Kaia Aug 03 '21 at 18:30
  • So like, if you do `./foo > file.txt`, `/dev/stdout` for process `foo` is going to be a device file that ends up sending your output to a file on the disk. That'll be exactly as slow as any other write to disk. – Kaia Aug 03 '21 at 18:36
  • @Keon: that's not correct. `stdout` will just be the file itself. You can see that it's not a pipe because you can call `seek`, for example. – rici Aug 03 '21 at 23:37
  • The redirects are set up before the program is executed, so pipes are only needed if you actually redirect to a pipe. `/dev/stdout` isn't a real device; it's a kind of alias. – rici Aug 04 '21 at 00:22
  • @rici interesting. When I've written a shell as a toy project, the way the shell got stdout from an executing program (assuming no redirection to a file or other process) was by calling pipe() three times, forking, and the child `dup2`'s the appropriate ends of the pipes into 0/1/2. Is that not how a proper shell does it? – Kaia Aug 04 '21 at 17:52
  • @Keon: Under normal circumstances, the shell doesn't need to "get stdout". It just runs the program and lets the output flow to whatever stdout is. If you're trying to capture the output of a program into an environment variable, then you indeed need to set up a pipe on stdout, but not on stdin or stderr. – rici Aug 04 '21 at 18:20
  • How shells handle various redirects will vary between shells, but I certainly don't know of any shell which would set up a process to forward a file in order to handle a `<` redirect. Bash doesn't even set up a process to handle here-docs and here-strings; it copies the input into a temporary file and redirects stdin to that file. Creating unneeded processes is generally a no-no for a shell. – rici Aug 04 '21 at 18:22