
The Python 2.7 documentation for `open()` states that the default value for `buffering` is: "If omitted, the system default is used." I am currently on Red Hat Linux 6, but I am not able to figure out what default buffering is set for the system.

Can anyone please guide me on how to determine the buffering for a system?

smci
name_masked



Since you linked to the 2.7 docs, I'm assuming you're using 2.7. (In Python 3.x, this all gets a lot simpler, because a lot more of the buffering is exposed at the Python level.)
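For comparison, here is a minimal Python 3 sketch (it does not apply to 2.7) of what "exposed at the Python level" means in practice: the `buffering` argument decides which `io` class `open()` hands back, and the buffered wrapper's underlying raw file is reachable as `.raw`. The helper name `wrapper_class_names` and the scratch temp file are just for illustration.

```python
import os
import tempfile


def wrapper_class_names(path):
    """Return the class names open() uses for a default and an unbuffered binary read."""
    with open(path, "rb") as f:  # default buffering: a buffer layer over a raw file
        default = (type(f).__name__, type(f.raw).__name__)
    with open(path, "rb", buffering=0) as f:  # buffering=0: no buffer layer at all
        unbuffered = type(f).__name__
    return default, unbuffered


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()  # scratch file just for the demonstration
    os.close(fd)
    try:
        print(wrapper_class_names(path))
    finally:
        os.remove(path)
```

On CPython 3 this prints `(('BufferedReader', 'FileIO'), 'FileIO')`: the buffer is a Python object you can inspect, rather than something hidden inside `fopen`.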

All `open` actually does (on POSIX systems) is call `fopen` and then, if you've passed anything for `buffering`, `setvbuf`. Since you're not passing anything, you just end up with the default buffer from `fopen`, which is up to your C standard library. (See the source for details. With no `buffering` argument, it passes -1 to `PyFile_SetBufSize`, which does nothing unless `bufsize >= 0`.)
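To make the decision above concrete, here is a rough Python model of how the `buffering` argument turns into a request to the C library, based on the semantics in the Python 2.7 `open()` docs (0 means unbuffered, 1 means line buffered, any other positive value means a buffer of approximately that size, negative or omitted means the system default). The function name and the string return values are hypothetical; the real logic is C code inside `PyFile_SetBufSize`, and this sketch deliberately leaves the line-buffered size to the C side.

```python
def setvbuf_request(buffering=-1):
    """Hypothetical model of what open() asks the C library for.

    Returns None when no setvbuf call is made (fopen's default buffer is kept),
    otherwise a (mode, size) tuple using the C macro names as strings.
    """
    if buffering < 0:
        return None                  # omitted/negative: keep fopen's default
    if buffering == 0:
        return ("_IONBF", 0)         # unbuffered
    if buffering == 1:
        return ("_IOLBF", None)      # line buffered; size chosen on the C side
    return ("_IOFBF", buffering)     # fully buffered, approximately this size
```

The important branch for the question is the first one: when you pass nothing, nothing is requested, so whatever `fopen` picked stays in effect.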

If you read the glibc setvbuf manpage, it explains that if you never call any of the buffering functions:

Normally all files are block buffered. When the first I/O operation occurs on a file, malloc(3) is called, and a buffer is obtained.

Note that it doesn't say what size buffer is obtained. This is intentional; it means the implementation can be smart and choose different buffer sizes for different cases. (There is a BUFSIZ constant, but that's only used when you call legacy functions like setbuf; it's not guaranteed to be used in any other case.)

So, what does happen? Well, if you look at the glibc source, it calls the macro `_IO_DOALLOCATE`, which can be hooked (or overridden, because glibc unifies C++ streambuf and C stdio buffering), but by default it allocates a buffer of `_IO_BUFSIZ` bytes, which is an alias for the platform-specific macro `_G_BUFSIZ`, which is 8192.

Of course you probably want to trace down the macros on your own system rather than trust the generic source.
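One way to do that tracing is a quick text scan of your installed headers. This is a minimal sketch that assumes the headers live under `/usr/include` (you may need glibc-devel or similar installed) and that the macro names of interest are spelled `BUFSIZ`, `_IO_BUFSIZ`, and `_G_BUFSIZ` as in glibc headers of that era; treat those names as assumptions and adjust them for your system. It does not run the preprocessor, so every `#define` is reported regardless of `#ifdef` branches.

```python
import re
from pathlib import Path

# Assumed location of the system headers.
DEFAULT_INCLUDE_DIR = "/usr/include"


def find_macro_defs(names, include_dir=DEFAULT_INCLUDE_DIR):
    """Return {macro_name: [(header_path, definition_text), ...]} for each name.

    Plain text scan of #define lines; conditional branches are not evaluated.
    """
    patterns = {
        name: re.compile(r"^\s*#\s*define\s+" + re.escape(name) + r"\b(.*)$")
        for name in names
    }
    found = {name: [] for name in names}
    for header in sorted(Path(include_dir).rglob("*.h")):
        try:
            lines = header.read_text(errors="replace").splitlines()
        except OSError:
            continue  # unreadable file, broken symlink, directory named *.h, etc.
        for line in lines:
            for name, pattern in patterns.items():
                match = pattern.match(line)
                if match:
                    found[name].append((str(header), match.group(1).strip()))
    return found


if __name__ == "__main__":
    for name, defs in find_macro_defs(["BUFSIZ", "_IO_BUFSIZ", "_G_BUFSIZ"]).items():
        for path, value in defs:
            print(f"{name} = {value!r}  ({path})")
```

Following the chain by hand (for example, `_IO_BUFSIZ` defined as `_G_BUFSIZ`, then `_G_BUFSIZ` defined as a number) tells you what your particular headers say, which is more trustworthy than the generic source.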


You may wonder why there is no good documented way to get this information. Presumably it's because you're not supposed to care. If you need a specific buffer size, you set one manually; if you trust that the system knows best, just trust it. Unless you're actually working on the kernel or libc, who cares? In theory, this also leaves open the possibility that the system could do something smart here, like picking a buffer size based on the block size of the file's filesystem, or even based on runtime statistics, although it doesn't look like Linux/glibc, FreeBSD, or OS X do anything other than use a constant. And most likely that's because it really doesn't matter for most applications. (You might want to test that out yourself: use explicit buffer sizes ranging from 1 KB to 2 MB on some buffered-I/O-bound script and see what the performance differences are.)
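A minimal sketch of that experiment, written for Python 3 rather than 2.7. The doubling steps from 1 KB to 2 MB, the 64-byte read size, and the 8 MB file of random bytes are arbitrary choices, not anything prescribed above; small reads are used so that the buffer, not the read size, does most of the work.

```python
import os
import tempfile
import time

# Buffer sizes from 1 KB to 2 MB, doubling each step (an arbitrary choice).
SIZES = [1024 * 2 ** i for i in range(12)]


def time_buffer_sizes(path, sizes=SIZES, chunk=64):
    """Read `path` in small `chunk`-byte reads once per buffer size.

    Returns {buffer_size: elapsed_seconds}.
    """
    results = {}
    for size in sizes:
        start = time.perf_counter()
        with open(path, "rb", buffering=size) as f:
            while f.read(chunk):
                pass
        results[size] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    # Hypothetical test input: 8 MB of random bytes in a temp file.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(os.urandom(8 * 1024 * 1024))
    try:
        for size, secs in time_buffer_sizes(path).items():
            print(f"{size:>8} bytes: {secs:.3f} s")
    finally:
        os.remove(path)
```

After the first pass the file will usually sit in the OS page cache, so run it a few times and compare the later runs; real workloads on cold disks may behave differently.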

abarnert

I'm not sure it's the right answer, but the Python 3.0 library and Python 20 library docs both describe `io.DEFAULT_BUFFER_SIZE` in the same way that the default is described in the docs for `open()`. Coincidence?

If not, then the answer for me was:

```
$ python
>>> import io
>>> io.DEFAULT_BUFFER_SIZE
8192
```

```
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.1 LTS
Release:        14.04
Codename:       trusty
```
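Note that in Python 3, `io.DEFAULT_BUFFER_SIZE` is only the fallback: the `open()` docs say binary files are buffered in fixed-size chunks whose size is chosen by a heuristic that tries the underlying device's "block size" and falls back on `io.DEFAULT_BUFFER_SIZE`. Below is a minimal sketch of that heuristic, assuming the "block size" is `st_blksize` from `os.stat()` (as in CPython's pure-Python `_pyio` implementation, which uses `os.fstat` on the opened descriptor; other versions may differ). `st_blksize` is not available on every platform, and the helper name is hypothetical.

```python
import io
import os


def guess_open_buffer_size(path):
    """Hypothetical helper: approximate the buffer size Python 3's open()
    picks for a binary file opened with default buffering."""
    try:
        blksize = os.stat(path).st_blksize  # missing on some platforms
    except (OSError, AttributeError):
        return io.DEFAULT_BUFFER_SIZE
    return blksize if blksize > 1 else io.DEFAULT_BUFFER_SIZE


if __name__ == "__main__":
    print(io.DEFAULT_BUFFER_SIZE, guess_open_buffer_size(__file__))
```

On many Linux filesystems `st_blksize` is 4096, so the two numbers printed can differ even though the docs describe both as "the default".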
Saghir A. Khatri
coreyh
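```c
#include <stdio.h>

int main(void) {
  /* BUFSIZ is the buffer-size constant defined in <stdio.h>.
     Cast to long so the format specifier matches on every platform. */
  printf("%ld\n", (long) BUFSIZ);
  return 0;
}
```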

I ran `man setvbuf` to find this; `setvbuf` is mentioned in footnote [2] of the documentation page.

seanmcl
  • No, that's not guaranteed to be the default buffer size; it's only the buffer size used for legacy functions like `setbuf`. – abarnert Aug 12 '13 at 19:05
  • If that's the case, then the argument isn't very helpful: [2] Specifying a buffer size currently has no effect on systems that don’t have setvbuf(). The interface to specify the buffer size is not done using a method that calls setvbuf(), because that may dump core when called after any I/O has been performed, and there’s no reliable way to determine whether this is the case. – seanmcl Aug 12 '13 at 19:09
  • Which argument? And what system are you on where `setvbuf(3)` has footnotes? Third, [CPython 2.7 very clearly calls `setbuf`](http://hg.python.org/cpython/file/2.7/Objects/fileobject.c#l509) if `setvbuf` is not available, so it's not true that it has no effect. (It's true that any positive value has the same effect as any other positive value on `setbuf`-only systems, but that's still definitely not _no_ effect.) And, finally, it clearly calls `setvbuf` if it _is_ available, so your argument that it can't do so is moot. – abarnert Aug 12 '13 at 19:18
  • Thanks for your much better answer. – seanmcl Aug 12 '13 at 19:20
  • I'm not sure mine is a good answer; it's just that there really isn't a good answer. There's no documented way to get this information… – abarnert Aug 12 '13 at 19:22