57

Python Documentation : https://docs.python.org/2/library/functions.html#open

open(name[, mode[, buffering]])  

The above documentation says: "The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default. If omitted, the system default is used."
When I use

filedata = open("file.txt", "r", 0)

or

filedata = open("file.txt", "r", 1)

or

filedata = open("file.txt", "r", 2)

or

filedata = open("file.txt", "r", -1)

or

filedata = open("file.txt", "r")

The output does not change. Each of the calls shown above prints the file at the same speed.
output:

Mr. Bean is a British television programme series of fifteen 25-

minute episodes written by Robin Driscoll and starring Rowan Atkinson as

the title character. Different episodes were also written by Robin

Driscoll and Richard Curtis, and one by Ben Elton. Thirteen of the

episodes were broadcast on ITV, from the pilot on 1 January 1990, until

"Goodnight Mr. Bean" on 31 October 1995. A clip show, "The Best Bits of

Mr. Bean", was broadcast on 15 December 1995, and one episode, "Hair by

Mr. Bean of London", was not broadcast until 2006 on Nickelodeon.

So how is the buffering parameter of the open() function useful? What value of the buffering parameter is best to use?

Srivishnu
  • 1
    I may be wrong, but I believe buffering only has a visible effect when opening a file for writing, where it buffers the input until a newline is reached or the file is closed. This can be somewhat faster. – kirbyfan64sos Apr 18 '15 at 03:12
  • 3
    You are right to ask this question, which I upvoted. Manuals and tutorials are written for the people who write them themselves! They say "If the buffering value is set to 0, no buffering takes place." Well what buffering mister? I know more than 15 programming languages and I have never heard such a kind of buffering! – Apostolos Jan 13 '18 at 13:10
  • When setting the buffer to 1, then only a single line of buffered data will be displayed and if negative, then the buffer size will be system default. –  Curtis Nov 23 '16 at 21:02
  • there is a note too --> Specifying a buffer size currently has no effect on systems that don’t have setvbuf(). The interface to specify the buffer size is not done using a method that calls setvbuf(), because that may dump core when called after any I/O has been performed, and there’s no reliable way to determine whether this is the case. Any idea what setvbuf() is – pippo1980 Feb 04 '22 at 17:34

5 Answers

52

Enabling buffering means that you're not interfacing directly with the OS's representation of a file or its file system API. Instead, a chunk of data is read from the raw OS file stream into a buffer and served from there until it is consumed, at which point more data is fetched into the buffer. In terms of the objects you get (in Python 3's io module, for a file opened in binary mode), you'll get a BufferedIOBase object wrapping an underlying RawIOBase (which represents the raw file stream).

What is the benefit of this? Well, interfacing with the raw stream might have high latency, because the operating system has to deal with physical devices like the hard disk, and this may not be acceptable in all cases. Let's say you want to read three letters from a file every 5 ms and your file is on a crusty old hard disk, or even a network file system. Instead of trying to read from the raw file stream every 5 ms, it is better to load a bunch of bytes from the file into a buffer in memory, then consume it at will.

What size of buffer you choose will depend on how you're consuming the data. For the example above, a buffer size of 1 char would be awful, 3 chars would be alright, and any large multiple of 3 chars that doesn't cause a noticeable delay for your users would be ideal.
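The effect described above can be made visible by counting how often the raw stream is actually touched. The following is only a sketch under stated assumptions: it targets Python 3's io module, and CountingSource and drain are hypothetical helpers invented for this illustration (an in-memory stand-in for a slow disk, not part of any library).

```python
import io


class CountingSource(io.RawIOBase):
    """Hypothetical in-memory raw stream that counts how often it is read."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0
        self.reads = 0  # number of times the "device" was accessed

    def readable(self):
        return True

    def readinto(self, b):
        # Every call here stands in for one trip to the slow device.
        self.reads += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def drain(reader, step=3):
    """Read `step` bytes at a time until EOF and return the byte count."""
    total = 0
    while True:
        piece = reader.read(step)
        if not piece:
            return total
        total += len(piece)


data = b"abc" * 1000

# Unbuffered: every 3-byte read goes straight to the raw stream.
raw = CountingSource(data)
raw_total = drain(raw)

# Buffered: the same 3-byte reads are served from a 300-byte buffer.
source = CountingSource(data)
buffered_total = drain(io.BufferedReader(source, buffer_size=300))

print("raw stream accesses, unbuffered:", raw.reads)
print("raw stream accesses, buffered:  ", source.reads)
```

Both runs return the same bytes; only the number of accesses to the underlying stream differs, which is where the latency savings come from.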

Asad Saeeduddin
22

You can also check the default buffer size by reading the read-only DEFAULT_BUFFER_SIZE attribute of the io module.

import io
# Default buffer size, in bytes, used by the io module's buffered classes
print(io.DEFAULT_BUFFER_SIZE)

As described here

Tadhg McDonald-Jensen
N Randhawa
5

What is perhaps most important from a practical point of view is that the buffering parameter determines when the data you send to the stream is actually saved to disk.

When you open a file without the buffering parameter, and write some stuff to it, you will see the data is written only after the with open(...) as foo: block is exited (or when the file's close() method is called), or when some system-determined default buffer size is reached. But if you set the buffering parameter, it will write the data as soon as that size of the buffer is reached.

Thus using e.g. open('file.txt', 'w', buffering=1) is useful when you have a long-running application that sends data to a file and you want it saved after each line, not only when the application quits. Otherwise a crash, a power outage, etc. could cause the data to be lost.
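A minimal sketch of that difference, assuming Python 3 and a text-mode file: it writes one line and checks the file's size on disk while the file is still open. The helper name bytes_on_disk_before_close and the temporary file are made up for this illustration.

```python
import os
import tempfile


def bytes_on_disk_before_close(buffering):
    """Write one line, then report the file size while the file is still open."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline="\n" keeps the byte count the same on every platform.
        with open(path, "w", buffering=buffering, newline="\n") as f:
            f.write("first log line\n")
            # Still inside the with block: only flushed data is visible here.
            return os.path.getsize(path)
    finally:
        os.remove(path)


line_buffered = bytes_on_disk_before_close(1)    # flush after each newline
default_buffered = bytes_on_disk_before_close(-1)  # system default buffer

print("line buffered, bytes visible before close:", line_buffered)
print("default buffering, bytes visible before close:", default_buffered)
```

With the default buffer, a short line typically stays in memory until the buffer fills or the file is flushed or closed, which is exactly the window in which a crash would lose it.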

See also: How often does python flush to a file?

Simimic
1

Buffering is the process of holding a chunk of a file's data in temporary memory instead of transferring every piece to or from the disk immediately. In Python, several values can be given for it. If buffering is set to 0, buffering is off. If it is set to 1, the file is line buffered.
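To see what the different values do in practice, one can inspect the objects that open() returns. This sketch assumes Python 3, whose behaviour differs from the Python 2 documentation quoted in the question (for instance, Python 3 rejects buffering=0 in text mode); the temporary file exists only for the illustration.

```python
import io
import os
import tempfile

# Create a small scratch file to open in different ways.
fd, path = tempfile.mkstemp()
os.write(fd, b"abc\n")
os.close(fd)

# buffering=0 (binary mode only): no buffer, you get the raw file object.
with open(path, "rb", buffering=0) as raw:
    raw_type = type(raw)

# buffering > 1: a buffered object with (approximately) that buffer size.
with open(path, "rb", buffering=4096) as buffered:
    buffered_type = type(buffered)

# buffering=1 (text mode): line buffering is switched on.
with open(path, "a", buffering=1) as text:
    line_buffered = text.line_buffering

# buffering=0 in text mode is rejected by Python 3.
try:
    open(path, "r", buffering=0)
    text_unbuffered_allowed = True
except ValueError:
    text_unbuffered_allowed = False

os.remove(path)

print(raw_type, buffered_type, line_buffered, text_unbuffered_allowed)
```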

joel.t.mathew
-1

With buffering set to -1 my file write took 13 minutes. With buffering set to 2**10 my file write took 7 seconds. So, the purpose of buffering is to speed up your program.

  • 2
    It's a buffer, meaning it's a silo until it's full and needs to be drained. Your program took less time because it was making fewer system calls. –  Feb 10 '21 at 12:56
  • Actually, sadly, in production the buffering didn't help. Still took 13 minutes. I have some weird problem between Psycopg2, OSX and Python3.8 File IO. But in testing the buffering helped a lot! – John Abraham Feb 16 '21 at 17:16
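The point about fewer system calls can be illustrated without timing anything, by counting how many writes reach the underlying stream for different buffer sizes. This is a sketch assuming Python 3; CountingSink and count_raw_writes are hypothetical names for this example, and the in-memory sink only stands in for a real file, so it says nothing about the timings reported above.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical raw stream that discards data and counts write calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0   # how many times the "OS" was asked to write
        self.total = 0   # how many bytes arrived in total

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.total += len(b)
        return len(b)


def count_raw_writes(buffer_size, chunks=1000, chunk=b"x" * 10):
    """Push many small writes through a buffer and count the raw writes."""
    sink = CountingSink()
    writer = io.BufferedWriter(sink, buffer_size=buffer_size)
    for _ in range(chunks):
        writer.write(chunk)
    writer.flush()
    return sink.calls, sink.total


small_calls, small_total = count_raw_writes(buffer_size=16)
large_calls, large_total = count_raw_writes(buffer_size=2**10)

print("raw writes with a 16-byte buffer:  ", small_calls)
print("raw writes with a 1024-byte buffer:", large_calls)
```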