
I found this in the Python documentation for File Objects:

flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.

So my question is: what exactly is Python's flush doing? I thought it forced the data to be written to disk, but now I see that it doesn't. Why?

martineau
geek

4 Answers


There are typically two levels of buffering involved:

  1. Internal buffers
  2. Operating system buffers

The internal buffers are created by the runtime, library, or language you're programming against and are meant to speed things up by avoiding a system call for every write. Instead, when you write to a file object, you write into its buffer, and whenever the buffer fills up, the data is written to the actual file using system calls.
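The automatic write-out when the buffer fills up can be observed with a minimal sketch. It assumes a temporary file and a deliberately tiny 16-byte buffer (requested through the `buffering` argument of `open()`; the value is chosen only for illustration). `os.path.getsize()` reports the size the operating system currently sees, so it stays at 0 while the bytes are still only in Python's buffer. How many bytes get written on overflow is a CPython implementation detail, so the sketch only checks that some were.

```python
import os
import tempfile

def demo_buffer_fill(buffer_size=16):
    """Return the OS-visible file size after a write that fits in the
    internal buffer, and after a write that overflows it."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.bin")
        # Ask for a deliberately tiny internal buffer (illustrative value).
        with open(path, "wb", buffering=buffer_size) as f:
            f.write(b"0123456789")   # 10 bytes: fits in the buffer, no system call yet
            size_small = os.path.getsize(path)
            f.write(b"abcdefghij")   # 10 more: overflows, so buffered data is written out
            size_overflow = os.path.getsize(path)
        return size_small, size_overflow

small, overflow = demo_buffer_fill()
print(small, overflow)
```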

However, due to the operating system buffers, this might not mean that the data is written to disk. It may just mean that the data is copied from the buffers maintained by your runtime into the buffers maintained by the operating system.

If you write something and it ends up only in a buffer, and the power to your machine is cut, that data is not on disk when the machine turns off.

So, to help with that, you have the flush() method on the file object and the os.fsync() function.

The first, flush, will simply write out any data that lingers in a program buffer to the actual file. Typically this means that the data will be copied from the program buffer to the operating system buffer.

Specifically what this means is that if another process has that same file open for reading, it will be able to access the data you just flushed to the file. However, it does not necessarily mean it has been "permanently" stored on disk.
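A minimal sketch of that visibility effect, assuming a temporary file and using a second file object in the same script as a stand-in for the other process: the reader sees nothing until the writer calls flush().

```python
import os
import tempfile

def demo_flush_visibility():
    """Show that an independent reader sees the data only after flush()."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "shared.txt")
        with open(path, "w", encoding="utf-8") as writer:
            writer.write("hello")
            # The text is still sitting in Python's internal buffer.
            with open(path, "r", encoding="utf-8") as reader:
                before = reader.read()
            writer.flush()  # hand the data over to the operating system
            with open(path, "r", encoding="utf-8") as reader:
                after = reader.read()
        return before, after

before, after = demo_flush_visibility()
print(repr(before), repr(after))
```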

To do that, you need to call os.fsync(), which asks the operating system to synchronize its buffers for that file with the storage device. In other words, it copies the data from the operating system's buffers to the disk.
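Put together, the two calls look roughly like this. `write_durably` is a hypothetical helper name used only for this sketch, and, as the addendum and comments note, os.fsync() may still not reach the physical medium if the drive has its own write cache.

```python
import os
import tempfile

def write_durably(path, data):
    """Hypothetical helper: write `data` (bytes) and ask the OS to push it to disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's internal buffer -> OS buffer
        os.fsync(f.fileno())  # OS buffer -> storage device (as far as the OS can tell)

# Usage, writing into a throwaway temporary directory:
with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "important.dat")
    write_durably(target, b"important data\n")
    with open(target, "rb") as f:
        readback = f.read()
```

The order matters: fsync() only sees what has already left Python's buffer, so flush() must come first.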

Typically you don't need to bother with either method, but if you're in a scenario where paranoia about what actually ends up on disk is a good thing, you should make both calls as instructed.


Addendum in 2018.

Note that disks with their own caches are much more common now than back in 2013, so there are even more levels of caching and buffering involved. I assume these buffers are handled by the sync/flush calls as well, but I don't really know.

Lasse V. Karlsen
  • When I use the `with file('blah') as fd: #dostuff` construct, I know it guarantees closing the file descriptor. Does it also flush or sync? – Marcin Dec 13 '13 at 14:00
  • @Marcin: It flushes, but does NOT sync. – Alex I Jun 21 '14 at 22:05
  • `fsync` is necessary for atomicity. You can't expect to close a file, reopen it and find your content without an `fsync` in the middle. It often works, but it doesn't on Linux with ext4 and default mount options, for example. Also, `fsync` is not guaranteed to really magnet-flip the iron on the platters, because 1: fsync can be disabled (by laptop-mode), and 2: the hard disk's internal buffering might not be instructed to flush. – v.oddou Sep 09 '14 at 06:48
  • Is there any way to flush the operating system's buffers for all files, including files written by another process? – Nacht Sep 10 '14 at 23:24
  • Any idea why Python doesn't also fsync on file closing? It seems logical to me that when you close a file you want it to be on disk the way you had it when you closed it. Is fsync expensive, or is there another reason to only flush? – Hakaishin May 29 '18 at 09:55
  • fsync is *relatively* expensive. In general, you're not writing mission-critical software that needs 100% ACID compliance and durability for disk access, and if you are, you're probably painfully aware of it and should be aware of the steps you can take to get these guarantees. Calling fsync will wait for physical disk access to occur to write the data to disk, whereas flushing and closing will only wait for data to be moved to cache memory. The speed difference is probably several orders of magnitude. – Lasse V. Karlsen May 29 '18 at 12:49
  • @Nacht `os.sync` seems to be only available on Unix. – tm1 Dec 11 '18 at 07:42
  • And Windows, I have no idea about other platforms. – Lasse V. Karlsen Dec 11 '18 at 07:52
  • I can't find anything about `file.flush()` in the modern Python 3 documentation anymore, so I think this advice to call `file.flush()` followed by `os.fsync()` is no longer applicable, and is outdated. I added those two calls into a datalogging script I had, every 10 `f.write()` iterations, and it went from logging all of my data, even if I Ctrl + C killed it while running, to logging only every 10th `f.write()`, even though I was calling `f.write()` *every* iteration and `f.flush` + `os.fsync()` every 10th iteration. So, I'm not using those anymore. `f.close()` is still wise though when done. – Gabriel Staples Aug 04 '23 at 18:59
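The atomicity point raised in the comments is often handled with a write-to-a-temporary-file-then-rename pattern. This sketch is one common way to do it, not something taken from the answer: `atomic_write` is a hypothetical helper, it relies on os.replace() being an atomic rename when the temporary file and the target are on the same file system, and the extra directory fsync is only attempted on POSIX systems.

```python
import os
import tempfile

def atomic_write(path, data):
    """Hypothetical helper: replace `path` so readers see either the old
    or the new contents, never a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory (same file system)
    # so that os.replace() can be an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python buffer -> OS buffer
            os.fsync(tmp.fileno())  # OS buffer -> storage device
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    if os.name == "posix":
        # On POSIX systems, fsync the directory so the rename itself is durable.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)

# Usage in a throwaway temporary directory:
with tempfile.TemporaryDirectory() as tmpdir:
    settings = os.path.join(tmpdir, "settings.txt")
    atomic_write(settings, b"version=1\n")
    atomic_write(settings, b"version=2\n")
    with open(settings, "rb") as f:
        result = f.read()
    leftovers = sorted(os.listdir(tmpdir))
```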

Because the operating system may not write the data to disk right away. The flush operation forces the file data into the operating system's file cache in RAM, and from there it's the OS's job to actually send it to the disk.

Ignacio Vazquez-Abrams
  • You're right, but `actually` is relative here: if the target device has write caching enabled, data might not have reached the actual platters/chips when `os.fsync()` returns. – Frédéric Hamidi Aug 19 '11 at 20:42

It flushes the internal buffer, which is supposed to cause the OS to write out the buffer to the file.[1] Python uses the OS's default buffering unless you configure it to do otherwise.
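In Python 3, the buffering is configured through the `buffering` argument of `open()`. A minimal sketch, assuming temporary files: `buffering=0` (binary mode only) disables the internal buffer, and `buffering=1` (text mode only) flushes at every newline. Neither setting calls os.fsync().

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    unbuffered = os.path.join(tmpdir, "unbuffered.bin")
    # buffering=0: no internal buffer, each write() goes straight to the OS.
    with open(unbuffered, "wb", buffering=0) as f:
        f.write(b"abc")
        unbuffered_size = os.path.getsize(unbuffered)

    line_buffered = os.path.join(tmpdir, "line.txt")
    # buffering=1: line buffering, the buffer is flushed whenever a newline is written.
    with open(line_buffered, "w", buffering=1, encoding="utf-8") as f:
        f.write("first line\n")
        after_newline = os.path.getsize(line_buffered)
        f.write("no newline yet")  # stays in Python's buffer for now
        after_partial = os.path.getsize(line_buffered)

print(unbuffered_size, after_newline, after_partial)
```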

But sometimes the OS still chooses not to cooperate. Especially with wonderful things like write-delays in Windows/NTFS. Basically the internal buffer is flushed, but the OS buffer is still holding on to it. So you have to tell the OS to write it to disk with os.fsync() in those cases.

[1] http://docs.python.org/library/stdtypes.html

Dan

Basically, flush() empties your program's in-memory buffer into the file. Its real advantage is that the file stays open, so you can continue writing to it afterwards, but it shouldn't be thought of as the best or safest way to write to a file. It just makes room in the buffer for more data, that is all. If you want to ensure data gets written to the file safely, use close() instead.
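As one of the comments on the first answer notes, close() (including the implicit close at the end of a `with` block) flushes but does not sync. A minimal sketch, assuming a temporary log file, showing both behaviors and the extra os.fsync() call you would add if the data must survive a crash:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "log.txt")

    f = open(path, "w", encoding="utf-8")
    f.write("entry 1\n")
    size_while_open = os.path.getsize(path)  # data still in Python's buffer
    f.close()                                # close() flushes the buffer to the OS...
    size_after_close = os.path.getsize(path)

    # ...but it does not call os.fsync(). For durability, sync before closing:
    with open(path, "a", encoding="utf-8") as f:
        f.write("entry 2\n")
        f.flush()
        os.fsync(f.fileno())

    with open(path, encoding="utf-8") as f:
        contents = f.read()

print(size_while_open, size_after_close, contents.splitlines())
```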

zA.