
I watched the talk: "PostgreSQL vs. fsync. How is it possible that PostgreSQL used fsync incorrectly for 20 years, and what we'll do about it." via https://fosdem.org/2019/schedule/event/postgresql_fsync/ and also read https://lwn.net/Articles/752063/ as background.

The short, simplified summary: on Linux, if you call fsync() and it fails, you cannot call fsync() again to fix it. The second call will succeed even though the data never reached the disk, leaving you with corrupted data, because the failed buffer cache pages are marked as clean after the first failed call. There is a lot of detail as to why this happens. One reason is the case where a USB drive is removed: you don't want the kernel to keep retrying and holding on to dirty buffer cache pages that can never be written.
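To make the failure mode concrete, here is a minimal Python sketch of the "never retry a failed fsync" rule. The names `write_durably` and `DurabilityError`, and the injectable `fsync` parameter, are hypothetical and exist only for illustration; `os.fsync` is the real call. It assumes the caller treats a failure as fatal for that data and recovers some other way.

```python
import os
import tempfile


class DurabilityError(Exception):
    """Raised when data may not have reached stable storage."""


def write_durably(path, data, fsync=os.fsync):
    """Write data to path and flush it once; never retry a failed fsync.

    The `fsync` parameter is a hypothetical hook so the failure path can
    be exercised without a faulty disk.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        try:
            fsync(fd)
        except OSError as exc:
            # Do NOT call fsync again: on Linux the dirty pages may already
            # be marked clean, so a second call could falsely report success.
            raise DurabilityError(f"fsync failed for {path}: {exc}") from exc
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_durably(os.path.join(tmp, "commit.log"), b"record 1\n")
```

The point of the sketch is only the control flow: a single flush attempt, and an error that propagates instead of a retry loop.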

How does FlushFileBuffers() behave in this situation? I am particularly interested in files being accessed over CIFS where failures are more likely.

Also, given the OS can attempt to write dirty buffer cache pages to stable storage at any time in the background, how can user-land programs pick up these failures via the Win32 API?

Cody Gray - on strike
David Sitsky
  • Windows apps do not call FlushFileBuffers(). If the flush is critical, you can use unbuffered I/O. See https://learn.microsoft.com/el-gr/windows/desktop/api/fileapi/nf-fileapi-createfilea and https://learn.microsoft.com/el-gr/windows/desktop/FileIO/file-buffering. The problem arises on Linux because anyone can call fsync(), whereas on Windows you need admin rights to flush the entire disk. – Michael Chourdakis Feb 13 '19 at 04:44
  • Excellent question! The answer will probably depend on the file system driver, as `FlushFileBuffers` is just a user-mode wrapper around `NtFlushBuffersFile`, and it looks to me like that function just assembles a flush IRP (`IRP_MJ_FLUSH_BUFFERS`) and sends it via `IoCallDriver`. Of course, one should eye calls to `FlushFileBuffers` with a great deal of suspicion. There are better ways of implementing transactional I/O in Windows, like creating the file with the `FILE_FLAG_NO_BUFFERING` flag. – Cody Gray - on strike Feb 13 '19 at 04:47
  • I agree with the comment about using direct I/O for more control. PostgreSQL seems to be moving in that direction longer term. In my situation I am dealing with a Java-based database, Apache Derby, which in effect calls FlushFileBuffers on key files controlled by the database when transactions commit. It does not use direct I/O. I have seen unexplained corruption at times, particularly with CIFS filesystems, and was curious whether it was somehow related. – David Sitsky Feb 14 '19 at 00:22

0 Answers