We'd like to measure the I/O time of an application by instrumenting the read() and write() routines on a Linux system. However, the calls to write() return very quickly. According to the man page for write on my system (man 2 write):

NOTES A successful return from write() does not make any guarantee that data has been committed to disk. In fact, on some buggy implementations, it does not even guarantee that space has successfully been reserved for the data. The only way to be sure is to call fsync(2) after you are done writing all your data.

Linux manual as of 2013-01-27

So we understand that write() only hands the data to the kernel, which flushes it to disk asynchronously at some later point.

So the question is: is there a way to know when the data (even if it has been grouped for caching purposes) is actually being written to disk? Preferably, when that process starts and ends?

EDIT1 We're particularly interested in measuring the application's behavior, and we'd like to avoid changing the semantics of the application by changing the parameters to open() (adding O_SYNC) or injecting calls to sync(). If you change the application's semantics, you can no longer tell how the original application behaves.

Harald
  • "Committing to disk" has always been an elusive process. Originally, one could not 'know' when it happened because a robot had to fetch the right storage tape, insert it into the writer, have it written, then remove the tape again. Nowadays, there are *multiple* layers of buffering and caching involved, possibly starting as early as inside your own program, and extending all the way to temporary RAM aboard a hard disk controller. – Jongware Mar 03 '16 at 09:42
  • I have extended the question because we're not interested in changing the application semantics. – Harald Mar 03 '16 at 10:07
  • @BasileStarynkevitch, I think that the first sentence answers your question. We want to measure the application I/O time. – Harald Mar 03 '16 at 10:09

3 Answers

You could open the file with O_SYNC, which in theory means that write() won't return until the data is written to disk. Exactly what is written, data or metadata, depends on the file system and how it is mounted. This does change how your application really works, though.
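To get a feel for what O_SYNC costs, here is a minimal sketch that times a single write() with and without the flag. It is an experiment on a scratch file, not a way to instrument the original application; the helper name `timed_write` and the 1 MiB payload are assumptions made for illustration.

```python
import os
import tempfile
import time


def timed_write(path, payload, extra_flags=0):
    """Write payload to path once; return (bytes_written, seconds spent in os.write)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        written = os.write(fd, payload)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return written, elapsed


if __name__ == "__main__":
    payload = b"x" * (1 << 20)  # 1 MiB, an arbitrary size for the experiment
    with tempfile.TemporaryDirectory() as tmp:
        _, buffered = timed_write(os.path.join(tmp, "buffered.bin"), payload)
        _, synced = timed_write(os.path.join(tmp, "synced.bin"), payload, os.O_SYNC)
    print(f"buffered write: {buffered * 1e3:.3f} ms")
    print(f"O_SYNC write:   {synced * 1e3:.3f} ms")
```

The gap between the two numbers depends heavily on the file system and the device; on a RAM-backed temporary directory it may be negligible.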

If you're really interested in handling actual I/O to storage yourself (are you a database?) then O_DIRECT gives you control. Again, this is a change in behaviour and imposes additional constraints on your application, such as alignment of buffers and offsets. It may or may not be what you need.

You really appear to be asking about benchmarking real performance, so the real question is what you want to know. Since a real system does so much caching, the "instant" return from write() is "real" in the sense that it is the delay your application actually experiences. If you're looking for I/O throughput, you might be better off looking at higher-level system statistics.
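One way to watch such statistics from outside the application, without touching its semantics, is to sample the Dirty and Writeback counters in /proc/meminfo while it runs. The sketch below assumes a Linux system with /proc mounted; the function names are made up for illustration, and the counters are system-wide, so they include writeback caused by every other process as well.

```python
import os
import time


def parse_meminfo(text):
    """Return the Dirty and Writeback values (in kB) from /proc/meminfo-style text."""
    wanted = {"Dirty", "Writeback"}
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            values[name] = int(rest.split()[0])
    return values


def sample_writeback(interval=0.2, samples=3, path="/proc/meminfo"):
    """Yield (monotonic timestamp, counters) pairs, one per sampling interval."""
    for _ in range(samples):
        with open(path) as f:
            text = f.read()
        yield time.monotonic(), parse_meminfo(text)
        time.sleep(interval)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    for stamp, counters in sample_writeback():
        print(f"{stamp:.2f}s Dirty={counters.get('Dirty')} kB "
              f"Writeback={counters.get('Writeback')} kB")
```

A rising Dirty value followed by a non-zero Writeback value suggests that cached data is being flushed, but it does not tell you which application's data it is.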

Joe
  • I'm interested in benchmarking applications, and I would not like to change the application semantics at all. Otherwise, the measured and the real execs would differ. – Harald Mar 03 '16 at 10:00
  • In that case your application is taking exactly the amount of time in the system call as you are measuring. Caching is making the call short, as it should. – Joe Mar 03 '16 at 10:02

You basically can't know when the data is really written to disk, and the actual disk write may happen well after your process has terminated (often tens of seconds later with default kernel writeback settings, possibly longer). Also, the disk itself has some cache inside its controller. Be happy with that: it means the page cache of your system is very effective, which is what makes your Linux system respond quickly.

You might consider calling the sync(2) system call, but you often should not: it can be slow, and POSIX only requires it to schedule the writes, not to wait for them to complete (Linux does wait, but portable code cannot rely on that).

On a given open file descriptor, you could consider fsync(2). As Joe answered, you might pass O_SYNC to open(2), but that would slow down the system.
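As a rough illustration of how much work write() defers, the following sketch times the write() call and a subsequent fsync() separately on a scratch file. The function name and the payload size are assumptions chosen for the example, and the explicit fsync() is there only for the experiment, since it is exactly the kind of call that changes an application's behaviour.

```python
import os
import tempfile
import time


def time_write_and_fsync(payload):
    """Return (seconds in write, seconds in fsync) for one write to a temp file."""
    fd, path = tempfile.mkstemp()
    try:
        t0 = time.perf_counter()
        os.write(fd, payload)   # normally returns once the data is in the page cache
        t1 = time.perf_counter()
        os.fsync(fd)            # blocks until the kernel reports the data as flushed
        t2 = time.perf_counter()
    finally:
        os.close(fd)
        os.unlink(path)
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    write_s, fsync_s = time_write_and_fsync(b"x" * (4 << 20))
    print(f"write: {write_s * 1e3:.3f} ms, fsync: {fsync_s * 1e3:.3f} ms")
```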

I strongly suggest (for performance reasons) that you trust your kernel's page cache management and avoid forcing any disk flush manually. See also the related posix_fadvise(2) and madvise(2) system calls.
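For reference, posix_fadvise(2) only passes a hint to the kernel and does not force any flush. A minimal sketch of its use through Python's `os.posix_fadvise` wrapper, assuming a platform that provides it (the helper name `read_sequentially` is hypothetical):

```python
import os
import tempfile


def read_sequentially(path, chunk_size=1 << 16):
    """Read the whole file after hinting sequential access; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        if hasattr(os, "posix_fadvise"):
            # Offset 0 and length 0 mean "the whole file"; this is advice only.
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 200_000)
    try:
        print(read_sequentially(tmp.name), "bytes read")
    finally:
        os.unlink(tmp.name)
```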

If you benchmark some program, run it several times and decide which figure matters most to you: an average of the measured times (perhaps excluding the best and/or worst of them), the worst, or the best. The point is that the I/O time (or the CPU time, or the elapsed real time) of an application is a very ambiguous quantity, so you probably want to explain your benchmarking process when publishing benchmark results.
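A small sketch of that procedure, assuming the program can be launched as a subprocess: run it several times, record the elapsed real time, and report the best, the worst, the mean, and a mean that excludes the extremes. The names `time_runs` and `summarize` are invented for this example.

```python
import statistics
import subprocess
import sys
import time


def summarize(times, trim=1):
    """Summarize elapsed times; trimmed_mean drops the `trim` best and worst runs."""
    ordered = sorted(times)
    if len(ordered) <= 2 * trim:
        raise ValueError("need more than 2 * trim measurements")
    kept = ordered[trim:len(ordered) - trim]
    return {
        "best": ordered[0],
        "worst": ordered[-1],
        "mean": statistics.mean(ordered),
        "trimmed_mean": statistics.mean(kept),
    }


def time_runs(argv, runs=5):
    """Run the command `runs` times and return the elapsed real time of each run."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # Placeholder workload: a Python interpreter that exits immediately.
    results = time_runs([sys.executable, "-c", "pass"], runs=5)
    print(summarize(results))
```

Which of these figures to publish is the judgment call described above; the code only makes the choice explicit.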

Basile Starynkevitch

You can refer to this link, which might help you: Flush Data to disk

When data is actually written to disk is unpredictable, and there is no definitive way of telling. But you can make sure that the data is written to disk by calling sync.

Rohit Magdum
  • Thank you for the link, but I'm looking for a way to measure when that writing happens. I'm not interested in changing the semantics of the application by adding "superfluous calls", otherwise the measured and the real execs would differ. – Harald Mar 03 '16 at 10:02
  • Well, like I mentioned, there is no definitive way to make sure that data is written immediately to the disk. The following link reinforces the same point: http://stackoverflow.com/questions/20215516/disabling-disk-cache-in-linux So either you will have to use sync and disturb your environment slightly, or work with whatever data you are getting. – Rohit Magdum Mar 03 '16 at 11:08