0

I am trying to measure the execution time of FIO benchmark. I am, currently, doing so wrapping the FIO call between gettimeofday():

gettimeofday(&startFioFix, NULL);
FILE* process = popen("fio --name=randwrite --ioengine=posixaio rw=randwrite --size=100M --direct=1 --thread=1 --bs=4K", "r");
gettimeofday(&doneFioFix, NULL);

and calculate the elapsed time as:

double tstart = startFioFix.tv_sec + startFioFix.tv_usec / 1000000.;
double tend = doneFioFix.tv_sec + doneFioFix.tv_usec / 1000000.;
double telapsed = (tend - tstart);

Now, the question(s) is

  1. telapsed time is different (larger) than the runt by FIO output. Can you please help me in understanding Why? as the fact can be seen in FIO output:

    randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=posixaio, iodepth=1
    fio-2.2.8
    Starting 1 thread
    
    randwrite: (groupid=0, jobs=1): err= 0: pid=3862: Tue Nov  1 18:07:50 2016
    write: io=102400KB, bw=91674KB/s, iops=22918, runt=  1117msec
    ...
    

    and the telapsed is:

    telapsed: 1.76088 seconds
    
  2. what is the actual time taken by FIO execution: a) runt given by FIO, or b) the elapsed time by getttimeofday()

  3. How does FIO measure its runt? (probably, this question linked to 1.)

PS: I have tried to replace the gettimeofday(with std::chrono::high_resolution_clock::now()), but it also behaves the same (by same, I mean it also gives larger elapsed time than runt)

Thank you in advance, for your time and assistance.

Avni
  • 3
  • 4

1 Answers1

0

A quick point:gettimeofday() on Linux uses a clock that doesn't necessarily tick at a constant interval and can even move backwards (see http://man7.org/linux/man-pages/man2/gettimeofday.2.html and https://stackoverflow.com/a/3527632/4513656 ) - this may make telapsed unreliable (or even negative).

  1. Your gettimeofday/popen/gettimeofday measurement (telapsed) is going to be: the fio process start up (i.e. fork+exec on Linux) elapsed + fio initialisation (e.g. thread creation because I see --thread, ioengine initialisation) + fio job elapsed (runt) + fio stopping elapsed + process stop elapsed). You are comparing this to just runt which is a sub component of telapsed. It is unlikely all the non-runt components are going to happen instantly (i.e. take up 0 usecs) so the expectation is that runt will be smaller than telapsed. Try running fio with --debug=all just to see all the things it does in addition to actually submitting I/O for the job.
  2. This is difficult to answer because it depends on what you want you mean when you say "fio execution" and why (i.e. the question is hard to interpret in an unambiguous way). Are you interested in how long fio actually spent trying to submit I/O for a given job (runt)? Are you interested in how long it takes your system to start/stop a new process that just so happens to try and submit I/O for a given period (telapsed)? Are you interested in how much CPU time was spent submitting I/O (none of the above)? So because I'm confused I'll ask you some questions instead: what are you going to use the result for and why?
  3. Why not look at the source code? https://github.com/axboe/fio/blob/7a3b2fc3434985fa519db55e8f81734c24af274d/stat.c#L405 shows runt comes from ts->runtime[ddir]. You can see it is initialised by a call to set_epoch_time() (https://github.com/axboe/fio/blob/6be06c46544c19e513ff80e7b841b1de688ffc66/backend.c#L1664 ), is updated by update_runtime() ( https://github.com/axboe/fio/blob/6be06c46544c19e513ff80e7b841b1de688ffc66/backend.c#L371 ) which is called from thread_main().
Community
  • 1
  • 1
Anon
  • 6,306
  • 2
  • 38
  • 56