
I am trying to measure the time taken by a set of statements. The following is pseudocode. The code is implemented in C++ on a Xilinx chipset with a custom RTOS, so the traditional C++ clock functions do not work here.

I do not need help with the actual time measurement, but rather with the math for calculating the actual execution time.

one = clock.getTime();
/* statements 1 to 10 */
two = clock.getTime();
fTime = two - one;

Now I know the time taken by the statements. This time also includes the time taken by getTime(), right?

one = clock.getTime();
clock.getTime();
two = clock.getTime();
cTime = two - one;     // Just measured; the minimum value I get is 300 microseconds.

Now this block gives me the time taken by getTime().

Finally, my question is:

What is the actual time taken by the statements?

  1. fTime - cTime
  2. fTime - (2* cTime)
  3. Other equation ?
ctrl-shift-esc
  • I would say it is closer to your equation 2, though `getTime` is a very fast operation; unless you are looking at very high-performance code, I doubt you will see a big difference between `fTime` and `fTime - (2*cTime)` – Cory Kramer May 28 '14 at 12:05
  • I don't know if this makes sense, but if you need very precise results you could configure a GPIO port as an output, set it low, execute your statements, set it high, and finally measure with an oscilloscope – Emilien May 28 '14 at 12:16
  • It is *very* important to run the 2nd snippet *many* times and only use the smallest value you get. There is no point in the extra getTime call. You can never be completely sure that you got an accurate value; do watch out for getting negative results from the 1st snippet. – Hans Passant May 28 '14 at 12:18
  • @Emilien I do not have an oscilloscope to measure. – ctrl-shift-esc May 28 '14 at 12:27
  • @HansPassant The second snippet gave me a min value of 300 microseconds. Why would the first snippet give me negative values? – ctrl-shift-esc May 28 '14 at 12:29
  • Because the overhead is much larger than the time a statement takes. So being off a bit with the calibration can easily produce a negative value. – Hans Passant May 28 '14 at 12:31
  • @HansPassant The statements actually copy values from specific addresses into a buffer. So far, in the results, I have not encountered negative values. – ctrl-shift-esc May 28 '14 at 12:33

2 Answers


Your time measurement itself shifts the measured timestamps.

If the platform is stable enough, the shift is the same for every measurement, so

one = t1 + dt
two = t2 + dt

After subtraction the shift cancels out:

two-one = (t2+dt)-(t1+dt) = t2-t1

so there is no need to correct for the measurement shift in this case.

Problems start on superscalar/vector architectures where code execution time is variable

  • due to different cache misses
  • different prefetch/invalidation behavior

and so on; then you have to play with cache invalidation. Also, if your getTime() waits for an interrupt or a HW event, that can add a few more error terms.

In that case, measure many times and take the average or the smallest valid result.
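For instance, a minimal sketch in the question's pseudocode style (the run count and variable names are illustrative assumptions, and `clock.getTime()` is the custom RTOS call from the question):

runs = 1000;                           // pick based on how long the statements take
best = 0;
sum  = 0;
for (i = 0; i < runs; i++)
{
    one = clock.getTime();
    /* statements 1 to 10 */
    two = clock.getTime();
    d = two - one;
    if (i == 0 || d < best) best = d;  // keep the smallest (least disturbed) run
    sum += d;
}
avg = sum / runs;                      // average time per run; 'best' is the minimum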

Spektre
  • That's not true if part of `dt` is work that `getTime` does before it actually measures the time. Then this impacts on `two` but not on `one`. Same problem in reverse for work done inside `getTime` post-measurement. Unless you know that the overhead on both sides of the actual measurement is exactly equal, you can't strip it away like this. In general you can't reliably elide the overhead, which is why we typically do not bother trying. – Lightness Races in Orbit Oct 19 '18 at 11:32
  • @LightnessRacesinOrbit Luckily the part of `dt` you are referring to is a very small percentage of the measurement itself. Just to be sure we mean the same thing: you mean stuff like the return value, stack, and code/data cache, and their overhead difference between the first and second `getTime` call... – Spektre Oct 19 '18 at 11:37
  • That depends on what you're measuring, but you can make it assuredly so by performing the task 100,000 times (pick a number sensibly based on how long your task takes) instead of just once, which is the common approach for this and for a few other reasons also. – Lightness Races in Orbit Oct 19 '18 at 11:38

Typically, when benchmarking, we perform the measured task many, many times in between the timer calls, and then divide by the number of task executions.

This is to:

  • smooth over irrelevant variations, and
  • avoid the measured duration falling below the actual clock resolution, and
  • leave the timing overhead as a totally negligible fraction of the time.

In that way, you don't have to worry about it any more.
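For example, a sketch in the question's pseudocode style (the run count is an assumption you would tune to your task, and `clock.getTime()` is the custom RTOS call from the question):

runs = 100000;                  // chosen so the total time dwarfs the clock resolution and call overhead
one = clock.getTime();
for (i = 0; i < runs; i++)
{
    /* statements 1 to 10 */
}
two = clock.getTime();
perRun = (two - one) / runs;    // per-execution time; the two getTime() calls are now negligible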

That being said, this can introduce problems of its own, particularly with respect to things like branch predictors and caches, as they may be "warmed up" by your first few executions, impacting the veracity of your results in a way that wouldn't happen on a single run.

Benchmarking properly can be quite tricky and there is a wealth of material available to explain how to do it properly.

Lightness Races in Orbit