
Is there a simple but reliable way to measure the relative difference in performance between two algorithm implementations in C programs? More specifically, I want to compare the performance of implementation A vs. implementation B. I'm thinking of a scheme like this:

  • In a unit test program:

    1. start timer
    2. call function
    3. stop timer
    4. get the difference between the start and stop times
  • Run the scheme above for a pair of functions A and B, then get the percentage difference in execution time to determine which is faster (a minimal sketch of such a harness follows this list).
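
For concreteness, here is a minimal sketch of such a harness, assuming clock_gettime() with CLOCK_MONOTONIC is available (it is on Linux, and on macOS from 10.12 onwards); time_it(), run_A and run_B are hypothetical names used only for illustration:

#include <stdio.h>
#include <time.h>

/* Hypothetical helper: times one call of fn(arg) using the monotonic clock
   and returns the elapsed time in seconds. */
static double time_it(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);    /* 1. start timer   */
    fn(arg);                                /* 2. call function */
    clock_gettime(CLOCK_MONOTONIC, &t1);    /* 3. stop timer    */
    return (t1.tv_sec - t0.tv_sec)          /* 4. difference    */
         + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

The relative comparison would then just be the ratio (or percentage difference) of time_it(run_A, data) and time_it(run_B, data).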

Upon doing some research I came across this question about using a monotonic clock on OSX in C, which apparently can give me at least nanosecond precision. To be clear, I understand that precise, controlled measurements are hard to perform, as discussed in "With O(N) known and system clock known, can we calculate the execution time of the code?", but I assume that should be irrelevant in this case because I only want a relative measurement.
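
For what it's worth, the monotonic timing from that question boils down to something like the sketch below on pre-Sierra OSX, using mach_absolute_time() from <mach/mach_time.h>; mach_timebase_info() supplies the scale factor that turns the raw ticks into nanoseconds (sketch for illustration only):

#include <stdio.h>
#include <stdint.h>
#include <mach/mach_time.h>

int main(void)
{
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);                /* ticks -> nanoseconds scale factor */

    uint64_t start = mach_absolute_time();
    /* ... code under test ... */
    uint64_t stop = mach_absolute_time();

    uint64_t elapsed_ns = (stop - start) * tb.numer / tb.denom;
    printf("elapsed: %llu ns\n", (unsigned long long)elapsed_ns);
    return 0;
}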

Everything considered, is this a sufficient and valid approach towards the kind of analysis I want to perform? Are there any details or considerations I might be missing?

TSIguy
  • If you're working on macOS Sierra (10.12) or later, then there is finally support for `clock_gettime()` provided by the system, though the real resolution is only microseconds, not nanoseconds. (That is, the `tv_nsec` component of the answer from `clock_gettime()` always ends with three zeros.) The question you reference pre-dates Sierra by over four years. – Jonathan Leffler Dec 15 '16 at 00:30
  • Well, I'm working with 10.11 and the Mach header function calls from the reference still work, but this is good to know. – TSIguy Dec 15 '16 at 01:43

1 Answer


The main modification I make to the timing scheme you outline is to ensure that the same timing code is used for both functions: assuming they have an identical interface, I pass a function pointer to skeletal driver code.

As an example, I have some code that times some functions that validate whether a given number is prime. The control function is:

static void test_primality_tester(const char *tag, int seed, int (*prime)(unsigned), int count)
{
    srand(seed);
    Clock clk;
    int nprimes = 0;
    clk_init(&clk);

    clk_start(&clk);
    for (int i = 0; i < count; i++)
    {
        if (prime(rand()))
            nprimes++;
    }
    clk_stop(&clk);

    char buffer[32];
    printf("%9s: %d primes found (out of %d) in %s s\n", tag, nprimes,
           count, clk_elapsed_us(&clk, buffer, sizeof(buffer)));
}

I'm well aware of "srand() — why call it once?", but the point of calling srand() once each time this function is called is to ensure that the tests process the same sequence of random numbers. On macOS, RAND_MAX is 0x7FFFFFFF.

The type Clock contains analogues of two struct timespec structures, for the start and stop times. The clk_init() function initializes the structure; clk_start() records the start time in the structure; clk_stop() records the stop time in the structure; and clk_elapsed_us() calculates the elapsed time between the start and stop times in microseconds. The package is written to give me cross-platform portability (at the cost of some headaches in determining which is the best sub-second timing routine available at compile time).
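
In outline, the interface looks something like the sketch below (a simplified illustration, not the actual timer.h/timer.c, which selects the best available sub-second clock at compile time; this version simply assumes clock_gettime() exists):

#include <stdio.h>
#include <time.h>

typedef struct Clock
{
    struct timespec start;
    struct timespec stop;
} Clock;

static void clk_init(Clock *clk)  { clk->start = clk->stop = (struct timespec){ 0, 0 }; }
static void clk_start(Clock *clk) { clock_gettime(CLOCK_MONOTONIC, &clk->start); }
static void clk_stop(Clock *clk)  { clock_gettime(CLOCK_MONOTONIC, &clk->stop); }

/* Format the elapsed time as seconds with microsecond resolution */
static char *clk_elapsed_us(Clock *clk, char *buffer, size_t buflen)
{
    long sec  = clk->stop.tv_sec  - clk->start.tv_sec;
    long nsec = clk->stop.tv_nsec - clk->start.tv_nsec;
    if (nsec < 0)
    {
        nsec += 1000000000L;
        sec  -= 1;
    }
    snprintf(buffer, buflen, "%ld.%06ld", sec, nsec / 1000L);
    return buffer;
}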

You can find my code for timers on Github in the repository https://github.com/jleffler/soq, in the src/libsoq directory — files timer.h and timer.c. The code has not yet caught up with macOS Sierra having clock_gettime(), though it could be compiled to use it with -DHAVE_CLOCK_GETTIME as a command-line compiler option.

This code was called from a function one_test():

static void one_test(int seed)
{
    printf("Seed; %d\n", seed);
    enum { COUNT = 10000000 };
    test_primality_tester("IsPrime1", seed, IsPrime1, COUNT);
    test_primality_tester("IsPrime2", seed, IsPrime2, COUNT);
    test_primality_tester("IsPrime3", seed, IsPrime3, COUNT);
    test_primality_tester("isprime1", seed, isprime1, COUNT);
    test_primality_tester("isprime2", seed, isprime2, COUNT);
    test_primality_tester("isprime3", seed, isprime3, COUNT);
}

And the main program can take one seed or a series of seeds, or use the current time as a seed:

int main(int argc, char **argv)
{
    if (argc > 1)
    {
        for (int i = 1; i < argc; i++)
            one_test(atoi(argv[i]));
    }
    else
        one_test(time(0));
    return(0);
}
Jonathan Leffler
  • Thanks for this response, it validates my thinking. The example you pointed out in the libsoq directory is clear, but I don't see any setup code for a monotonic clock configuration. I assume I can just use the CLOCK_MONOTONIC define instead of CLOCK_REALTIME, right? – TSIguy Dec 15 '16 at 19:19
  • You're correct. Until Sierra, I didn't have a machine where CLOCK_MONOTONIC was an option — I was mostly using `gettimeofday()` — so that wasn't as much of an issue to me. It is my 'working code'; good enough for my purposes, but not necessarily perfect. One disadvantage of CLOCK_MONOTONIC is that it doesn't give you an absolute timestamp (relative to The Epoch — aka 1970-01-01 00:00:00 +00:00), whereas `gettimeofday()` does. But for performance measuring, you're seldom worried about the time it was run to the same level of precision as you're worried about how long it took to run. – Jonathan Leffler Dec 15 '16 at 19:22
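
For reference, the gettimeofday()-based measurement mentioned in the last comment is just the difference of two struct timeval samples; a rough sketch, assuming a POSIX system:

#include <stdio.h>
#include <sys/time.h>

/* Elapsed wall-clock time between two gettimeofday() samples, in seconds */
static double elapsed_seconds(struct timeval t0, struct timeval t1)
{
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
}

int main(void)
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    /* ... code under test ... */
    gettimeofday(&t1, NULL);
    printf("elapsed: %.6f s\n", elapsed_seconds(t0, t1));
    return 0;
}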