How can I test two algorithms and determine which is faster?

Question

Whenever working on a specific problem, I may come across different solutions. I'm not sure how to choose the better of the two options. The first idea is to compute the complexity of the two solutions, but sometimes they may share the same complexity, or they may differ but the range of the input is small that the constant factor matters.

The second idea is to benchmark both solutions. However, I'm not sure how to time them using c++. I have found this question: How to Calculate Execution Time of a Code Snippet in C++ , but I don't know how to properly deal with compiler optimizations or processor inconsistencies.

In short: is the code provided in the question above sufficient for everyday tests? is there some options that I should enable in the compiler before I run the tests? (I'm using Visual C++) How many tests should I do, and how much time difference between the two benchmarks matters?

Here is an example of a code I want to test. Which of these is faster? How can I calculate that myself?

unsigned long long fiborecursion(int rank){
    if (rank == 0) return 1;
    else if (rank < 0) return 0;
    return fiborecursion(rank-1) + fiborecursion(rank-2);
}

double sq5 = sqrt(5);
unsigned long long fiboconstant(int rank){
    return pow((1 + sq5) / 2, rank + 1) / sq5 + 0.5;
}

Onw thing that can be said is, that you have to test your real code with compiler optimizations turned on. Otherwise you test unoptimized code and as of that you don't get usefull results. If you test artificial code you might risk that the compiler optimizes everything or important parts away. — t.niese, Mar 27 '18 at 18:04
[This response](https://stackoverflow.com/a/19471595/1655492) from the question you linked will do nicely. Build your code optimized, choose a big enough sample size, do multiple runs and average the execution time. — DigitalEye, Mar 27 '18 at 18:05
How can I enable all optimizations? how 'big enough sample size' is enough? — Megadardery, Mar 27 '18 at 18:22
You have to add optimization flags -O1 for size and -O2 for speed and test your function with big int `fiborecursion(1000000)` or `fuborrvursion(100000000L) — Ratah, Mar 27 '18 at 19:08
Search the internet for "How to benchmark C++ programs". The keyword is *benchmarking*. — Thomas Matthews, Mar 27 '18 at 19:36

score 1 · Accepted Answer · 2018-03-27T19:43:49.210

1

Using the clock from this answer

#include <iostream>
#include <chrono>

class Timer
{
public:
    Timer() : beg_(clock_::now()) {}
    void reset() { beg_ = clock_::now(); }
    double elapsed() const { 
        return std::chrono::duration_cast<second_>
            (clock_::now() - beg_).count(); }

private:
    typedef std::chrono::high_resolution_clock clock_;
    typedef std::chrono::duration<double, std::ratio<1> > second_;
    std::chrono::time_point<clock_> beg_;
};

You can write a program to time both of your functions.

int main() {
    const int N = 10000;
    Timer tmr;

    tmr.reset();
    for (int i = 0; i < N; i++) {
        auto value = fiborecursion(i%50);
    }
    double time1 = tmr.elapsed();

    tmr.reset();
    for (int i = 0; i < N; i++) {
        auto value = fiboconstant(i%50);
    }
    double time2 = tmr.elapsed();

    std::cout << "Recursion"
            << "\n\tTotal: " << time1
            << "\n\tAvg: " << time1 / N
            << "\n"
            << "\nConstant"
            << "\n\tTotal: " << time2
            << "\n\tAvg: " << time2 / N
            << "\n";
}

I would try compiling with no compiler optimizations (-O0) and max compiler optimizations (-O3) just to see what the differences are. It is likely that at max optimizations the compiler may eliminate the loops entirely.

edited Mar 27 '18 at 19:43

answered Mar 27 '18 at 19:10

Which loop are you referring to when you said it might get rid of it? – Mar 27 '18 at 19:39
Both. If the compiler realizes that the only values calculated in the loops are never actually used it may just skip both loops entirely. – Mar 27 '18 at 19:44
The recursion solution I had was horribly inefficient, so the constant solution won. However another recursion solution did turn out to be better than the constant solution. (Even though it technically has lower complexity). Also the constant solution breaks for bigger values. – Megadardery Mar 27 '18 at 19:56
@Megadardery: If you check the compiler's asm output to see what loop you're *actually* timing, you might find that the compiler optimized your recursive version that did better into a loop instead of a recursion, and possibly even turned it into a normal iterative Fibonacci (`a+=b; b+=a`) which has O(N) run time instead of [O(2^n) for the horrible naive recursive version](https://stackoverflow.com/a/22084314/224132). – Peter Cordes Mar 28 '18 at 03:30
**[Do not benchmark with `-O0`](https://stackoverflow.com/questions/32000917/c-loop-optimization-help-for-final-assignment/32001196#32001196)**. That's useless, because some code slows down way more than others when making an anti-optimized debug build. See lots of downvoted / closed performance questions on SO that made this mistake... `-O3` vs. `-O3 -fno-tree-vectorize` vs. `-O3 -march=native` is interesting to see what auto-vectorization did, though. **To check that `-O3` didn't optimize away your code, increase the repeat-count or problem size and check that the run time goes up**. – Peter Cordes Mar 28 '18 at 03:36
Summing up the results into an `unsigned tmp` which you print at the end can stop the compiler from optimizing away, or store to a `volatile int dummy` (but don't make a key part of the code under test use `volatile`!) or use more advanced things like inline `asm` statements that the compiler treats as a black box. See [CppCon 2015: Chandler Carruth "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"](https://www.youtube.com/watch?v=nXaxk27zwlk) for an `escape` function that requires the compiler to produce a value in a register, but doesn't add extra asm instructions. – Peter Cordes Mar 28 '18 at 03:41
@PeterCordes I can see why benchmarking with `-O0` is useless when trying to optimize or write high performance code, but why is it bad when you want to test some high level question like "Is array indexing or pointer arithmetic faster?". There is an answer, and that is why compiler optimizations exist. But you don't find out for yourself unless you compile without optimizations. – Mar 28 '18 at 15:52
That's a very *low-level* question. Compilers can and do optimize array indexing to pointer increments, *and sometimes vice versa* in cases where they decide its better to increment one index register instead of multiple pointers inside a loop. But not with `-O0`, and forcing the compiler to make stupid ridiculous code that spills/reloads every C variable to memory between statements will usually just make array indexing look worse. When in reality compilers are smart at optimizing code that uses array indexing. – Peter Cordes Mar 28 '18 at 15:57
It's a perfect example of why optimizing your source to be fast with `-O0` teaches you lessons that don't apply to normal optimized code. If you want to find anything whether it's faster for the compiler to emit asm that uses an index register, or increments multiple pointers, you have to write loops in assembly language. `-O0` is totally useless because of all the extra store/reloads. It's not just un-optimized, it's specifically anti-optimized for debugging. – Peter Cordes Mar 28 '18 at 16:00

How can I test two algorithms and determine which is faster?

1 Answers1