How do figure out that method A is more efficient than method B

Question

I am writing a code and I came across a situation where I would like to test two different methods of doing something and time each so I can make sure I am using the most efficient of the two. I have seen folks post timing information on various questions/answers around here but I could not find any questions or answers related to how to gather and evaluate that data. My questions has 3 parts and I will use this example to explain :

I have written a function that evaluates a mathematical expression and I am storing the results in an Eigen matrix. The matrix is completely symmetric so I want to determine if it is faster to compute the all of the off diagonal elements twice like this.

//Method A
for( int i=0; i < max; i++){
   for(int j= 0; j < max; j++){
        storage_matrix(i,j)=my_function(i,j);
       } 
    }

Or should I copy the off diagonal elements after I only compute them one essentially making the second loop restricted by the first like this.

//Method B 
for( int i=0; i < max; i++){
   for(int j= 0; j < i; j++){
        storage_matrix(i,j)=my_function(i,j);
        storage_matrix(j,i)=storage_matrix(i,j);
       } 
    }

Now this is a very simple example but I would like to learn how to evaluate these types of things so I think a simple example is best. I have an idea about which is faster, but I want to know. I want to look at some numbers and say, "yes this is faster", or perhaps even, "wow look it doesn't make that much of a difference."

Now my 3-part question:

How do I look at timing information in general for code?

Should there be a difference in which is faster depending on the size of max? (ie. should I test withe large and small values for max)

Do I just want to know which does the job in less time or are there other considerations that need to be made when I look into issues Like this?

NB: I am using C++

Just a note on your example: if the matrix is symmetric, would it not be faster to call my_function(i, j) once and store the result at both positions (i, j) and (j, i)? Also, different languages have different utilities for timing code execution, so it would help to know which you're using. — masonsbro, Dec 27 '14 at 18:37
@masonbro I had a typo in the code ( just caught it). That is exactly the question I want to answer, and those utilities are what I want to lear about. I am using C++ (I am very sorry just added that to the description also, whoops) — Ajay, Dec 27 '14 at 18:40

score 1 · Answer 1 · edited May 23 '17 at 12:20

Benchmarking is a type of experiment.

Experiments have to be designed for hypothesis you're examining under that conditions that you set, there is no one size fits all. Benchmarking is notoriously hard to get right, it's very easy to accidentally design an experiment that turns out to not answer your question. It's very sneaky, because you get results anywhere, and there is often nothing telling you that what you did is wrong - it is your job therefore to think of everything, account for everything, and design the experiment such that it really will tell you what you want to know.

Sometimes it's not so bad, if the time difference between two things is very large, it will be apparent even in a sloppy experiment. Where it gets hard is if the time or the time difference is very small.

A couple of things to watch:

the benchmark runs on the intended input (obvious, but often the source of problems)
the noise was less significant than the actual time difference (ideally by a large enough margin that you don't need statistic tests)
the compiler did not optimize away the thing you're trying to measure, this particularly affects loops used to amplify the time difference (ie repeat a function a million times)
sometimes it may be more appropriate to compare the fastest measured times than the average times, the fastest run was least affected by pre-emption and page faults (then again maybe you wanted to include page faults, it depends on your experiment)
there should be no unfair advantage from running second (or first). Include cache effects, the page faults that happen when allocated memory is first actually touched, branch predictors that have to learn certain patterns, lots of things. Be very careful. In some cases it is appropriate to discard the first couple of times, and only consider the "steady state" where everything has the advantage of not being first. But other times it is all about the first time - for example, initializing a program only happens once, so it doesn't matter how fast it is "the second time", a benchmark of initialization code should take that into account.
when measuring latency, make sure you're not measuring throughput (and vice versa). That means making the next iteration of your benchmark loop (the one that amplifies the time difference) either dependent or independent, depending on your intentions. Particularly affects tiny snippets. This is not always trivial, see for example this case.
when the code is small enough, look at the assembly code to ensure it matches your expectations.

Small/fast code is especially challenging, make sure you are intimately familiar with how your processor works if you intend to design that kind of benchmark. If you're developing for PCs, that means reading this and this and sometimes looking things up here. For mobile devices you may also have to consider power usage.

Should there be a difference in which is faster depending on the size of max? (ie. should I test withe large and small values for max)

Yes! It is very often the case that one wins for small sizes and the other wins for larger sizes. In fact you should probably test some sizes in between, too.

In this specific case, you can see that second snippet will probably access the matrix in a poor way. That doesn't really matter if the entire matrix fits in cache anyway, but if it is too large, it should matter, see also this. Of course, that is balanced against doing a different amount of computation.

How do figure out that method A is more efficient than method B

1 Answers1