I've found that the usual clock(), everyone recommends here, for some reason deviates wildly from run to run, even for static code without any side effects, like drawing to screen or reading files. It could be because CPU changes power consumption modes, OS giving different priorities, etc...
So the only way to reliably get the same result every time with clock() is to run the measured code in a loop multiple times (for several minutes), taking precautions to prevent the compiler from optimizing it out: modern compilers can precompute the code without side effects running in a loop, and move it out of the loop., like i.e. using random input for each iteration.
After enough samples are collected into an array, one sorts that array, and takes the middle element, called median. Median is better than average, because it throws away extreme deviations, like say antivirus taking up all CPU up or OS doing some update.
Here is a simple utility to measure execution performance of C/C++ code, averaging the values near median: https://github.com/saniv/gauge
I'm myself still looking for a more robust and faster way to measure code. One could probably try running the code in controlled conditions on bare metal without any OS, but that will give unrealistic result, because in reality OS does get involved.
x86 has these hardware performance counters, which including the actual number of instructions executed, but they are tricky to access without OS help, hard to interpret and have their own issues ( http://archive.gamedev.net/archive/reference/articles/article213.html ). Still they could be helpful investigating the nature of the bottle neck (data access or actual computations on that data).