4

My objective is: I want to test the performance of a piece of code (or a function), just as I test its correctness in a unit test. Let's say the output of this benchmarking process is a "function performance index" (FPI) which is "portable".

My problem is: we usually benchmark code by using a timer to measure the elapsed time during its execution, and that method depends on the hardware, the OS, and other factors.

My question is: is there a method to get a "function performance index" that is independent of the performance of the host (CPU/OS/etc.), or, if not "independent", then at least "relative" to some fixed reference, so that the FPI value remains valid on any platform regardless of its hardware performance?

For example, the FPI value could be measured as:

  • the number of arithmetic instructions needed to execute a single call
  • a float value relative to a benchmark function; for example, function B has a rating index of 1.345 (meaning it is 1.345 times slower than the benchmark function)
  • some other value.

Note that the FPI value doesn't need to be scientifically correct, exact, or accurate; I just need a value that gives a rough overview of a function's performance compared to other functions tested by the same method.
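For instance, a minimal sketch of the second option above: time the function under test against a fixed reference function on the same machine and report the ratio. The `baseline` function and the workload sizes here are arbitrary stand-ins, not part of any standard:

```c
/* Sketch: a relative "function performance index" (FPI).
   The FPI is t(under_test) / t(baseline), measured on one machine.
   The ratio is only a rough guide and will drift across platforms. */
#include <stdio.h>
#include <time.h>

/* hypothetical reference workload: a harmonic sum */
static double baseline(int n) {
    double s = 0.0;
    for (int i = 1; i <= n; i++)
        s += 1.0 / i;
    return s;
}

/* example function whose FPI we want to measure */
static double under_test(int n) {
    double s = 0.0;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= 16; j++)
            s += (double)i / j;
    return s;
}

static double seconds(double (*f)(int), int n, int reps) {
    volatile double sink = 0.0;   /* keep the calls from being optimized away */
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink += f(n);
    (void)sink;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    double tb = seconds(baseline,   1000000, 10);
    double tf = seconds(under_test, 1000000, 10);
    printf("FPI = %.3f (times slower than the baseline)\n", tf / tb);
    return 0;
}
```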

uray
  • If you want to find out where to optimize the code, independent of CPU, you should [look here](http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024). – Mike Dunlavey Mar 05 '11 at 20:51

5 Answers

5

I think you are in search of the impossible here, because the performance of a modern computer is a complex blend of CPU, cache, memory controller, memory, etc.

So one (hypothetical) computer system might reward the use of enormous look-up tables to simplify an algorithm, so that very few CPU instructions are executed, whereas another system might have memory that is much slower relative to the CPU core, so an algorithm which did a lot of processing but touched very little memory would be favoured.

So a single 'figure of merit' for these two algorithms could not even convey which was the better one on all systems, let alone by how much it was better.
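For a concrete (hypothetical) instance of that trade-off, compare two equivalent bit-counting functions, one table-driven and one purely arithmetic; which is faster depends on cache size and memory latency, so no single number ranks them on every machine:

```c
/* Two equivalent bit-count functions.  On a machine with a large, fast
   cache the 64K lookup table may win; on one with slow memory the
   pure-arithmetic version may win.  Illustrative sketch only. */
#include <stdint.h>

static uint8_t table[1 << 16];           /* filled once at startup */

void init_table(void) {
    for (int i = 0; i < (1 << 16); i++)
        table[i] = (uint8_t)(table[i >> 1] + (i & 1));
}

int popcount_table(uint32_t x) {         /* few instructions, touches memory */
    return table[x & 0xFFFF] + table[x >> 16];
}

int popcount_arith(uint32_t x) {         /* more instructions, no memory */
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (int)((x * 0x01010101u) >> 24);
}
```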

Will Dean
  • let's take an algorithm A which executes in `O(n)` and B which executes in `O(n^2)`, but where B happens to use less memory, have fewer cache misses, or benefit from something else, so that on a specific platform B outperforms A. We mostly expect A to perform better than B (which isn't always true in practice), and we still have the comparative value of `n` versus `n^2`. That value helps the user (the one who uses the function) choose which one to use. This is the value I want; it won't always be correct, and for exact correctness he should benchmark it. – uray Nov 12 '10 at 17:48
  • Well, I think I understand what you're saying, but O(f(n)) is always actually kf(n), and if you're comparing the same task on two completely different platforms, the two values of 'k' might be much more variable than the two values of 'n'. I don't think there is a widely applied system which is any better than big-O, and even big-O is not very good for a lot of the things it's asked to adjudicate on. If you want a single number, you could say O(1) = 0, O(log n) = 1, O(n) = 2, O(n log n) = 3, etc. – Will Dean Nov 12 '10 at 18:41
2

Probably what you really need is a tcov-like tool.

man tcov says:

Each basic block of code (or each line if the -a option to tcov is specified) is prefixed with the number of times it has been executed; lines that have not been executed are prefixed with "#####". A basic block is a contiguous section of code that has no branches: each statement in a basic block is executed the same number of times.
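(tcov is a Sun/Solaris tool; on Linux, gcc's gcov reports similar per-line execution counts.) As a rough sketch of why such counts make a portable index, here is the same idea done by hand, with one counter per basic block; the counts depend only on the input, never on the hardware:

```c
/* Sketch: manual basic-block counters, i.e. the hardware-independent
   quantity a tcov-like tool reports automatically. */
#include <stdio.h>

static long blk_loop = 0, blk_hit = 0;   /* one counter per basic block */

int count_even(const int *a, int n) {
    int c = 0;
    for (int i = 0; i < n; i++) {
        blk_loop++;                      /* loop-body block */
        if (a[i] % 2 == 0) {
            blk_hit++;                   /* taken-branch block */
            c++;
        }
    }
    return c;
}

int main(void) {
    int a[] = {1, 2, 3, 4, 5, 6};
    count_even(a, 6);
    printf("loop body: %ld, even branch: %ld\n", blk_loop, blk_hit);
    return 0;
}
```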

Alexander Gorshenev
1

No, there is no such thing. Different hardware performs differently. You can have two different pieces of code X and Y such that hardware A runs X faster than Y but hardware B runs Y faster than X. There is no absolute scale of performance, it depends entirely on the hardware (not to mention other things like the operating system and other environmental considerations).

Adam Rosenfield
  • Yes, I understand that there is no absolute scale, but what I need is just a rough comparative value that doesn't need to be correct on every platform; something like the O() value used to measure algorithm efficiency. – uray Nov 12 '10 at 17:25
0

It sounds like what you want is a program that calculates the Big-O notation of a piece of code. I don't know whether it's possible to do that in an automated fashion in the general case (Halting problem, etc.).
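One pragmatic workaround, a sketch rather than a real analyzer: estimate the growth order empirically by timing the code at doubling input sizes and taking the log of the time ratio. The workload `f` below is just a stand-in for the function under test:

```c
/* Sketch: empirical growth-order estimate.  Doubling n and taking
   log2 of the time ratio approximates the exponent k in t ~ n^k.
   Compile with -lm.  f() is a placeholder workload. */
#include <math.h>
#include <stdio.h>
#include <time.h>

static void f(int n) {                   /* stand-in: an O(n^2) workload */
    volatile long s = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            s += i ^ j;
}

static double time_f(int n) {
    clock_t t0 = clock();
    f(n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    for (int n = 1000; n <= 8000; n *= 2) {
        double t1 = time_f(n), t2 = time_f(2 * n);
        printf("n=%5d  exponent ~ %.2f\n", n, log2(t2 / t1));
    }
    return 0;
}
```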

Jeremy Friesner
0

Like others have mentioned, this is not a trivial task and it may be impossible to get accurate results. Consider a few methods:

  1. Benchmark Functions -- While this seems promising, I think you'll find it won't work well when you try to compare different types of functions. For example, if your benchmark function is 100% CPU bound (as in some complex math computation), it will compare/scale well with other CPU-bound functions but fail when compared with, say, I/O- or memory-bound functions. Carefully matching a benchmark function to a small set of similar functions may work, but it is tedious and time-consuming.
  2. Number of Instructions -- For a very simple processor it may be possible to count the cycles of each instruction and get a reasonable value for the total number of cycles a block of code will take, but today's processors are anything but "simple". With branch prediction and parallel pipelines you can't just add up instruction cycles and expect an accurate result.
  3. Manual Counting -- This might be your best bet; while it is not automatic, it may give better results faster than the other methods. Just look at things like the O() order of the code, how much memory the function reads/writes, how many file bytes are input/output, etc. By keeping a few stats like this for each function/module you should be able to get a rough comparison of their complexity (see the operation-counting sketch after this list).
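As a sketch of methods 2 and 3, you can instrument the code with explicit operation counters; unlike cycle counts, the numbers of comparisons and swaps below depend only on the input, so they transfer across platforms:

```c
/* Sketch: counting abstract operations instead of time.  The counts
   are a platform-independent rough index of the work done. */
#include <stdio.h>

static long cmps = 0, swaps = 0;

static void insertion_sort(int *a, int n) {
    for (int i = 1; i < n; i++) {
        int j = i;
        /* the comma operator bumps the comparison counter each time
           the a[j-1] > a[j] test is actually evaluated */
        while (j > 0 && (cmps++, a[j - 1] > a[j])) {
            int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            swaps++;
            j--;
        }
    }
}

int main(void) {
    int a[] = {5, 2, 4, 6, 1, 3};
    insertion_sort(a, 6);
    printf("comparisons: %ld, swaps: %ld\n", cmps, swaps);
    return 0;
}
```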
uesp