
I am trying to build an online judge for programming problems (like UVA OJ). When programs are judged, their efficiency (i.e. how fast they can process test inputs) needs to be tested. However, server CPUs are usually very powerful and can run even badly coded programs really fast. Moreover, a program may get more CPU time when traffic is low and less when traffic is high, which is unfair.

So I was wondering, is there a way to measure program efficiency regardless of which CPU it is running on? Maybe with some CPU cycle calculations or something like that?

Note:

  1. I am using PHP on the server side.
  2. I can use Linux commands in the server.
  3. I can request Linux programs to be installed in the server.
  4. I think I can limit memory usage of a process (which deals with memory efficiency), but I don't know how to limit CPU usage. I asked here about it. (One way to set both limits is sketched below this list.)

So any Linux or PHP solution would be awesome.
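
For point 4 of the note, here is a minimal sketch of one way to cap both resources from the shell, assuming bash on a Linux kernel; `./tt_pgm`, `input.txt`, and `output.txt` are hypothetical placeholder names, not anything from the actual judge:

```bash
#!/bin/bash
# Hypothetical judge wrapper; ./tt_pgm, input.txt and output.txt are
# placeholder names. The subshell keeps the limits off the parent shell.
(
  ulimit -v 131072    # virtual memory cap: 128 MB (ulimit -v takes KB)
  ulimit -t 5         # CPU-time cap: the kernel signals the process
                      # (SIGXCPU) after 5 seconds of CPU time
  exec ./tt_pgm < input.txt > output.txt
)
echo "submission exited with status $?"
```

Note that `ulimit -t` counts CPU seconds, not wall-clock seconds, so a process stuck waiting on I/O is not killed by it; `timeout 5 ./tt_pgm` (from coreutils) would cap wall-clock time instead.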

  • You may want to consider the CPU time. It differs from the time between when the program starts and when it terminates. It is a measure of how much time the CPU actually spent running the program, and is almost unaffected by other processes. (A shell sketch after these comments illustrates the difference.) – user4098326 Dec 13 '14 at 03:57
  • If your programs are short lived you can also use performance counters to count the number of instructions completed. With PAPI that's the PAPI_TOT_INS counter. – Adam Dec 13 '14 at 04:04
  • @user4098326 Yes I thought about it, but the problem is, the CPU time may be really fast during low traffic, and it may be a bit slow during high traffic. So when less people are using the site, even bad programs may get passed. – CluelessNoob Dec 13 '14 at 08:42
  • @Adam Can you please give me some links about how I can use it? – CluelessNoob Dec 13 '14 at 08:43
  • You're thinking of wall-clock time. CPU time is the amount of time a particular program spent using the CPU. If your program is truly CPU bound then this time should remain constant. Google has lots of PAPI examples; only you know what's appropriate for your system. http://stackoverflow.com/questions/8160004/how-to-use-papi-periodically-for-performance-measurements – Adam Dec 13 '14 at 18:53
  • As I understand, your definition of efficiency is how many programs test takers can code and execute per unit something, e.g. time. Before making any suggestions, I need to know more about your testing mechanism. By the way, I don't think that CPU time is going to give you any useful metric. – Taylor Kidd Dec 15 '14 at 15:51
  • @TaylorKidd Well, the test takers have to build a program to solve a particular problem. There is a big test input file (usually randomly generated) and the program should be able to process the file within a limited time (say, 1 second). The problem is, even inefficient programs may be able to process it within the time limit because of the powerful and unstable (i.e. not always equal) CPU power of servers, where those programs would fail on constant CPU. That's why I am looking to find a better way to calculate if the program is `efficient` even if it gets variable CPU power. – CluelessNoob Dec 16 '14 at 05:40
  • @Adam `"If your program is truly CPU bound then this time should remain constant."` Can you please explain this a little bit. This sounds interesting :) – CluelessNoob Dec 16 '14 at 05:41
  • @lonekingc4 If all your program does is some heavy calculation, then the CPU time of multiple runs should be identical regardless of what other load the machine has. If the program uses a lot of memory, then the other loads may affect how much cache/ram/swap is available to the timed program, and hence it can spend more time waiting on memory. Same with I/O, or networking, or any other shared resource. Even temperature of the CPU can affect speed. – Adam Dec 16 '14 at 08:05
  • There is no right answer for what you're asking. You'll have to make judgement calls for what's good enough for your use case. – Adam Dec 16 '14 at 08:09
  • At least two things will mess up your use of CPU time as a metric: file I/O is slow compared to processing, and any program has a huge amount of setup and tear down processing. Both of these will skew your results significantly for a short program (<30sec depending) if you don't take them into account. I might be able to make some suggestions, but it'll be later today (US pacific). – Taylor Kidd Dec 16 '14 at 17:46
  • @Adam The memory consumption is usually pretty low (<= 128 MB). The I/O is the same for all submitted programs for a problem, so it's constant. Ok, so if I use, for example, the `time` command of Linux to measure how long a program ran, it will always give me a pretty much equal result regardless of how much CPU it gets? – CluelessNoob Dec 17 '14 at 00:54
  • @Adam Thanks a lot. The solution of Taylor Kidd (running 2 programs simultaneously) is the one I like better, however, yours might come in handy as well :) – CluelessNoob Dec 18 '14 at 01:45
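
To make the comment thread concrete, here is a hedged shell sketch of both suggestions; `tt_pgm` and `input.txt` are placeholder names, and `perf` (from linux-tools) stands in for the PAPI C API as a simpler way to read the same hardware counters:

```bash
# The shell's `time` builtin separates wall-clock time from CPU time:
time ./tt_pgm < input.txt > /dev/null
#   real -> wall-clock time; varies with server load
#   user -> CPU time spent running the program's own code
#   sys  -> CPU time spent in the kernel on the program's behalf
# user + sys is what should stay roughly constant for a CPU-bound program,
# as Adam describes above.

# Counting retired instructions (what PAPI calls PAPI_TOT_INS) with perf,
# assuming perf is installed and accessible to the judging user:
perf stat -e instructions,cycles ./tt_pgm < input.txt > /dev/null
```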

1 Answer


Environment: As I understand your problem, you want to measure the performance of a program written by a test taker, and then you want to be able to compare that performance against either a reference or against other test taker programs. These programs will run on possibly different web servers. The test taker will access the testing program through a browser. The test takers will be distributed over some network (local to a lab? To a campus? To the world?). The test input is from a file. The expected runtime of the programs is <5 secs, with a median of 1 sec.

Metric: CPU time will not help you, because it means different things on different hardware. For example, say you compare the same CPU-bound program on a Haswell-generation Intel Xeon server versus a first-generation Pentium. The program is equally efficient in both cases, but the run on the Pentium shows a much larger CPU time purely because of the hardware it is running on. Even if you got down to cycles (see PAPI), you'd have the same issue. The key is that you need to compare the performance (runtime) of the programs against some standard reference.

Solution: Here is a possible solution. It is theoretically sound but may not be practical given web technology and its limits. You create a standard reference program (std_pgm) and run it simultaneously with the test taker's program (tt_pgm). The key is 'simultaneously': you want both tt_pgm and std_pgm to execute in the same environment (processor, OS, load, etc.). Then you can compare their relative performance across different environments.
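
A minimal bash sketch of the simultaneous run, assuming GNU time is available at `/usr/bin/time`; `std_pgm`, `tt_pgm`, and `input.txt` are placeholder names:

```bash
#!/bin/bash
# Start both programs at the same moment so they share the same machine
# state; %U writes each one's user CPU seconds to a file.
/usr/bin/time -f "%U" -o std.cpu ./std_pgm < input.txt > /dev/null &
/usr/bin/time -f "%U" -o tt.cpu  ./tt_pgm  < input.txt > /dev/null &
wait

# Judge the ratio rather than the absolute time, so the hardware and any
# shared background load cancel out of the comparison.
echo "tt_pgm/std_pgm CPU-time ratio: $(echo "scale=3; $(cat tt.cpu) / $(cat std.cpu)" | bc)"
```

A verdict such as "accept if the ratio is below 2" then stays meaningful no matter which server the pair ran on; the reference program should run long enough that its CPU time is well above the timer's resolution.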

Other issues: (a) You need to ensure that the programs execute simultaneously with the same background processes. They don't necessarily have to run on the same core, as long as the cores are equally loaded. (b) Try to minimize process setup time and file I/O time relative to the execution time of the programs. (c) Run the programs multiple times, ideally under the same single process. This serves two purposes: it makes tt_pgm and std_pgm easier to compare, and it gives you a metric for whether their execution environments really were the same. (If the performance of tt_pgm relative to std_pgm varies significantly, something is happening in the background to one and not the other.)
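
Point (c) might look like the following extension of the sketch above (same placeholder names); the spread of the ratios doubles as the environment check the answer describes:

```bash
#!/bin/bash
# Repeat the paired run; a wide spread in the ratios suggests the two
# programs did not see the same environment and the runs should be redone.
for i in 1 2 3 4 5; do
  /usr/bin/time -f "%U" -o std.cpu ./std_pgm < input.txt > /dev/null &
  /usr/bin/time -f "%U" -o tt.cpu  ./tt_pgm  < input.txt > /dev/null &
  wait
  echo "run $i ratio: $(echo "scale=3; $(cat tt.cpu) / $(cat std.cpu)" | bc)"
done
```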

I won’t guarantee this will work, but it seems reasonable to me.

Taylor Kidd
  • Awesome! The solution is great. Thanks :D I need to try this one. I am thinking of using [this](http://stackoverflow.com/questions/3004811/how-do-you-run-multiple-programs-from-a-bash-script). – CluelessNoob Dec 18 '14 at 01:41