
I have written a C++ program to calculate the reflection of solar radiation (based on ray tracing principles). I have included a number of acceleration techniques. In my write-up I have to justify these algorithms. I was going to do it purely on a time basis, but the comment by @weberc2 HERE makes me believe that is not the best solution.

I have looked at code analysis software like Very Sleepy and AMD CodeAnalyst, which helped to identify bottlenecks etc.

As the supervisor will probably have very poor programming knowledge, a time-based analysis just seems the most logical...

e.g. "Running the same scenario with the Grid active increased calculation accuracy by 20% with only a 2 second penalty on time..."

It is a single-threaded program. Is it really that dangerous to use time? Any suggestions? Thank you all.

Seb
  • I do not follow - what are you interested in? You should define your comparison criteria. Is it: (1) time? (2) memory usage? (3) accuracy/optimality? (4) code readability? (5) code maintainability? (6) ...? The answer depends on what is more important for you (or the client). – amit Nov 08 '12 at 14:14
  • @amit Basically, the person marking will not care about the actual code itself. They are more concerned with "How did I get to the solution", i.e. "Does it give the right answer?" ((3) accuracy) and also "How long will it take to get an answer after I click ". I have to defend the logic behind the algorithms. If I start talking about memory usage and readability they will get very lost. – Seb Nov 08 '12 at 14:19
  • I think correctness, efficiency, and maintainability are probably the qualities you want to assess to compare your solutions. – didierc Nov 08 '12 at 14:30

2 Answers


I disagree. Comparing times is perfectly fine - but with a limitation.

A single run says nothing. This is why we have statistical tools and tests to show that A is distinct from B.

Run a series of tests, on several test cases and under various conditions. Store the data (run times) in two different lists, and then run a statistical test to show that one is better than the other.
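As a minimal sketch of such a harness (the `time_runs` helper and the scenario callback are hypothetical names, not from the question's code), one list of samples per algorithm could be collected like this:

```cpp
#include <chrono>
#include <functional>
#include <numeric>
#include <vector>

// Time `runs` repetitions of `scenario` and return each run's
// wall-clock duration in milliseconds.
std::vector<double> time_runs(const std::function<void()>& scenario, int runs) {
    std::vector<double> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        scenario();  // e.g. trace the scene once with the Grid active
        auto end = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    return samples;
}

double mean(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}
```

The two lists (one per algorithm, same scenes in the same order) are then what the statistical test below is fed with.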

The "answer" from the statistical test is a p-value. The p-value says, roughly, "what is the probability that you are wrong?". For example, if you had a set of tests, ran a statistical test, and found p-value = 0.01, this means that with 99% probability the two samples are distinct, and you can conclude that the one with the lower average is better.

The de-facto standard statistical test (at least in my field) is the paired Wilcoxon signed-rank test.
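For illustration only, here is a hand-rolled sketch of that test over two paired lists of run times, using the normal approximation to the null distribution (reasonable from roughly 20 pairs upward); a real analysis should use a vetted statistics library rather than this:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-sided Wilcoxon signed-rank test (normal approximation).
// `a` and `b` are paired samples, e.g. run times of two algorithms
// on the same scenes in the same order.
double wilcoxon_signed_rank_p(const std::vector<double>& a,
                              const std::vector<double>& b) {
    std::vector<double> diffs;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) diffs.push_back(a[i] - b[i]);  // zero diffs dropped

    const std::size_t n = diffs.size();
    // Order indices by |difference| so ranks can be assigned.
    std::vector<std::size_t> idx(n);
    for (std::size_t i = 0; i < n; ++i) idx[i] = i;
    std::sort(idx.begin(), idx.end(), [&](std::size_t x, std::size_t y) {
        return std::fabs(diffs[x]) < std::fabs(diffs[y]);
    });

    // Average ranks over ties; W+ = sum of ranks of positive differences.
    double w_plus = 0.0;
    std::size_t i = 0;
    while (i < n) {
        std::size_t j = i;
        while (j < n && std::fabs(diffs[idx[j]]) == std::fabs(diffs[idx[i]])) ++j;
        double avg_rank = (i + 1 + j) / 2.0;  // average of 1-based ranks i+1..j
        for (std::size_t k = i; k < j; ++k)
            if (diffs[idx[k]] > 0) w_plus += avg_rank;
        i = j;
    }

    // Normal approximation to the null distribution of W+.
    double mu = n * (n + 1) / 4.0;
    double sigma = std::sqrt(n * (n + 1) * (2.0 * n + 1) / 24.0);
    double z = (w_plus - mu) / sigma;
    return std::erfc(std::fabs(z) / std::sqrt(2.0));  // two-sided p-value
}
```

A small p-value (e.g. below 0.05) then supports the claim that the two algorithms' run-time distributions genuinely differ.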


P.S. The statistical test will "prove" the assumption only for the conditions it was tested under - for example, if you run it on an AMD CPU, it says nothing about what will happen on an Intel CPU (maybe the instruction set makes the "worse" algorithm significantly better there).

Nevertheless, this approach is widely accepted and used in articles in fields such as AI and Information Retrieval.

amit
  • Yes, I agree - the entire ray tracer depends on statistics. I will have to run the same scenario several times. So you disagree with "if he runs several clock tests this afternoon and then tests a different algorithm tomorrow morning, his comparison may not be reliable as he may be sharing resources with many more processes in the afternoon than in the morning..."? – Seb Nov 08 '12 at 14:26
  • 1
    @user1002744 If that is what you're concerned about, don't measure wall clock time but actual user code execution time. Then you just have to make sure that no swapping etc. occurs and you _should_ be fine from my point of view. Maybe amit wants to include that point in his/her answer. – Jonas Schäfer Nov 08 '12 at 14:29
  • 1
    @user1002744: There is still room for mistakes, but assuming both run on the same conditions, and you get a good P-Value, it will be unreasonable to think the test is wrong. It is especially true with data that can be randomly generated - where you can run thousands of tests and basically get `P-Value < 10^-20`. The probability for mistake in this cases is really, slim to none. In modern science, statistics is what actually validates a claim in many cases. In the field of AI, it is the main tool to determine if an algorithm is state of the art or not. – amit Nov 08 '12 at 14:29
  • @JonasWielicki you mean execution time like in [link](http://stackoverflow.com/questions/5248915/execution-time-of-c-program)? What do you mean by swapping? Is that `swap`? @amit - OK, cool, that's the reassurance I needed. Thanks for your help! – Seb Nov 08 '12 at 14:34
  • @user1002744 From the manpage, I cannot safely say that ``clock()`` is what you want. ``getrusage()`` (also mentioned in the linked question's top answer) will do the trick though: what I meant is the amount of CPU time spent by (and only by) the process, given by the ``ru_utime`` field in the ``rusage`` struct. By swapping I mean your system running out of physical memory and having to move RAM to the disk (swap partition/swap file, hence swapping). – Jonas Schäfer Nov 08 '12 at 15:39
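For reference, the `getrusage()` approach described in the comment above can be sketched as follows (POSIX-only; on Windows a different API such as `GetProcessTimes()` would be needed):

```cpp
#include <sys/resource.h>  // POSIX: getrusage(), struct rusage

// User-mode CPU seconds consumed by this process so far. Sampling this
// before and after the traced scenario excludes time the OS spent running
// other processes, unlike a wall-clock measurement.
double user_cpu_seconds() {
    rusage usage{};
    getrusage(RUSAGE_SELF, &usage);
    return usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6;
}
```

Usage would be `double before = user_cpu_seconds(); trace_scene(); double elapsed = user_cpu_seconds() - before;`, where `trace_scene()` stands in for whatever the program's tracing call is.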

Whether "The new version runs in 50% of the time of the old version with p=0.99" means anything to the supervisor will depend on the supervisor's knowledge of the significance of the algorithm for the business. In many cases, statistical analysis of the measurement of one function is a very useful tool for the programmer working on improving it, but meaningless to management.

Presumably, this piece of code is being optimized because something that matters to the business will be improved by making it faster. Usually, that is what should be discussed in reporting results to the supervisor: "Because of this change typical cases of transaction X will complete in an average of 6 seconds instead of 10 seconds", "With this version we can get two runs of our engineering simulation in a working day instead of only one".

The specific type of time measurement (wall-clock or total CPU or user CPU or memory footprint) to be done during optimization will usually be obvious if you think in terms of the business objective your supervisor cares about.

Patricia Shanahan
  • That's exactly what I want to do! But also the other way around: "Because of this change, typical cases of transaction X will be 10% more accurate and will only take an average of 4 seconds longer, which is a reasonable trade-off." From the comments in the link I was nervous that the reported CPU time is not a real indication of the execution time, or is not a fair measure of the speed of the program/algorithm. I understand also that a comparison between two single runs won't work, and I have to use statistical tools to compare. – Seb Nov 10 '12 at 12:32
  • There are two distinct issues. One is whether you need statistical tools. Sometimes the statistics are a no-brainer. For example, it always runs at least twice as fast for 30 runs.
    The other issue is whether wall-clock time for this function or CPU time for this function is a legitimate measure. For *any* measure that is not in itself a business objective, you should have started from a business objective and worked out what simpler measures it depends on. Normally, the last step should be to re-measure the business objective, such as the average transaction time.
    – Patricia Shanahan Nov 10 '12 at 14:22