Code is significantly slower even though it should not

Question

Summary: Our software uses a utility function, this function is called million times so we have to optimized it. I notice that with a simple tweak our code in this utility function, the execution is much faster, down from 10 seconds of execution to 0.6 seconds. Here's the tweak in the utility function:

From code1 (finished in 10 seconds):

double d = (a * x3 + b * y3 + c) / l;
if (std::abs(d) > inMaxDToL) return false;

d = (a * x4 + b * y4 + c) / l;
if (std::abs(d) > inMaxDToL) return false;

return false;

To code 2 (finised in 0.6 seconds):

double d = (a * x3 + b * y3 + c) / l;
if (std::abs(d) > inMaxDToL) return false;

return false;

What I did was returning false earlier in code 2. Obviously in the first code, line 1 + 2 is equivalent to line 3 + 4 in term of workload. So I wonder why removing line 3 + 4 can speed up the processing so much?

Another tweak that also reduce the execution time from 10s to 0.6 seconds is to replace an incall function call:

if (!inBetween(x1, y1, x2, y2, x3, y3)) return false;

with its content:

if ((x2 - x1) * (x2 - x3) > epsilon) return false;
if ((y2 - y1) * (y2 - y3) > epsilon) return false;

The code was also speed up from 10s to 0.6ms.

I'm using Visual Studio 2013 and its compiler.

Did I miss something here?

Edit: to make it clearer:

That few lines are just a part of the utility function
I purposedly return false for debugging purpose

Seems like both could be optimized down to just `return false;` since there are no side effects and all cases end up with `false`. — aschepler, Mar 17 '17 at 14:57
Cannot say about Visual Studio but on Linux you can generate SIMD instructions with certain type of manipulation. — Shiv, Mar 17 '17 at 14:57
Are you sure you don't mean milliseconds when you say seconds? I cannot see how any of these examples can take more than 100ms. — klutt, Mar 17 '17 at 14:57
Is a) the first condition almost always true and b) `x4`, `y4` cold in cache? — Baum mit Augen, Mar 17 '17 at 14:58
You mean it takes 10s and 0.6s to execute one milion times right? — , Mar 17 '17 at 15:14
@klutt I read it as "the function containing this snippet took 10s before I removed these lines, and 0.6s after" -- that is, these snippets are just part of the function. — Nic, Mar 17 '17 at 15:16
Sorry I was not clear: 1. 10s and 0.6s when run million times 2. That few lines are just a part of the utility function 3. I purposedly return false for debugging purpose — Nam Le, Mar 17 '17 at 15:19
You would get a far better answer if you posted the entire utility function, since that is what you are timing. — Joe, Mar 17 '17 at 15:56

score 1 · Answer 1 · edited May 23 '17 at 12:17

1

As insanely well written in this post, I'm pretty sure you are victim of a branch prediction failure.

Also, it might be worth putting the result of your second operation in another variable. If you didn't intend to return 'd', writing to different variables might allow the compiler to do more optimizations.

edited May 23 '17 at 12:17

Community

1
1

answered Mar 17 '17 at 15:21

AlexG

1,091
7
15

awesome, thank you @AlexG! – Nam Le Mar 17 '17 at 15:33

score 0 · Answer 2 · 2017-03-17T15:49:41.723

You can take advantage of the lazy evaluation (short-circuit boolean evaluation) and have both checks in a single condition, avoiding assigning the result to a variable

// second condition will only evaluate if the first one is false.
if((std::abs((a * x3 + b * y3 + c) / l) > inMaxDToL) || 
   (std::abs((a * x4 + b * y4 + c) / l) > inMaxDToL))
   return false;

Code is significantly slower even though it should not

2 Answers2