
I'm writing an algorithm where efficiency is very important. Sometimes I also want to track the algorithm's behaviour by calling some "callback" functions. Let's say my code looks like this:

public float myAlgorithm(AlgorithmTracker tracker) {
    while (something) { // millions of iterations
        doStuff();
        if (tracker != null) tracker.incrementIterationCount(); // <--- How to run the if only once?
        doOtherStaff();
    }
}

How do I prevent the if statement from being evaluated a million times? Does the compiler see that tracker is never reassigned? If it's null on the first check, it will always be. If it's not, it never will be.

Ideally I would like to tell the compiler to build my code in such a way that if tracker is null (at runtime) it would run with the same performance as

    while (something) { // millions of iterations
        doStuff();
        doOtherStaff();
    }

I thought of two solutions:

  • I could write two versions of myAlgorithm, one with the calls and one without them, but that would lead to lots of code duplication.

  • I could extract AlgorithmTracker to an interface and create a fake empty tracker with empty functions. Still, I don't know whether the compiler would optimize the calls away.
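A minimal sketch of that second option (all names here are illustrative, not from a real library): the algorithm always calls the tracker, and the "off" case passes a no-op implementation whose empty method the JIT can inline and eliminate.

```java
// Sketch of the interface idea: callers pass either a real tracker or a no-op.
interface AlgorithmTracker {
    void incrementIterationCount();
}

// Singleton no-op tracker; its empty method is a prime candidate for
// JIT inlining, after which the call disappears entirely.
enum NoOpTracker implements AlgorithmTracker {
    INSTANCE;
    @Override public void incrementIterationCount() { /* intentionally empty */ }
}

// A real tracker that actually counts iterations.
class CountingTracker implements AlgorithmTracker {
    private long iterations;
    @Override public void incrementIterationCount() { iterations++; }
    public long iterations() { return iterations; }
}

class Algorithm {
    // Stand-in for the question's method: the loop body is simplified,
    // but note there is no null check inside the loop anymore.
    static float myAlgorithm(AlgorithmTracker tracker) {
        float result = 0f;
        for (int i = 0; i < 1_000; i++) {
            result += 1f;                      // doStuff() stand-in
            tracker.incrementIterationCount(); // always called, never null
        }
        return result;
    }
}
```

Usage: `Algorithm.myAlgorithm(NoOpTracker.INSTANCE)` for the fast path, `Algorithm.myAlgorithm(new CountingTracker())` when tracking is wanted.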

AFusco
  • If I understand correctly you can have null only once? If so you can just break after you find it and continue from the next one till the end of the input (meaning two whiles) – Yonatan Karp-Rudin Dec 18 '19 at 14:28
  • Have you *verified* that this is a problem? It looks like the kind of optimization that the JIT might actually do for you (since it knows this is a local variable and can't be changed from outside code). – Joachim Sauer Dec 18 '19 at 14:37
  • I am not sure that this is a problem because I don't know how to see if the if statement is evaluated or not. I could rely on timing but there are many other things in my code that could affect it. I'm suspecting the JIT might do this automatically but I'm looking for an expert's view. – AFusco Dec 18 '19 at 14:41
  • Mind that there’s a mandatory `null` check anyway, so your explicit check only changes the behavior from “throw when null” to “skip when null”. Whether the optimizer moves the check out of the loop depends on what happens within `doStuff()` and `doOtherStaff()` but you can assume that if it doesn’t, the conditional wasn’t performance critical anyway. – Holger Dec 18 '19 at 15:34

3 Answers


For most CPU architectures you don't have to worry about the optimization you want to apply, because that particular optimization is built into most contemporary CPUs. It is called branch prediction, and current CPUs are very good at it.

On average, roughly every sixth instruction a CPU executes is a branch, and if the CPU had to stop and evaluate the condition of every branch before continuing, execution would be a lot slower.

Branch prediction and speculative execution

So when faced with a branch, the CPU, without waiting for the branch condition to be evaluated, starts executing (speculative execution) the path it considers most likely to be correct. Later, when the result of the branch condition becomes available, the CPU checks whether it is executing the correct path.

If the path it picked is consistent with the result of the branch condition, the CPU knows it is executing the correct path and keeps going at full speed; otherwise it has to flush all the instructions it executed speculatively and start over on the correct path.

But how does CPU know which path to pick?

Enter the branch predictor subsystem of the CPU. In its most basic form it stores information about the past behavior of a branch: for example, if a branch has not been taken for some time, it is likely not to be taken now. This is a simplified explanation; a real branch predictor is quite complex.

So how effective are these branch predictors?

Given that at their core branch predictors are just pattern-matching machines, if your branch follows a predictable pattern you can rest assured the branch predictor will get it right. But if your branch shows no pattern at all, the branch predictor is not going to help you; worse yet, it will hamper your code's execution because of all the wrong predictions.


How will your code fare with the branch predictor?

In your case the value of the branch control variable never changes, so the branch is either taken on every iteration of the loop or never taken at all. This is a pattern that even the most basic branch predictor can discern, which means your code will execute practically as if the condition were not there: after the first few iterations, the branch predictor will pick the path with essentially 100% accuracy.
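The effect of branch predictability can be seen informally with the classic sorted-vs-shuffled experiment. This is an illustrative demo, not a rigorous benchmark: timings vary by machine, and the JIT may compile a branch this simple to a conditional move, which would hide the effect entirely.

```java
import java.util.Arrays;
import java.util.Random;

// Demo: the same branch over sorted data follows a predictable pattern
// (all "below", then all "above"); over shuffled data it is random.
class BranchDemo {
    static long sumAbove(int[] data, int threshold) {
        long sum = 0;
        for (int v : data) {
            if (v >= threshold) sum += v;   // the branch under test
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] shuffled = new Random(42).ints(1_000_000, 0, 256).toArray();
        int[] sorted = shuffled.clone();
        Arrays.sort(sorted);

        long t0 = System.nanoTime();
        long s1 = sumAbove(shuffled, 128);
        long t1 = System.nanoTime();
        long s2 = sumAbove(sorted, 128);
        long t2 = System.nanoTime();

        // Same elements, same result; only branch predictability differs.
        System.out.println("equal sums: " + (s1 == s2));
        System.out.printf("shuffled: %d ms, sorted: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Note that the question's branch is even friendlier than the sorted case: its outcome never changes at all.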

To learn more, read this great SO thread.

Fun fact: this particular optimization is the reason for CPU vulnerabilities such as Spectre and Meltdown.

mightyWOZ

lots of code duplication

Lots of code duplication means there's lots of code. So how much could one simple null check influence the overall performance?

Hoisting null checks out of loops is a fairly trivial optimization. There's no guarantee that it gets done, but even when the JIT can't do it, the CPU can still predict the branch outcome perfectly.(*) So the branch will cost something like ¼ of a cycle, as current CPUs are capable of executing around four instructions per cycle.
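When the hoisting does happen, the result is the same as if the loop had been "unswitched" by hand: the null check runs once, and each loop body is free of the tracker branch. A sketch of that shape, with stand-in names for the question's undefined methods:

```java
// Hand-written equivalent of the loop unswitching a JIT may perform.
class Unswitched {
    interface AlgorithmTracker { void incrementIterationCount(); }

    // Stand-in for the question's doStuff()/doOtherStaff().
    static float work(float acc) { return acc + 1f; }

    static float myAlgorithm(AlgorithmTracker tracker, int iterations) {
        float result = 0f;
        if (tracker != null) {                  // checked once, outside the loop
            for (int i = 0; i < iterations; i++) {
                result = work(result);
                tracker.incrementIterationCount();
            }
        } else {                                // branch-free body
            for (int i = 0; i < iterations; i++) {
                result = work(result);
            }
        }
        return result;
    }
}
```

This is exactly the "two versions" idea from the question, except the duplication is confined to one method instead of the whole algorithm.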

As already said, there's some branching anyway, since the JIT has to perform a null check of its own. So most probably the net win from this premature optimization is zero.


(*) The prediction can get spoiled by tons of other branches evicting your branch from the predictor (a sort of cache). But then your poor branch was just one of many, and you needn't care.

maaartinus

I could write two versions of myAlgorithm...but that would lead to lots of code duplication

Yes, this can be a way to optimize performance, and it is one of the rare cases where DRY doesn't apply. Another example of such a repeat-yourself technique is loop unrolling (if your compiler didn't do it for you :)). Here, code duplication is the price you pay for better performance.
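For concreteness, here is what loop unrolling looks like when done by hand; a sketch only, since HotSpot often performs this transformation itself and the manual version rarely pays off:

```java
// Manual loop unrolling: trade duplicated code for fewer loop-control branches.
class Unroll {
    // Straightforward version.
    static long sum(int[] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++) s += a[i];
        return s;
    }

    // Unrolled by 4: one bounds check and branch per four additions.
    static long sumUnrolled(int[] a) {
        long s = 0;
        int i = 0;
        int limit = a.length - 3;
        for (; i < limit; i += 4) {
            s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        }
        for (; i < a.length; i++) s += a[i];  // tail of 0–3 leftover elements
        return s;
    }
}
```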

But as for your particular if: the condition doesn't change inside the loop, so CPU branch prediction should handle it quite well. Write good, correct performance tests (with JMH, for example), and something tells me you will not see any difference from such a pico- (not even micro-) optimization; the result may even be worse, since there are much, much more important things that affect overall performance. Just a few of them:

  • The most effective compiler optimization is inlining (https://www.baeldung.com/jvm-method-inlining). If your code transformation breaks inlining, think twice and measure the resulting performance properly.
  • Memory allocation, and therefore GC pauses, on the main/critical path of the application can also be important. Reuse mutable objects if required (pooling).
  • Cache misses: make sure you access memory sequentially as much as possible. A canonical example: replace a LinkedList with an ArrayList for iteration and performance becomes much better.
  • etc., etc.

So, don't worry about this particular if at all.

Performance optimization is a very large and very interesting area. Take care of the right things and write correct perf tests... And always think about appropriate algorithms/collections; remember classical big O.

AnatolyG