What is the loop inversion technique?

Question

I was going through a document which talks about just-in-time compiler (JIT) optimization techniques for Java. One of them was "loop inversion". And the document says:

You replace a regular while loop with a do-while loop. And the do-while loop is set within an if clause. This replacement leads to two fewer jumps.

How does loop inversion work and how does it optimize our code path?

N.B.: It would be great if somebody can explain with an example of Java code and how JIT optimizes it to native code and why is it optimal in modern processors.

It's not something you would do to your source code. It happens at the native code level. — Marko Topolnik, Dec 29 '13 at 15:30
@MarkoTopolnik i know. But i want to know how JIT does this in native code level. Thanks. — Trying, Dec 29 '13 at 15:31
oh cool, there is a wikipedia page about this with plenty of examples http://en.wikipedia.org/wiki/Loop_inversion . The C example is just as valid in Java. — Benjamin Gruenbaum, Dec 29 '13 at 15:35
Sometime ago inspired by one of the questions on SO I have conducted a short research in this matter, maybe the results will be helpful to you: http://stackoverflow.com/questions/16205843/java-loop-efficiency/16463563#16463563 — Adam Siemion, Dec 29 '13 at 15:35
Is this is the same thing as where the loop condition is usually put at the end (regardless of whether there will be fewer jumps performed), just so that there are fewer jump instructions (1 vs 2 per iteration)? — extremeaxe5, Apr 13 '19 at 19:28

score 109 · Accepted Answer · edited Jun 20 '20 at 09:12

109

while (condition) { 
  ... 
}

Workflow:

check condition;
if false, jump to outside of loop;
run one iteration;
jump to top.

if (condition) do {
  ...
} while (condition);

Workflow:

check condition;
if false, jump to beyond the loop;
run one iteration;
check condition;
if true, jump to step 3.

Comparing these two you can easily see that the latter may not do any jumps at all, provided that there is exactly one step through the loop, and generally the number of jumps will be one less than the number of iterations. The former will have to jump back to check the condition, only to jump out of the loop when the condition is false.

Jumps on modern pipelined CPU architectures can be quite expensive: as the CPU is finishing the execution of the checks before the jump, the instructions beyond that jump are already in the middle of the pipeline. All this processing must be discarded if the branch prediction fails. Further execution is delayed while the pipeline is being reprimed.

Explaining the mentioned branch prediction: for each kind of conditional jump the CPU has two instructions, each including a bet on the outcome. For example, you would put an instruction saying "jump if not zero, betting on not zero" at the end of a loop because the jump will have to be made on all iterations except the last one. That way the CPU starts pumping its pipeline with the instructions following the jump target instead of those following the jump instruction itself.

Important note

Please do not take this as an example of how to optimize at the source code level. That would be entirely misguided since, as already clear from your question, the transformation from the first form into the second is something the JIT compiler does as a matter of routine, completely on its own.

edited Jun 20 '20 at 09:12

Community

1
1

answered Dec 29 '13 at 15:36

Marko Topolnik

195,646
29
319
436

51

That note at the end is a very, very important thing indeed. – T.J. Crowder Dec 29 '13 at 15:41
@MarkoTopolnik Are you sure the generated bytecode for the do while loop will contain exactly the jumps you have described? – Adam Siemion Dec 29 '13 at 15:50
2

@AdamSiemion: The bytecode generated for the given `do-while` source code is irrelevant, because we don't actually write that. We write the `while` loop and let the compiler and JIT conspire to improve it for us (via loop inversion) if/as necessary. – T.J. Crowder Dec 29 '13 at 15:53
1

@T.J.Crowder +1 for the above, plus the note to Adam: *never* consider bytecode when thinking about JIT compiler's optimizations. Bytecode is much closer to Java source code than to the actual JIT-compiled code being executed. In fact, the trend in modern languages is not to have bytecode at all as a part of the language spcefication. – Marko Topolnik Dec 29 '13 at 15:56
1

Would have been extra informative the **Important note** was explained a bit more. Why would it be *entirely* misguided? – arsaKasra Dec 29 '13 at 18:31
@arsaKasra That's implied by the question itself: what we are describing here is a *transformation performed by the JIT-compiler* and I think it is clear why anyone would prefer *not* to see the second form in the code. – Marko Topolnik Dec 29 '13 at 18:50
@MarkoTopolnik It's just that the way you put it, and the way T.J. emphasized it have made it look a potentially very bad thing. I couldn't see what would be technically wrong with that, let lone that I don't seem to be insightful enough to not prefer it to the first form. – arsaKasra Dec 29 '13 at 19:57
2

@arsaKasra It's misguided because in general readability and stability trump optimizations in source code. Especially with the revelation that the JIT does this for you, you should not be attempting the (very micro) optimization on your own. – Radiodef Dec 29 '13 at 22:04
I can't see any difference between the two versions. In both cases, exactly N+1 conditional branches are executed when the loop is taken N times (N >= 0). Unconditional branches should not be counted, as they can't be mispredicted. What am I missing ? – Jan 02 '14 at 17:21
You are missing the number of actual *jumps* taken on the machine code level. Admittedly, the source code doesn't specify anything at that level, but you can imagine what a "naïve" version could look like. – Marko Topolnik Jan 02 '14 at 17:25

Keppil · Answer 2 · 2013-12-29T15:45:10.243

24

This can optimize a loop that is always executed at least once.

A regular while loop will then always jump back to the start at least once and jump to the end once at the end. An example of a simple loop running once:

int i = 0;
while (i++ < 1) {
    //do something
}

A do-while loop on the other hand will skip the first and last jump. Here is an equivalent loop to the one above, that will run without jumps:

int i = 0;
if (i++ < 1) {
    do {
        //do something
    } while (i++ < 1); 
}

edited Dec 29 '13 at 15:45

answered Dec 29 '13 at 15:31

Keppil

45,603
8
97
119

+1 for being correct and first, please consider adding a code example. Something like `boolean b = true; while(b){ b = maybeTrue();}` to `boolean b;do{ b = maybeTrue();}while(b);` should suffice. – Benjamin Gruenbaum Dec 29 '13 at 15:32
No worries. It kind of invalidates the opening line of the answer, fwiw. :-) – T.J. Crowder Dec 29 '13 at 15:46
@T.J. Well, it still won't optimize a loop that isn't entered, there will be one jump in both cases. – Keppil Dec 29 '13 at 15:48
Ah, yes. Sorry, I was reading that to mean that you *couldn't* apply it to loops that didn't loop at least once (rather than that it doesn't help them). With you now. :-) – T.J. Crowder Dec 29 '13 at 15:51
@Keppil You should probably make explicit that in the case where we have a large number of iterations X, then we will only save a single jump among the X ones. – Manuel Selva Oct 26 '16 at 09:18

score 4 · Answer 3 · answered Dec 29 '13 at 15:38

Let's walk through them:

The `while` version:

void foo(int n) {
    while (n < 10) {
       use(n);
       ++n;
    }
    done();
}

First we test n and jump to done(); if the condition is not true.
Then we use and increment n.
Now we jump back to the condition.
Rinse, repeat.
When the condition is no longer true, we jump to done().

The `do-while` version:

(Remember, we don't actually do this in the source code [that would introduce maintenance issues], the compiler/JIT does it for us.)

void foo(int n) {
    if (n < 10) {
        do {
            use(n);
            ++n;
        }
        while (n < 10);
    }
    done();
}

First we test n and jump to done(); if the condition is not true.
Then we use and increment n.
Now we test the condition and jump back if it's true.
Rinse, repeat.
When the condition is no longer true, we flow (not jump) to done().

So for instance, if n starts out being 9, we never jump at all in the do-while version, whereas in the while version we have to jump back to the beginning, do the test, and then jump back to the end when we see it isn't true.

score 3 · Answer 4 · answered Dec 29 '13 at 15:44

Loop inversion is a performance optimization technique which improves performance as the processor can accomplish the same result with fewer instructions. This should mostly improve the performance in boundary conditions.

This link provides another example for loop inversion. In few architectures where decrement and compare is implemented as a single instruction set, It makes sense to convert a for loop to a while with decrement and compare operation.

Wikipedia has a very good example and I am explaining it here again.

 int i, a[100];
  i = 0;
  while (i < 100) {
    a[i] = 0;
    i++;
  }

will be converted by the compiler to

  int i, a[100];
  i = 0;
  if (i < 100) {
    do {
      a[i] = 0;
      i++;
    } while (i < 100);
  }

How this translates to performance? When the value of i is 99, the processor need not perform a GOTO (which is required in the first case). This improves performance.

What is the loop inversion technique?

4 Answers4

Important note

The while version:

The do-while version:

The `while` version:

The `do-while` version: