9

Does the Java compiler or the JIT compiler optimize divisions or multiplications by a constant power of two down to bitshifting?

For example, are the following two statements optimized to be the same?

int median = start + (end - start) >>> 1;
int median = start + (end - start) / 2;

(basically this question but for Java)

Community
  • 1
  • 1
wchargin
  • 15,589
  • 12
  • 71
  • 110

3 Answers3

17

While the accepted answer is right in the sense that the division can't be simply replaced by a right shift, the benchmark is terribly wrong. Any Java benchmark running for less than one second is probably measuring the interpreter's performance - not something you usually care about.

I couldn't resist and wrote an own benchmark which mainly shows that it's all more complicated. I'm not trying to fully explain the results, but I can say that

  • a general division is a damn slow operation
  • it gets avoided is much as possible
  • division by a constant gets AFAIK always somehow optimized
  • division by a power of two gets replaced by a right shift and an adjustment for negative numbers
  • a manually optimized expression might be better
Community
  • 1
  • 1
maaartinus
  • 44,714
  • 32
  • 161
  • 320
  • Your answer says "division by a power of two gets replaced by a right shift and an adjustment for negative numbers" while accepted answer says it doesn so which one is it? – vach Nov 25 '16 at 04:19
  • @vach My benchmark clearly shows that the optimization indeed gets done (however, depending on the VM and the CPU). The benchmark from the accepted answer is completely broken due to dead code elimination, so we can forget it (read about JMH blackhole if you're unsure whom to trust). – maaartinus Nov 25 '16 at 12:54
8

No, the Java compiler doesn't do that, because it can't be sure on what the sign of (end - start) will be. Why does this matter? Bit shifts on negative integers yield a different result than an ordinary division. Here you can see a demo: this simple test:

System.out.println((-10) >> 1);  // prints -5
System.out.println((-11) >> 1);  // prints -6
System.out.println((-11) / 2);   // prints -5

Also note that I used >> instead of >>>. A >>> is an unsigned bitshift, while >> is signed.

System.out.println((-10) >>> 1); // prints 2147483643

@Mystical: I wrote a benchmark, that shows that the compiler / JVM doesn't do that optimization: https://ideone.com/aKDShA

Community
  • 1
  • 1
Martijn Courteaux
  • 67,591
  • 47
  • 198
  • 287
  • 1
    While they aren't quite the same, there's nothing to stop the compiler from doing a work-around. For example, `x / 2` is equal to `(x - (x >> 31)) >> 1`. – Mysticial Sep 01 '13 at 17:36
  • That are 3 ASM instructions versus 1. I don't think that is an optimization. Might be wrong. – Martijn Courteaux Sep 01 '13 at 17:41
  • 6
    A division is definitely slower than 3 basic instructions. Depending on what you're dividing by, they tend to be between 10 - 70 cycles. Whereas the majority of basic instructions are only 1 cycle. (not counting throughput) – Mysticial Sep 01 '13 at 17:41
  • Where can I read more about it? I'd love to know. I'll remove my answer afterwards :) – Martijn Courteaux Sep 01 '13 at 17:43
  • 3
    Here's the hard core details: http://www.agner.org/optimize/instruction_tables.pdf You don't need to remove your answer. It's correct that the compiler can't straight-out optimize it to a single shift. I'm just saying that it may do something else. – Mysticial Sep 01 '13 at 17:45
  • Thanks for that resource! I wrote a benchmark on Ideone. This benchmark shows that their compiler/VM does not the optimization: http://ideone.com/fwdAXU – Martijn Courteaux Sep 01 '13 at 17:51
  • 4
    Wow. That's a surprise! My C++ compiler will do it. I would've thought that the JIT would've been able to do "basic" optimizations like this. I say "basic", because while it isn't necessarily basic to a programmer (and something you shouldn't do manually anyway), it's considered basic for a compiler. – Mysticial Sep 01 '13 at 17:54
  • 1
    @Mystical This optimization works only for CPU's that have a barrel-shifter. IIRC the optimization would actually have been slower for the x86 P4-generation (for example). It heavily relies on shifts being independent of actual shift count, which they are not generally on all architectures. – Durandal Sep 01 '13 at 18:30
  • @Durandal: Yes, but that's the advantage of JIT: It does know what it runs on. And JVM does a lot of such optimizations (even more [that I thought](http://stackoverflow.com/questions/21567248/strange-performance-drop-of-jdk8-localdate-toepochday)), keeping the barrel shifters pretty busy. – maaartinus Feb 07 '14 at 00:37
  • @maaartinus Yes it is (I digged out some old code and retestet it) - but it didn't seem to do so in 1.7.0ea/1.6.x from two years ago, when I originally tested, when the exact same claim was also frequently made, thats why I asked so persistently. – Durandal Feb 07 '14 at 15:59
  • @Mysticial: The surprise is gone. My answer shows that the JVM *does* optimize `/2` and also `/3`. It mightn't be as good as humans doing it manually, this would require a better benchmark than mine. – maaartinus Feb 07 '14 at 17:19
  • 3
    Your test just proves that the optimizer didn’t kick in at all in that short time. Since your test code doesn’t use the result of the calculation, Hotspot will remove the calculation completely instead of replacing it with shifts or such alike (once it started its work). This in turn allows removing the loops entirely, so you will notice when the optimizer has kicked in, for sure. If you measure more than a few nanoseconds, it didn’t… – Holger Feb 19 '16 at 17:10
  • 2
    Now your benchmark shows a huge difference: 'Run 1: 295044177, Run 2: 3749731100`. However, it's terribly broken. Please replace it by JMH or caliper benchmark (s. my answer) or fix it w.r.t. dead code elimination and other gotchas. – maaartinus Nov 25 '16 at 13:00
2

If the JVM does not do it, you can easily do it yourself.

As noted, right shifts on negative numbers do not behave the same as division because the result is rounded in the wrong direction. If you know that the dividend is non-negative, you can safely replace the division by a shift. If it might be negative, you can use the following technique.

If you can express your original code in this form:

int result = x / (1 << shift);

You can replace it with this optimized code:

int result = (x + (x >> 31 >>> (32 - shift))) >> shift;

Or, alternatively:

int result = (x + ((x >> 31) & ((1 << shift) - 1))) >> shift;

These formulae compensate for the incorrect rounding by adding a small number computed from the sign bit of the dividend. This works for any x with all shift values from 1 to 30.

If the shift is 1 (i.e., you are dividing by 2) then the >> 31 can be removed in the first formula to give this very tidy snippet:

int result = (x + (x >>> 31)) >> 1;

I have found these techniques to be faster even when the shift is non-constant, but obviously they benefit the most if the shift is constant. Note: For long x instead of int, change 31 and 32 respectively to 63 and 64.

Examining the generated machine code shows that (unsurprisingly) the HotSpot Server VM can do this optimization automatically when the shift is constant, but (also unsurprisingly) the HotSpot Client VM is too stupid to.

Boann
  • 48,794
  • 16
  • 117
  • 146