Java Multiplication Optimization

Question

I am working on code that will run for hundreds of thousands of iterations, and I want to know whether modern Java compilers automatically reuse intermediate values when optimizing the code into machine code. For instance, I have the following code in the loop (simplified):

arrayA[i] += doubleA*doubleB;
arrayB[i] += doubleA*doubleB;

Is a modern Java compiler 'intelligent' enough to store doubleA*doubleB into a multiplication register (and then proceed to read from the multiplication register for the second array, avoiding a second floating point operation)? Or, would I be better off with the following:

double product = doubleA*doubleB;
arrayA[i] += product;
arrayB[i] += product;

For the second option, I would primarily be concerned about the overhead of Java's garbage collector dealing with the product variable every single time it goes out of scope.

You'd *hope* any decent JIT compiler would manage to do that CSE. Also that it wouldn't dynamically allocate any space for a local temporary that never has a reference taken to it. (For x86, it would keep it in a register, or in stack memory if it had to spill/reload it, not on the heap at all). But you can't be *sure* unless you look at the asm, or do some basic profiling to see if there's any GC work if you write it the DRY way (don't repeat yourself) which more closely represents the machine code you ultimately want to execute. — Peter Cordes, Apr 18 '18 at 23:22
Of course, if you *really* care about performance, I'm not sure if current JVM JIT compilers know how to auto-vectorize with SSE2 or AVX, to do `arrayA[i+0..3] += product` in a single instruction with 32-byte loads/stores and SIMD [`vaddpd`](http://felixcloutier.com/x86/ADDPD.html) to do four packed `double` adds as fast as the CPU can do one. — Peter Cordes, Apr 18 '18 at 23:24
@Bubletan: because `double` is a primitive type, not an `Object` at all, right? So there's no chance of the JVM putting it on the heap. — Peter Cordes, Apr 18 '18 at 23:25
@PeterCordes As long as it's a local variable, yes. A field is of course stored in heap with the object. — Bubletan, Apr 18 '18 at 23:31
To summarize, just write it whichever way you like. Do it the first way and any decent compiler will ensure that `doubleA*doubleB` is only evaluated once. If you don't trust the compiler (or just think it looks better), write it the second way without fear of garbage collection overhead, because there won't ever be any. — Kevin Anderson, Apr 18 '18 at 23:31
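
As a minimal sketch of the two forms discussed above, the class below puts both versions of the loop body side by side and checks that they produce the same arrays. The class name `ProductLoop` and the method names `repeated` and `hoisted` are made up for illustration, and the code assumes `arrayB` is at least as long as `arrayA`. It only demonstrates that the two forms are equivalent in result; it says nothing about what machine code a particular JIT emits.

```java
import java.util.Arrays;

public class ProductLoop {
    // Form 1 from the question: the product is written out twice.
    public static void repeated(double[] arrayA, double[] arrayB, double doubleA, double doubleB) {
        for (int i = 0; i < arrayA.length; i++) {
            arrayA[i] += doubleA * doubleB;
            arrayB[i] += doubleA * doubleB;
        }
    }

    // Form 2 from the question: the product is computed once into a local primitive.
    public static void hoisted(double[] arrayA, double[] arrayB, double doubleA, double doubleB) {
        for (int i = 0; i < arrayA.length; i++) {
            double product = doubleA * doubleB;
            arrayA[i] += product;
            arrayB[i] += product;
        }
    }

    public static void main(String[] args) {
        double[] a1 = new double[4], b1 = new double[4];
        double[] a2 = new double[4], b2 = new double[4];
        repeated(a1, b1, 1.5, 2.0);
        hoisted(a2, b2, 1.5, 2.0);
        // Both forms should leave identical contents in the arrays.
        System.out.println(Arrays.equals(a1, a2) && Arrays.equals(b1, b2));
    }
}
```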

score 2 · Answer 1 · answered Apr 19 '18 at 04:06

If the code runs millions of times, it will very likely be JIT compiled. To see the JIT output and verify that the code is being compiled to native code, you can enable it with JVM flags (on HotSpot, `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`). You will also have to build the hsdis disassembler library beforehand, because it doesn't come pre-packaged due to licensing issues.
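
A small harness along these lines could be used to try that out. This is a sketch only: the class name `HotLoop`, the array size, and the repetition count are arbitrary choices, and the commands in the comment assume a HotSpot JVM. `-XX:+PrintAssembly` only produces a disassembly if the hsdis library is available to the JVM.

```java
// Possible ways to run this on a HotSpot JVM (assumed setup):
//   javac HotLoop.java
//   java -XX:+PrintCompilation HotLoop
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly HotLoop
public class HotLoop {
    // The loop from the question, with the product kept in a local variable.
    public static void accumulate(double[] arrayA, double[] arrayB, double doubleA, double doubleB) {
        double product = doubleA * doubleB;
        for (int i = 0; i < arrayA.length; i++) {
            arrayA[i] += product;
            arrayB[i] += product;
        }
    }

    public static void main(String[] args) {
        double[] arrayA = new double[1024];
        double[] arrayB = new double[1024];
        // Call the method often enough that the JIT is likely to compile it.
        for (int rep = 0; rep < 100_000; rep++) {
            accumulate(arrayA, arrayB, 1.0001, 0.5);
        }
        // Use the results so the work cannot be discarded as dead code.
        System.out.println(arrayA[0] + arrayB[1023]);
    }
}
```

With `-XX:+PrintCompilation`, a line mentioning `HotLoop::accumulate` in the log would indicate that the method was compiled.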

When the JIT compiles code into native machine code, it usually optimizes it as well. HotSpot also has a tiered compilation mode (`-XX:+TieredCompilation`) that recompiles hot code with heavier optimization as usage goes up. Note that JIT compilation usually won't occur until a method has been executed many times, around 10,000 with the classic server-compiler default (`-XX:CompileThreshold`). You can lower that threshold, or pass `-Xcomp` to compile methods on first invocation, but code compiled that early lacks the profiling data the optimizer relies on, so it is rarely worth doing. The JIT's overhead should be small: it normally compiles the code in the background on another thread and then swaps in the native code when it is finished (JIT compilation should still only take less than half a second).
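
To get a rough idea of how much time the JIT actually spends, one option is the standard `java.lang.management.CompilationMXBean`. The sketch below is an assumption-laden illustration: `JitTime`, `work`, and `jitMillisDuring` are hypothetical names, the counter is JVM-wide and approximate, and background compilation may still be running when the measurement ends.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitTime {
    // A stand-in workload based on the loop from the question.
    public static double work(int reps) {
        double[] arrayA = new double[256];
        double[] arrayB = new double[256];
        double product = 1.5 * 2.0;
        for (int rep = 0; rep < reps; rep++) {
            for (int i = 0; i < arrayA.length; i++) {
                arrayA[i] += product;
                arrayB[i] += product;
            }
        }
        return arrayA[0] + arrayB[255];
    }

    // Returns the JVM-wide JIT time (ms) accumulated while the task ran,
    // or -1 if this JVM does not report compilation time.
    public static long jitMillisDuring(Runnable task) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        long before = jit.getTotalCompilationTime();
        task.run();
        return jit.getTotalCompilationTime() - before;
    }

    public static void main(String[] args) {
        long ms = jitMillisDuring(() -> work(200_000));
        System.out.println("approximate JIT time in ms: " + ms);
    }
}
```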

As for storing the result in a double, that won't have any negative performance impact. You also don't need to worry about the GC for that: because it is a primitive local variable, it is declared on the stack, never on the heap, and is popped off when its scope exits (the variable will be re-declared in the next loop iteration).
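
One way to check the GC claim on your own machine is to compare the collection counts reported by the standard `GarbageCollectorMXBean`s before and after the loop. This is only a sketch: `GcCheck` and its method names are hypothetical, and other activity in the JVM can trigger a collection, so a non-zero difference does not by itself prove that the local `product` caused it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    // Sums the collection counts of all collectors that report one.
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // The second form from the question, repeated reps times.
    public static void accumulate(double[] arrayA, double[] arrayB,
                                  double doubleA, double doubleB, int reps) {
        for (int rep = 0; rep < reps; rep++) {
            for (int i = 0; i < arrayA.length; i++) {
                double product = doubleA * doubleB; // primitive local, no heap allocation
                arrayA[i] += product;
                arrayB[i] += product;
            }
        }
    }

    public static void main(String[] args) {
        double[] arrayA = new double[1024];
        double[] arrayB = new double[1024];
        long before = totalCollections();
        accumulate(arrayA, arrayB, 1.0001, 0.5, 100_000);
        long after = totalCollections();
        System.out.println("collections during loop: " + (after - before));
        System.out.println(arrayA[0] + arrayB[1023]); // keep the results live
    }
}
```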

*should still only take less than half a second* but in that time, an ahead-of-time optimized loop could easily have done about 1 billion loop iterations on a 4GHz CPU even without SIMD, or with memory bottlenecks stopping it from sustaining 16 bytes load/stored per clock :P And BTW, the JITed version won't actually adjust the CPU stack pointer register up and back down every loop iteration; that would be silly. Logically the variable goes in and out of scope, but in the asm it probably only lives in a register, or leaves stack memory allocated for the whole function. — Peter Cordes, Apr 19 '18 at 04:55

score 0 · Answer 2 · answered Apr 19 '18 at 01:51

You'll practically never know what a JIT does, but you can easily look at the bytecode with javap. If javac or the IDE didn't optimize it, I wouldn't presume the JIT will. Just write good code; it's easier on the eyes anyway.

user2023577


[How to see JIT-compiled code in JVM?](//stackoverflow.com/q/1503479) shows how to see the actual machine code / asm produced by Sun/Oracle JVM with the HotSpot JIT. It has options built-in to make this not too difficult. But yes, in this case you can safely write good code without having to fight against the language for good performance. – Peter Cordes Apr 19 '18 at 04:01
You can take any jar in a prod system (or any copy of that build) and use javap offline, but you cannot practically have this hsdis plugin and the JVM options and capture the desired asm code in prod (especially with de-JIT and re-JIT events); it's unlikely. But for performing such a local check, that's a great idea! THANKS! – user2023577 Apr 19 '18 at 11:50
