I'm trying to understand what sort of compile-time optimizations I can hope for in Java when the same code is executed many times. In particular, I'm interested in arithmetic simplifications in the following scenario:
Imagine that you need to transform a million 3D points using the same 3D affine transformation. If the transformation turns out to be a pure translation, a good optimizer would be able to turn 12 multiplications and 12 additions per point into 3 additions only, because all multiplications are by 1 or 0 and many additions just add zero.
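To make the scenario concrete, here is a minimal sketch of the kind of code I have in mind (the method and parameter names are just illustrative, this is not the code from the gist linked below); the hope is that the JIT would specialize the general loop into the translation-only loop whenever the matrix part is the identity:

// General 3D affine transform: full matrix-vector product plus translation.
// All names here are illustrative.
static void transformGeneral(double[] xs, double[] ys, double[] zs,
                             double m00, double m01, double m02, double tx,
                             double m10, double m11, double m12, double ty,
                             double m20, double m21, double m22, double tz) {
    for (int i = 0; i < xs.length; i++) {
        double x = xs[i], y = ys[i], z = zs[i];
        xs[i] = m00 * x + m01 * y + m02 * z + tx;
        ys[i] = m10 * x + m11 * y + m12 * z + ty;
        zs[i] = m20 * x + m21 * y + m22 * z + tz;
    }
}

// What I would like the JIT to reduce the loop to when the matrix part is
// the identity: a pure translation, 3 additions per point.
static void transformTranslationOnly(double[] xs, double[] ys, double[] zs,
                                     double tx, double ty, double tz) {
    for (int i = 0; i < xs.length; i++) {
        xs[i] += tx;
        ys[i] += ty;
        zs[i] += tz;
    }
}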
Before trying this 'complex scenario', I just ran some multiplications and additions in loops, and despite the cool stuff I've been reading about Java's JIT compiler(s), I've been a bit disappointed.
First, what works: performing many multiplications by 1 seems to get simplified away and executes very fast:
tic();
final int nRepetitions = 100_000_000;
final double factor1 = 1.0d;
value = 0.0d;
for (int i = 0; i < nRepetitions; i++) {
    for (int j = 0; j < 20; j++) {
        value = value * factor1;
    }
}
System.out.println("Result = " + value);
toc();
This is the output I get, and it is indeed fast:
Result with graalvm-ce-17\bin\java.exe
----------------------
Repeating 100000000 multiplication by factor1 = 1.0, a final variable
The code is put in the main method
This is overall is No op, and should be super fast
Result = 0.0
Elapsed time 64.8528 ms
----------------------
If I perform the same computation, but by calling a function, I do not get any optimization at all, and the execution time is about 2 seconds.
tic();
value = 0.0d;
repeatMultiply(nRepetitions, value, factor1);
toc();
The function being:
public static void repeatMultiply(int nRepetitions, double value, final double multFactor) {
    for (int i = 0; i < nRepetitions; i++) {
        for (int j = 0; j < 20; j++) {
            value = value * multFactor;
        }
    }
    System.out.println("Result = " + value);
}
Now the code runs very slowly:
----------------------
Repeating 100000000 multiplication by factor1 = 1.0d
Function called = repeatMultiply
This is overall is No op, and should be super fast
Result = 0.0
Elapsed time 1815.1354 ms
I've tested other things. Not declaring the variable factor1 as final in the first example destroys the only optimization I saw. I then tried adding zeros instead of multiplying by ones, but it was even worse: I always got the 'long' execution time, which is about two seconds on my machine.
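The addition variant looked roughly like this (just a sketch here; the exact code is in the gist linked below):

tic();
final int nRepetitions = 100_000_000;
final double term1 = 0.0d; // adding zero, which should also be a no-op
value = 0.0d;
for (int i = 0; i < nRepetitions; i++) {
    for (int j = 0; j < 20; j++) {
        value = value + term1;
    }
}
System.out.println("Result = " + value);
toc();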
I've tested Oracle JDK 1.8 and 18, as well as GraalVM Community Edition 17, but none of them seemed to make any difference. I've put a gist with all the code and tests I performed at https://gist.github.com/NicoKiaru/2949e6969087e75b07b21596d80c7882
I hope you can enlighten me and let me know whether these results reflect an intrinsic limitation of Java's JIT compilers or whether I've done something wrong in my testing.
Is there any way my goal (automatic optimisation of affine transformation computations) can be reached with JIT compilation, or should I stop dreaming and explicitly test for the "simple scenario" (like a pure translation) in the Java code?
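By "explicitly test" I mean hand-writing the dispatch myself, along the lines of this sketch (reusing the illustrative methods from the first sketch above):

// Hand-written special case: check whether the matrix part is exactly the
// identity and dispatch to the translation-only loop, instead of hoping the
// JIT discovers this on its own. Names are illustrative.
static boolean isPureTranslation(double m00, double m01, double m02,
                                 double m10, double m11, double m12,
                                 double m20, double m21, double m22) {
    // Exact comparison is intentional: only specialize for the exact identity.
    return m00 == 1 && m11 == 1 && m22 == 1
        && m01 == 0 && m02 == 0
        && m10 == 0 && m12 == 0
        && m20 == 0 && m21 == 0;
}

static void transform(double[] xs, double[] ys, double[] zs,
                      double m00, double m01, double m02, double tx,
                      double m10, double m11, double m12, double ty,
                      double m20, double m21, double m22, double tz) {
    if (isPureTranslation(m00, m01, m02, m10, m11, m12, m20, m21, m22)) {
        transformTranslationOnly(xs, ys, zs, tx, ty, tz);
    } else {
        transformGeneral(xs, ys, zs,
                m00, m01, m02, tx,
                m10, m11, m12, ty,
                m20, m21, m22, tz);
    }
}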