I have a fairly simple hobby project written in Java 8 that makes extensive use of repeated Math.round() calls in one of its modes of operation. For example, one such mode spawns 4 threads and queues 48 runnable tasks by way of an ExecutorService, each of which runs something similar to the following block of code 2^31 times:
int3 = Math.round(float1 + float2);
int3 = Math.round(float1 * float2);
int3 = Math.round(float1 / float2);
That's not exactly how it is (there are arrays involved, and nested loops), but you get the idea. Anyway, prior to Java 8u40, the code that resembles the above could complete the full run of ~103 billion instruction blocks in about 13 seconds on an AMD A10-7700k. With Java 8u40 it takes around 260 seconds to do the same thing. No changes to code, no nothing, just a Java update.
Has anyone else noticed Math.round() getting much slower, especially when it is used repetitiously? It's almost as though the JVM was doing some sort of optimization before that it isn't doing anymore. Maybe it was using SIMD prior to 8u40 and it isn't now?
edit: I have completed my second attempt at an MVCE. You can download the first attempt here:
https://www.dropbox.com/s/rm2ftcv8y6ye1bi/MathRoundMVCE.zip?dl=0
The second attempt is below. My first attempt has been otherwise removed from this post as it has been deemed to be too long, and is prone to dead code removal optimizations by the JVM (which apparently are happening less in 8u40).
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class MathRoundMVCE
{
static long grandtotal = 0;
static long sumtotal = 0;
static float[] float4 = new float[128];
static float[] float5 = new float[128];
static int[] int6 = new int[128];
static int[] int7 = new int[128];
static int[] int8 = new int[128];
static long[] longarray = new long[480];
final static int mil = 1000000;
public static void main(String[] args)
{
initmainarrays();
OmniCode omni = new OmniCode();
grandtotal = omni.runloops() / mil;
System.out.println("Total sum of operations is " + sumtotal);
System.out.println("Total execution time is " + grandtotal + " milliseconds");
}
public static long siftarray(long[] larray)
{
long topnum = 0;
long tempnum = 0;
for (short i = 0; i < larray.length; i++)
{
tempnum = larray[i];
if (tempnum > 0)
{
topnum += tempnum;
}
}
topnum = topnum / Runtime.getRuntime().availableProcessors();
return topnum;
}
public static void initmainarrays()
{
int k = 0;
do
{
float4[k] = (float)(Math.random() * 12) + 1f;
float5[k] = (float)(Math.random() * 12) + 1f;
int6[k] = 0;
k++;
}
while (k < 128);
}
}
class OmniCode extends Thread
{
volatile long totaltime = 0;
final int standard = 16777216;
final int warmup = 200000;
byte threads = 0;
public long runloops()
{
this.setPriority(MIN_PRIORITY);
threads = (byte)Runtime.getRuntime().availableProcessors();
ExecutorService executor = Executors.newFixedThreadPool(threads);
for (short j = 0; j < 48; j++)
{
executor.execute(new RoundFloatToIntAlternate(warmup, (byte)j));
}
executor.shutdown();
while (!executor.isTerminated())
{
try
{
Thread.sleep(100);
}
catch (InterruptedException e)
{
//Do nothing
}
}
executor = Executors.newFixedThreadPool(threads);
for (short j = 0; j < 48; j++)
{
executor.execute(new RoundFloatToIntAlternate(standard, (byte)j));
}
executor.shutdown();
while (!executor.isTerminated())
{
try
{
Thread.sleep(100);
}
catch (InterruptedException e)
{
//Do nothing
}
}
totaltime = MathRoundMVCE.siftarray(MathRoundMVCE.longarray);
executor = null;
Runtime.getRuntime().gc();
return totaltime;
}
}
class RoundFloatToIntAlternate extends Thread
{
int i = 0;
int j = 0;
int int3 = 0;
int iterations = 0;
byte thread = 0;
public RoundFloatToIntAlternate(int cycles, byte threadnumber)
{
iterations = cycles;
thread = threadnumber;
}
public void run()
{
this.setPriority(9);
MathRoundMVCE.longarray[this.thread] = 0;
mainloop();
blankloop();
}
public void blankloop()
{
j = 0;
long timer = 0;
long totaltimer = 0;
do
{
timer = System.nanoTime();
i = 0;
do
{
i++;
}
while (i < 128);
totaltimer += System.nanoTime() - timer;
j++;
}
while (j < iterations);
MathRoundMVCE.longarray[this.thread] -= totaltimer;
}
public void mainloop()
{
j = 0;
long timer = 0;
long totaltimer = 0;
long localsum = 0;
int[] int6 = new int[128];
int[] int7 = new int[128];
int[] int8 = new int[128];
do
{
timer = System.nanoTime();
i = 0;
do
{
int6[i] = Math.round(MathRoundMVCE.float4[i] + MathRoundMVCE.float5[i]);
int7[i] = Math.round(MathRoundMVCE.float4[i] * MathRoundMVCE.float5[i]);
int8[i] = Math.round(MathRoundMVCE.float4[i] / MathRoundMVCE.float5[i]);
i++;
}
while (i < 128);
totaltimer += System.nanoTime() - timer;
for(short z = 0; z < 128; z++)
{
localsum += int6[z] + int7[z] + int8[z];
}
j++;
}
while (j < iterations);
MathRoundMVCE.longarray[this.thread] += totaltimer;
MathRoundMVCE.sumtotal = localsum;
}
}
Long story short, this code performed about the same in 8u25 as in 8u40. As you can see, I am now recording the results of all calculations into arrays, and then summing those arrays outside of the timed portion of the loop to a local variable which is then written to a static variable at the end of the outer loop.
Under 8u25: Total execution time is 261545 milliseconds
Under 8u40: Total execution time is 266890 milliseconds
Test conditions were the same as before. So, it would appear that 8u25 and 8u31 were doing dead code removal that 8u40 stopped doing, causing the code to "slow down" in 8u40. That doesn't explain every weird little thing that's cropped up but that appears to be the bulk of it. As an added bonus, the suggestions and answers provided here have given me inspiration to improve the other parts of my hobby project, for which I am quite grateful. Thank you all for that!