I've got a little program that is a fairly pointless exercise in simple number crunching that has thrown me for a loop.

The program spawns a bunch of worker threads that do simple mathematical operations. Recently I changed the inner loop of one variant of worker from:

do
{           
    int3 = int1 + int2;
    int3 = int1 * int2;             
    int1++;
    int2++;
    i++;
}
while (i < 128);

to something akin to:

int3 = tempint4[0] + tempint5[0];
int3 = tempint4[0] * tempint5[0];

int3 = tempint4[1] + tempint5[1];
int3 = tempint4[1] * tempint5[1];

int3 = tempint4[2] + tempint5[2];
int3 = tempint4[2] * tempint5[2];

int3 = tempint4[3] + tempint5[3];
int3 = tempint4[3] * tempint5[3];

...

int3 = tempint4[127] + tempint5[127];
int3 = tempint4[127] * tempint5[127];

The arrays are populated by random integers no higher than 1025 in value, and the array values do not change.

The end result was that the program ran much faster, though closer examination seems to indicate that the CPU isn't actually doing anything when running the newer version of the code. It seems that the JVM has figured out that it can safely ignore the code that replaced the inner loop after one iteration of the outer loop since it is only redoing the same calculations on the same set of data over and over again.

To illustrate my point, the old code took roughly 27000 ms to run and noticeably increased the operating temperature of the CPU (it also showed 100% utilization for all cores). The new code takes about 5 ms to run (sometimes less) and causes nary a spike in CPU utilization or temperature. Increasing the number of outer loop iterations does nothing to change the behavior of the new code, even when the number of iterations increases by a hundred times or more.

I have another version of the worker that is identical to the one above except that it has a division operation along with the addition and multiplication operations. In its new unrolled form, the division-enabled version is also much faster than its previous form, but it actually takes a little while (~300 ms on the first run and ~200 ms on subsequent runs, despite warmup, which is a little odd) and produces a profound spike in CPU temperature for its brief run. Increasing the number of outer loop iterations seems to cause the temperature phenomenon to mostly cease after a certain amount of time has passed while running the program, though utilization still shows 100% for all cores. My guess is the JVM is taking much longer to figure out which operations it can safely ignore when handling division operations, and that it is not ignoring all of them.

Short of adding division operations to all my code (which isn't really a fix anyway beyond a certain number of outer loop iterations), is there any way I can get the JVM to stop reducing my code to apparent NOOPs? I've tried several solutions to the problem, such as generating new random values per iteration of the outer loop, going back to simple integer variables with incrementation, and some other nonsense, but none of those solutions have produced desirable results. Either it continues to ignore the series of instructions, or the performance hit from modifications is bad enough that my division-heavy variant actually performs better than the code without division operations.

edit: to provide some context:

i: this variable is an integer that is used for a loop counter in a do/while loop. It is defined in the class file containing the worker code. Its initial value is 0. It is no longer used in the newer version of the worker.

int1/int2: These are integers defined in the class file containing the worker code. Their initial values are both 0. They were used in the old version of the code to provide changing values for each iteration of the internal loop. All I had to do was increment them upward by one per loop iteration, and the JVM would be forced to carry out every operation faithfully. Unfortunately, this loop apparently prevented the use of SIMD. Each time the outer loop iterated, int1 and int2 had their values reset to prevent overflow of int1, int2, or int3 (I have discovered that integer overflow can slow down the code unnecessarily, as can allowing a float to reach Infinity).

tempint4/tempint5: These are references to a pair of integer arrays defined in the main class file for the program (Mathtester. Yes, unimaginative, I know). When the program first starts, there is a short do/while loop that fills each array with random integers ranging from 1-1025. The arrays are 128 integers in size. Each array is static, though the reference variables are not. In truth there is no particular reason for me to use the reference variables. They are leftovers from when I was trying to do an array reference swap so that, after each iteration of the outer loop, tempint4 and tempint5 would refer to the opposite array. It was my hope that the JVM would stop ignoring my code block. For the division-enabled version of the code, this seems to have worked (sort of), since it fundamentally changes the values to be calculated. Swapping tempint4 for tempint5 and vice versa does not change the results of the addition and multiplication operations, so the JVM can still ignore those.

edit: Making tempint4 and tempint5 (since they are only reference variables, I am actually referring to the main arrays, Mathtester.int4 and Mathtester.int5) volatile worked without notably reducing the amount of CPU activity or level of CPU temperature. It did slow down the code a bit, but that is a probable indicator that the JVM was NOOPing more than I knew.
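For readers who want to see the shape of that change, here is a minimal sketch of it. The class name `VolatileArraySketch`, the `fill` and `kernel` helpers, the fixed seed, and the use of a loop instead of the unrolled statements are all assumptions made for illustration; only the `int4`/`int5` names, the array size of 128, and the 1-1025 value range come from the description above. Note that `volatile` applies to the array references here, not to the array elements.

```java
import java.util.Random;

// Hypothetical stand-in for the Mathtester class described above.
class VolatileArraySketch {
    // volatile on the references: every int4[k] / int5[k] access starts
    // with a volatile read of the reference itself.
    static volatile int[] int4 = new int[128];
    static volatile int[] int5 = new int[128];

    // Fill both arrays with random integers in the range 1..1025.
    static void fill(long seed) {
        Random rng = new Random(seed);
        int[] a = new int[128];
        int[] b = new int[128];
        for (int k = 0; k < 128; k++) {
            a[k] = 1 + rng.nextInt(1025);
            b[k] = 1 + rng.nextInt(1025);
        }
        int4 = a;
        int5 = b;
    }

    // Loop form of the unrolled block; returns the last product so that
    // at least that value is observably used by the caller.
    static int kernel(int outerIterations) {
        int int3 = 0;
        for (int n = 0; n < outerIterations; n++) {
            for (int k = 0; k < 128; k++) {
                int3 = int4[k] + int5[k];
                int3 = int4[k] * int5[k];
            }
        }
        return int3;
    }
}
```

In practice HotSpot does not drop volatile reads of a static field, which is probably why the work stopped disappearing. The arithmetic on the loaded values is a separate matter: a result that is overwritten and never read (the addition here) can still be discarded, so this arrangement keeps the memory traffic alive more reliably than it keeps the math alive.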

user3765373
  • We need a bit more context here: what is ``i``, ``intx``, ``tempintx`` and how are they used? – Jean Logeart Dec 26 '14 at 14:56
  • If you're doing this because you want to do a benchmark, then you have to be aware that benchmarking has lots of caveats in Java and that it is not easy to write a correct benchmark, see [How do I write a correct micro-benchmark in Java?](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) – Jesper Dec 28 '14 at 15:47
  • Two points: One, this can not be considered a proper benchmark since it does not provide any meaningful data, aside perhaps from how long it takes a particular machine with a particular jvm with a particular operating system to complete some rudimentary mathematical operations. The program is nothing compared to, say, y-cruncher or Wprime or what have you. Mr. Logeart, I'll do my best to provide additional context by way of an edit. It was my intention not to louse up my question with overmuch code. – user3765373 Dec 29 '14 at 14:09

2 Answers

Is there any way I can get the JVM to stop reducing my code to apparent NOOPs?

Yes, by making int3 volatile.
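A minimal sketch of what that could look like, assuming the result lives in a static field. The class name `VolatileResultSketch` and the `kernel` signature are hypothetical and not taken from the question, and the unrolled statements are written as a loop for brevity.

```java
// Hypothetical worker: every result is stored to a volatile field.
class VolatileResultSketch {
    // A volatile write is a synchronization action, so the JIT cannot
    // treat the store (or the value feeding it) as dead code.
    static volatile int int3;

    static void kernel(int[] a, int[] b, int outerIterations) {
        for (int n = 0; n < outerIterations; n++) {
            for (int k = 0; k < a.length; k++) {
                int3 = a[k] + b[k];
                int3 = a[k] * b[k];
            }
        }
    }
}
```

The cost is that each of those stores carries memory-ordering overhead, so the loop spends much of its time on the writes instead of the arithmetic. That matches the slowdown reported in the comments below.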

Jean Logeart
  • Downvote - giving "magic" advice is dangerous if the person receiving it doesn't understand the context the advice works in. – kittylyst Dec 28 '14 at 15:45
  • FWIW it worked. I have used volatile variables in this program before to prevent unintended execution order of code in situations where multiple threads were working concurrently, but I have never used it to prevent the JVM from ignoring operations completely. Unfortunately it seems to have had an ugly side-effect in that the CPU isn't working as hard as before when treating int3 as volatile. When applying a volatile prefix to the int3 variable in the division-enabled version of the program, the temps are lower and the code is much slower. I think I can guess why, too . . . – user3765373 Dec 29 '14 at 14:17
  • Making int3 volatile was a mistake, but making tempint4 and tempint5 volatile worked beautifully. So the lesson I'm taking away here is that anything volatile will prevent the JVM from NOOPing that bit of code, but if you actually change the value of a volatile variable, it's going to change how concurrent threads behave. In the case of my code, making int3 volatile made everything very, very slow. – user3765373 Dec 30 '14 at 13:17

One of the first things when dealing with Java performance that you have to learn by heart is this:

"A single line of Java code means nothing at all in isolation".

Modern JVMs are very complex beasts, and do all kinds of optimization. If you try to measure some small piece of code, the chances are that you will not be measuring what you think you are - it is really complicated to do it correctly without very, very detailed knowledge of what the JVM is doing.

In this case, yes, it's entirely likely that the JVM is optimizing away the loop. There's no simple way to prevent it from doing this, and almost all techniques are fragile and JVM-version specific (because new & cleverer optimizations are developed & added to the JVM all the time).
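As an illustration of one such technique (and of why it is fragile), the sketch below makes every result feed a checksum that the caller receives, and makes the operands depend on the outer iteration so the inner work is not identical each time around. The class name `ChecksumSketch`, the `kernel` signature, and the way the outer index is mixed in are all assumptions for the example, not something from the question.

```java
// Hypothetical worker that keeps its arithmetic observable.
class ChecksumSketch {
    static long kernel(int[] a, int[] b, int outerIterations) {
        long checksum = 0;
        for (int n = 0; n < outerIterations; n++) {
            for (int k = 0; k < a.length; k++) {
                // Operands change with n, so each outer pass does new work.
                int x = a[k] + n;
                int y = b[k] + n;
                checksum += x + y;          // the sum is used
                checksum += (long) x * y;   // the product is used
            }
        }
        // Returning the value (and printing or storing it in the caller)
        // is what prevents the loop from being dead code.
        return checksum;
    }
}
```

The caller has to actually use the returned value, for example by printing it next to the elapsed time. Even then, nothing stops a future JIT from simplifying the arithmetic algebraically, so this only guarantees that a result is computed, not that every individual addition and multiplication is executed as written.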

So, let me turn the question around: "What are you really trying to achieve here? Why do you want to prevent the JVM from optimizing?"

kittylyst
  • Yeah, I figure future versions of the JVM will find ways to optimize out my silly code. Anyway, to answer your question, the goal is to produce a simple program that can show how long it takes a particular computer to handle some basic arithmetic using a particular JVM and OS (I have not seen it working on a different JVM, but OS doesn't seem to make a big difference, at least not that I can tell. Hardware certainly does). It is an exercise in nothing much at all. You and Jesper are correct that it is difficult to measure a small piece of code. I have looked up tips on microbenches. – user3765373 Dec 29 '14 at 14:23
  • My reason for wanting the JVM to not optimize in this particular fashion is that the code snippet I provided is all the program is really doing. It's designed to run that sequence of additions and multiplications, record the amount of time it took to complete the operations, and spit the information back out to the user. It works just fine with the older worker version, but, you know, I wanted to make it run faster, make it heat up the CPU a bit more . . . and the JVM rained on that parade. At least the division-enabled code still works for several iterations. – user3765373 Dec 29 '14 at 14:25