
I've seen that the JITC uses unsigned comparison for checking array bounds (the test 0 <= x < LIMIT is equivalent to x ≺ LIMIT, where ≺ treats the numbers as unsigned quantities). So I was curious whether it works for arbitrary comparisons of the form 0 <= x < LIMIT as well.
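
For reference, a minimal sketch of the equivalence in plain Java (my own illustration, not the JIT's actual output; Integer.compareUnsigned is Java 8+, while on Java 7 flipping the sign bits gives the same ordering):

    // Hypothetical illustration of the bounds-check trick, not the JIT's output.
    static boolean inRangeSigned(int x, int limit) {
        return 0 <= x && x < limit;              // two signed comparisons
    }

    static boolean inRangeUnsigned(int x, int limit) {
        // A negative x becomes a huge value when viewed as unsigned, so one
        // unsigned comparison covers both bounds. Integer.compareUnsigned is
        // Java 8+; on Java 7 the sign-bit flip works instead:
        // (x ^ Integer.MIN_VALUE) < (limit ^ Integer.MIN_VALUE)
        return Integer.compareUnsigned(x, limit) < 0;
    }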

The results of my benchmark are pretty confusing. I've created four experiments of the form

    for (int i = 0; i < LENGTH; ++i) {
        int x = data[i];
        if (condition) result += x;
    }

with different conditions

  • 0 <= x called above
  • x < LIMIT called below
  • 0 <= x && x < LIMIT called inRange
  • 0 <= x & x < LIMIT called inRange2

and prepared the data so that the probabilities of the condition being true are the same.
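
For concreteness, here is a hedged reconstruction of the four variants as plain Java methods; the names LENGTH and LIMIT, the sizes, and the data preparation are my assumptions (the actual benchmark used Caliper, as noted in the comments below), chosen so the array occupies the roughly 4 kB mentioned later:

    // Hypothetical reconstruction of the benchmarked variants, not the original code.
    static final int LENGTH = 1024;              // ~4 kB of int data
    static final int LIMIT = 512;
    static final int[] data = new int[LENGTH];   // filled so each condition holds with equal probability

    static int above() {                         // condition: 0 <= x
        int result = 0;
        for (int i = 0; i < LENGTH; ++i) {
            int x = data[i];
            if (0 <= x) result += x;
        }
        return result;
    }

    static int below() {                         // condition: x < LIMIT
        int result = 0;
        for (int i = 0; i < LENGTH; ++i) {
            int x = data[i];
            if (x < LIMIT) result += x;
        }
        return result;
    }

    static int inRange() {                       // short-circuiting &&
        int result = 0;
        for (int i = 0; i < LENGTH; ++i) {
            int x = data[i];
            if (0 <= x && x < LIMIT) result += x;
        }
        return result;
    }

    static int inRange2() {                      // non-short-circuiting &
        int result = 0;
        for (int i = 0; i < LENGTH; ++i) {
            int x = data[i];
            if (0 <= x & x < LIMIT) result += x;
        }
        return result;
    }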

The results should be fairly similar, except that *above* might be slightly faster as it compares against zero. Even if the JITC couldn't use the unsigned comparison for the test, the results for *above* and *below* should still be similar.

Can anyone explain what's going on here? It's quite possible that I did something wrong...

Update

I'm using Java build 1.7.0_51-b13 on Ubuntu (kernel 2.6.32-54-generic) with an i5-2400 CPU @ 3.10GHz, in case anybody cares. As the results for inRange and inRange2 near 0.00 are especially confusing, I re-ran the benchmark with more steps in that area.

maaartinus
    Outside of the fact that Java is notoriously hard to get meaningful micro-benchmarks out of, even after you've allowed for full JITting and hotspotting warmup before testing and a long enough test run to get meaningful data out of ... and allowed for the fact that the context in which these are used may affect how those optimizations behave so the micro-benchmark is unlikely to actually be meaningful... and allowed for the fact that different JREs may optimize differently... – keshlam Feb 19 '14 at 05:57
  • Maybe try to look at the [generated assembly code](http://java.dzone.com/articles/printing-generated-assembly) and see what it's actually doing... that is, assuming you're running the benchmark long enough to trigger compilation. – vanza Mar 20 '14 at 19:37

2 Answers


The likely variation in the benchmark results has to do with CPU caching at different levels.

Since primitive ints are being used, there is no JVM-specific caching going on, as there would be with auto-boxed Integer values.
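
A minimal illustration of the autoboxing cache being referred to (the class name and values are chosen purely for illustration):

    // Sketch of the Integer.valueOf autoboxing cache the answer alludes to.
    public class BoxingCacheDemo {
        public static void main(String[] args) {
            Integer a = 100, b = 100;    // served from the Integer.valueOf cache (-128..127 by default)
            System.out.println(a == b);  // true: same cached object
            Integer c = 1000, d = 1000;  // outside the default cached range
            System.out.println(c == d);  // false with default settings: distinct boxed objects
            int x = 1000, y = 1000;      // primitive ints: no boxing, no cache involved
            System.out.println(x == y);  // true: plain value comparison
        }
    }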

Thus all that remains, given the minimal memory consumption of the data[] array, is CPU caching of low-level values/operations. Since, as described, the distributions of values are random with fixed probabilities of the conditions being true across the tests, the likely cause is that, depending on the values, more or less (random) caching is going on for each test, leading to more randomness in the results.

Further, depending on the isolation of the computer (background services/processes), the test cases may not be running in complete isolation. Ensure that everything except the core OS functions and the JVM is shut down for these tests. Set the JVM minimum and maximum memory to the same value, and shut down any networking processes, updates, etc.

Darrell Teague
  • There can't be any caching issues. The array needs 4 kB and so surely fits into L1, and so does the code. Other processing might have disturbed the measurement a bit, but I'm not seeking an explanation for some random 1% variance, but for a *repeatable factor of 2 or 3*. **TL;DR That's not it.** – maaartinus Mar 20 '14 at 23:19
  • The original question asked for possible explanations of the observed behavior. Since there is only a code snippet, the environment and test conditions have not been cited... all anyone can offer are theories on what MIGHT be causing the observed behavior. To cite a real-world case, I was once asked to performance-optimize a multi-threaded application. My first question was how many processors were available. The original request was abandoned when the OP realized that they were running the application on a fraction of a single processor VM. – Darrell Teague Mar 22 '14 at 18:47
  • I added the needed info (CPU, OS, etc.) quite some time ago. If you feel something's still missing, I'll add it. This is one of many benchmarks I've posted here, and there's no gotcha like in the case you cited. No lack of memory, nothing like that. Just the JIT doing inexplicable things (and inspecting the assembly didn't lead me to anything meaningful). – maaartinus Mar 22 '14 at 19:13

Are your test results the average of a number of runs, or did you only test each function once?

One thing I have found is that the first time you run a for loop the JVM will interpret it; then, each time it is run again, the JVM will optimize it further. Therefore the first few runs may get horrible performance, but after a few runs it will be near native performance.

I have also found that a loop will not be optimized while it is running. I have not tested whether this applies to just the loop or to the whole function. If it only applies to the loop, you may get much better performance by nesting it in an inner and outer loop and working with your data one block at a time. If it is the whole function, you will have to place the inner loop in its own function.
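
A sketch of that last suggestion, with the hot loop extracted into its own method so the JIT can compile the whole method between calls (method names and iteration counts are mine):

    // Hypothetical warmup pattern; names and counts are illustrative.
    static int sumInRange(int[] data, int limit) {
        int result = 0;
        for (int x : data) {
            if (0 <= x && x < limit) result += x;
        }
        return result;
    }

    static int runRepeatedly(int[] data, int limit) {
        int result = 0;
        // After enough calls the JIT compiles sumInRange as a whole,
        // so later iterations run the optimized version.
        for (int run = 0; run < 10_000; ++run) {
            result = sumInRange(data, limit);
        }
        return result;
    }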

Also run the test more than once; if you compare the code, you will notice how the JIT optimizes it in stages.

For most code this gives Java optimal performance. It allows it to skip costly optimization of code that runs rarely and makes code that runs often a lot faster. However, if you have a code block that runs only once but for a long time, it will become horribly slow.

user1657170
  • "only test each function once?" As you can see, I used Caliper and it takes care of it all, including multiple runs (in the results linked above you can see a tiny bar denoting the varying results). It also does all the warmup and makes sure the code gets compiled before measurement. **TL;DR: Everything has taken care of.** – maaartinus Mar 20 '14 at 23:12