I ran the following Java snippet on Java 11.0.9.1 (Ubuntu 18.04), and the two variants showed a big performance difference.

With a condition, like `c = (res[i][j] >= 64)? 1 : 0;`, I got time in ms: 1216

With no condition, like `c = ((((res[i][j] - 64) & 0x80000000)>>31)+1);`, I got time in ms: 438

Questions:

  1. Why is there such a big difference in performance?
  2. Is `c = ((((res[i][j] - 64) & 0x80000000)>>31)+1);` the best option?

Thanks in advance.

```java
public class FastCompare {
    static final int ITEMS = 1000000;
    static final int ATTRS = 1000;

    public static void main(String[] args) {
        // Fill the table with a repeating byte pattern.
        byte[][] res = new byte[ITEMS][ATTRS];
        int i;
        int j;
        for (i = 0; i < ITEMS; i++) {
            for (j = 0; j < ATTRS; j++) {
                res[i][j] = (byte) (i + j);
            }
        }
        long start = System.currentTimeMillis();
        long a = 1;
        int c;
        for (i = 0; i < ITEMS; i++) {
            for (j = 0; j < ATTRS; j++) {
                // Variant with a condition:
                c = (res[i][j] >= 64) ? 1 : 0;
                // Variant without a condition:
                //c = ((((res[i][j] - 64) & 0x80000000) >> 31) + 1);
                a += c;
            }
        }
        System.out.println("time in ms: " + (System.currentTimeMillis() - start));
        System.out.println("a=" + a);
    }
}
```
akuzminykh
pktCoder
  • If your posted code is correct, you are executing half of the work needed for the 'shift' case (the computation of `b`) even though you don't use it for the 'conditional' case. i.e., your benchmark is skewed. – a guest Dec 19 '20 at 18:47
  • @aguest Thanks for pointing out the inconsistency. I made a change and now it's more apple-apple comparison. The result still stands. – pktCoder Dec 19 '20 at 19:04
  • On my machine both versions take the exact same time. I'm using Java 11.0.6, Windows 10, i7-7700. Maybe it's something like [this](https://stackoverflow.com/q/11227809/12323248) but that is just a random guess. – akuzminykh Dec 19 '20 at 19:50
  • @akuzminykh, my Java version is 11.0.9.1. Curious how long it took on your CPU? I know the result will vary depending on the CPU type, I just want to have an idea. Thanks. – pktCoder Dec 19 '20 at 20:02
  • @pktCoder Both ~ 450 ms. I have a tip for you: Write the same thing in C and post the same question but for C if you observe the same problem. The C people will know better what's going on. I'm not saying you won't get an answer here; it's just less likely. – akuzminykh Dec 19 '20 at 20:06
  • Thanks @akuzminykh for the link and for the tip! I am going to study it carefully before posting. – pktCoder Dec 19 '20 at 20:08
  • https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java/513259 – daniu Dec 19 '20 at 20:21
  • The `… & 0x80000000` in your expression is pointless, as you’re masking out bits that get shifted out anyway. You can also simplify the entire expression to `c = 63 - res[i][j] >>> 31;`. It’s not surprising that eliminating conditionals can lead to higher performance (as long as the resulting expression is not too complicated). – Holger Dec 23 '20 at 12:34
  • But mind that the optimized variant does not evaluate to the same result as `(res[i][j] >= 64)? 1 : 0` for input values in the range `-2147483648` to `-2147483585`. – Holger Dec 23 '20 at 12:50
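
To make the last two comments concrete, here is a small sketch that compares the three expressions. The class and method names are made up for illustration and are not from the question. It checks every possible `byte` value, which is all the question's array can hold, and then probes `Integer.MIN_VALUE`, which lies in the range mentioned above.

```java
// Illustrative check only; class and method names are hypothetical.
final class BranchlessCheck {
    // The conditional form from the question.
    static int viaTernary(int v) {
        return (v >= 64) ? 1 : 0;
    }

    // The masked shift form from the question.
    static int viaMaskShift(int v) {
        return (((v - 64) & 0x80000000) >> 31) + 1;
    }

    // The simplified form suggested in the comments.
    static int viaUnsignedShift(int v) {
        return 63 - v >>> 31;
    }

    // Counts the byte values for which all three forms agree.
    static int agreeingByteValues() {
        int agree = 0;
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            int expected = viaTernary(v);
            if (viaMaskShift(v) == expected && viaUnsignedShift(v) == expected) {
                agree++;
            }
        }
        return agree;
    }

    public static void main(String[] args) {
        System.out.println("agreeing byte values: " + agreeingByteValues() + " of 256");
        // The subtraction overflows here, so the shift forms diverge from the ternary.
        int extreme = Integer.MIN_VALUE;
        System.out.println("ternary=" + viaTernary(extreme)
                + " maskShift=" + viaMaskShift(extreme)
                + " unsignedShift=" + viaUnsignedShift(extreme));
    }
}
```

Since `res` is a `byte[][]`, the overflow case cannot occur in the question's loop, but it matters if the same trick is reused on arbitrary `int` input.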

2 Answers

Use JMH for performance testing. Don't trust naive `currentTimeMillis()` measurements.

user882813

I got the same result as you (though I used `System.nanoTime()` instead, as that's more reliable). It's probably because branching can cause problems with modern processor pipelining: a mispredicted branch forces the CPU to discard work it has already started speculatively. See https://software.intel.com/content/www/us/en/develop/articles/branch-and-loop-reorganization-to-prevent-mispredicts.html
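
For reference, a rough sketch of that kind of `System.nanoTime()` measurement is below. It is not the exact code I ran: the class name, method names, array size and warm-up count are assumptions chosen for illustration. It moves each variant into its own method and calls both a number of times before timing, so the JIT has a chance to compile them first. It is still a hand-rolled measurement, so JMH remains the more trustworthy tool.

```java
// Hand-rolled timing sketch; names, sizes and warm-up count are illustrative assumptions.
final class WarmedTiming {
    static final int ITEMS = 2_000;
    static final int ATTRS = 1_000;

    // Same fill pattern as the question, on a smaller table.
    static byte[][] fill() {
        byte[][] res = new byte[ITEMS][ATTRS];
        for (int i = 0; i < ITEMS; i++) {
            for (int j = 0; j < ATTRS; j++) {
                res[i][j] = (byte) (i + j);
            }
        }
        return res;
    }

    // Counts entries >= 64 using the conditional form.
    static long sumTernary(byte[][] res) {
        long a = 0;
        for (byte[] row : res) {
            for (byte v : row) {
                a += (v >= 64) ? 1 : 0;
            }
        }
        return a;
    }

    // Counts entries >= 64 using the shift form from the question.
    static long sumShift(byte[][] res) {
        long a = 0;
        for (byte[] row : res) {
            for (byte v : row) {
                a += (((v - 64) & 0x80000000) >> 31) + 1;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        byte[][] res = fill();
        long sink = 0;
        // Warm-up runs so both methods are likely compiled before timing.
        for (int r = 0; r < 20; r++) {
            sink += sumTernary(res) + sumShift(res);
        }
        long t0 = System.nanoTime();
        long a = sumTernary(res);
        long t1 = System.nanoTime();
        long b = sumShift(res);
        long t2 = System.nanoTime();
        System.out.println("ternary: " + (t1 - t0) / 1_000 + " us, a=" + a);
        System.out.println("shift:   " + (t2 - t1) / 1_000 + " us, a=" + b);
        System.out.println("warm-up sink: " + sink);
    }
}
```

Printing the sums and the warm-up sink keeps the results in use, which makes it harder for the JIT to drop the loops as dead code.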

k314159