
I noticed in my benchmarks that compound arithmetic assignment operators in Java always outperform the equivalent plain assignment:

    d0 *= d0;           //faster
    //d0 = d0 * d0;     //slower

    d0 += d0;           //faster
    //d0 = d0 + d0;     //slower

Could someone please comment on the above observation and explain why this is the case? I'm assuming that some differences at the bytecode level are responsible for the speedup. Thank you in advance.

Here's my benchmark in full:

public long squaring() {
    long t0 = System.currentTimeMillis();
    double d0 = 0;

    for (int k = 0; k < 100_000_000; k++){

        //check bytecode for below to see why timing differs
        d0 *= d0;           //faster
        //d0 = d0 * d0;     //slower
    }

    long t1 = System.currentTimeMillis();
    long took = (t1 - t0);
    System.out.println("took: "+took + " ms");
    System.out.println("result: " +d0);

    return took;
}

@Test
public void testSquaring() {
    int repetitions = 10;
    long sum = 0;
    for (int i = 0; i < repetitions; i++) {
        sum += cut.squaring();
        System.out.println("accumulated: "+ sum + "\n-------------------");
    }
    double avg = sum/repetitions;
    System.out.println("average: "+avg);
}

And here are the results:

took: 244 ms
result: 0.0
accumulated: 244
-------------------
took: 302 ms
result: 0.0
accumulated: 546
-------------------
took: 0 ms
result: 0.0
accumulated: 546
-------------------
took: 0 ms
result: 0.0
accumulated: 546
-------------------
took: 0 ms
result: 0.0
accumulated: 546
-------------------
took: 0 ms
result: 0.0
accumulated: 546
-------------------
took: 0 ms
result: 0.0
accumulated: 546
-------------------
took: 0 ms
result: 0.0
accumulated: 546
-------------------
took: 0 ms
result: 0.0
accumulated: 546
-------------------
took: 0 ms
result: 0.0
accumulated: 546
-------------------
average: 54.0
Andy Turner
Simeon Leyzerzon
  • The commented-out code performs two multiplications, not one. Is that really what you're comparing to? – John Bollinger Sep 02 '16 at 13:19
  • You could just look at the bytecode to verify your hypothesis. – Oliver Charlesworth Sep 02 '16 at 13:20
  • No, that was an error, which I edited. – Simeon Leyzerzon Sep 02 '16 at 13:20
  • That said, there's a good chance that the way you're measuring this might be misleading - see http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java. – Oliver Charlesworth Sep 02 '16 at 13:20
  • @OliverCharlesworth - did that, however I was hoping to hear some expert opinions on that assumption – Simeon Leyzerzon Sep 02 '16 at 13:22
  • Those seem to produce the exact same bytecode. I think the problem is your test. – resueman Sep 02 '16 at 13:24
  • "some differences on the bytecode level": `javap -c YourClass` shows you the bytecode. – Andy Turner Sep 02 '16 at 13:25
  • And I would *expect* them to produce the same bytecode, because Java bytecode does not have individual instructions specifically for `+=`, `*=`, *etc.* – John Bollinger Sep 02 '16 at 13:25
  • @SimeonLeyzerzon It looks like everything took 0 ms once your JIT warmed up. – Andy Turner Sep 02 '16 at 13:34
  • When it says "took: 0 ms", there is a good chance your micro-benchmark has been optimised away to nothing. You need to run the test for at least 2 seconds to get a meaningful number. – Peter Lawrey Sep 02 '16 at 13:35
  • @PeterLawrey: I noticed that optimisation in the unit test as well, but that is a separate observation which I attribute to the JIT's behavior (outside the scope of this question). However, simply running the loop from the `public static void main` of the same class (without any `@Test` in the picture) consistently produces the difference in timing. Why is that? – Simeon Leyzerzon Sep 02 '16 at 13:42
  • @SimeonLeyzerzon phase of the moon? How busy your CPU is doing other stuff? Plain chance? There's no difference in the bytecode, so there can't be an intrinsic reason for a difference in how quickly the bytecode is executed. – Andy Turner Sep 02 '16 at 13:44

2 Answers


Try disassembling the two cases with `javap -c`:

static void a(double d0) {
  d0 *= d0;
}

static void b(double d0) {
  d0 = d0 * d0;
}

This disassembles to:

  static void a(double);
    Code:
       0: dload_0
       1: dload_0
       2: dmul
       3: dstore_0
       4: return

  static void b(double);
    Code:
       0: dload_0
       1: dload_0
       2: dmul
       3: dstore_0
       4: return

That is, the two methods are identical at the bytecode level, so there can be no intrinsic performance difference. Factors outside this code are affecting the running time.
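
To see whether any gap survives a more careful measurement, here is a minimal hand-rolled sketch. It is not a substitute for a proper benchmarking framework. The class name `CompoundVsPlain`, the method names, the factor and the round counts are all made up for illustration. It multiplies by a constant factor instead of squaring, because squaring a value that starts at 0 stays at 0 (hence `result: 0.0` in the question). It also warms both methods up before timing and keeps the results live in a `sink` so the loops are less likely to be optimised away.

```java
public class CompoundVsPlain {

    // Compound form: d *= factor
    public static double compound(double seed, double factor, int iterations) {
        double d = seed;
        for (int k = 0; k < iterations; k++) {
            d *= factor;
        }
        return d;
    }

    // Plain form: d = d * factor
    public static double plain(double seed, double factor, int iterations) {
        double d = seed;
        for (int k = 0; k < iterations; k++) {
            d = d * factor;
        }
        return d;
    }

    public static void main(String[] args) {
        final int iterations = 10_000_000;  // arbitrary size, chosen for illustration
        final int rounds = 10;              // arbitrary number of rounds
        final double factor = 1.000000001;  // keeps the value finite and non-zero
        double sink = 0;                    // printed at the end so the results stay live

        // Warm-up rounds: give the JIT a chance to compile both methods before timing.
        for (int i = 0; i < rounds; i++) {
            sink += compound(1.0, factor, iterations);
            sink += plain(1.0, factor, iterations);
        }

        // Timed rounds, alternating the two forms so neither always runs first.
        long compoundNanos = 0;
        long plainNanos = 0;
        for (int i = 0; i < rounds; i++) {
            long t0 = System.nanoTime();
            sink += compound(1.0, factor, iterations);
            long t1 = System.nanoTime();
            sink += plain(1.0, factor, iterations);
            long t2 = System.nanoTime();
            compoundNanos += t1 - t0;
            plainNanos += t2 - t1;
        }

        System.out.println("compound: " + compoundNanos / rounds / 1_000_000.0 + " ms per round");
        System.out.println("plain:    " + plainNanos / rounds / 1_000_000.0 + " ms per round");
        System.out.println("sink: " + sink);
    }
}
```

Even with these precautions the numbers can be noisy, so treat any small difference between the two lines as measurement error rather than evidence about the operators.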

Andy Turner

`a *= b;` produces exactly the same bytecode as `a = a * b;`. This means there is a problem in your performance test.

The same applies to +, -, /, etc.
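
As a quick sanity check for `double` operands, the sketch below applies each compound operator and its plain counterpart to the same inputs and compares the results bit for bit. The class and method names are hypothetical. Note that this only shows the results agree; it does not by itself prove the bytecode is identical, which is what `javap -c` is for.

```java
public class CompoundEquivalence {

    // Raw bit pattern of a double, so the comparison is exact.
    private static long bits(double d) {
        return Double.doubleToLongBits(d);
    }

    // Returns true if +=, -=, *= and /= give the same bits as the plain forms.
    public static boolean sameResults(double a, double b) {
        double x;
        double y;

        x = a; x += b;
        y = a; y = y + b;
        boolean add = bits(x) == bits(y);

        x = a; x -= b;
        y = a; y = y - b;
        boolean sub = bits(x) == bits(y);

        x = a; x *= b;
        y = a; y = y * b;
        boolean mul = bits(x) == bits(y);

        x = a; x /= b;
        y = a; y = y / b;
        boolean div = bits(x) == bits(y);

        return add && sub && mul && div;
    }

    public static void main(String[] args) {
        System.out.println(sameResults(3.5, 1.25));
        System.out.println(sameResults(0.1, 0.0)); // division by zero gives Infinity in both forms
    }
}
```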

talex
  • I posted a fuller test. Can you spot any problem there? – Simeon Leyzerzon Sep 02 '16 at 13:34
  • Aren't you surprised that most of your tests print `took: 0 ms`, and only two return something meaningful? – talex Sep 02 '16 at 13:41
  • I think it's because the JIT optimises it after the first 2 attempts? – Simeon Leyzerzon Sep 02 '16 at 13:56
  • @SimeonLeyzerzon Yes. And after this optimization, no further calls contribute to your result. So you are mostly measuring how long it takes to JIT this method, but you want to measure something else, right? – talex Sep 02 '16 at 13:59
  • @SimeonLeyzerzon Micro-benchmarking in Java is hard for many reasons. I suggest you find a micro-benchmarking framework and use it instead of trying to handle all the complexity yourself. – talex Sep 02 '16 at 14:02