8

I've recently started benchmarking some Java code in order to get the best performance results for my program and noticed some strange thing. Namely, I've benchmarked the following methods:

private static final int n = 10000;

public static void test0(){
    int m = 0;

    for(int i = 0; i < n; ++i){
        m = Math.max(i, m);
    }
}

public static void test1(){
    int m = 0;

    for(int i = 0; i < n; ++i){
        m = ((i >= m) ? i : m);
    }
}

and got those results:

          | Test 0          | Test 1          | 
----------+-----------------+-----------------+-
Average:  | 51,77 ns        | 13956,63 ns     | 
Best:     | 0,00 ns         | 6514,00 ns      | 
Worst:    | 25,45 ms        | 60,50 ms        | 
Tries:    | 16971233        | 16971233        | 

After searching over SO (i.e. Is Math.max(a,b) or (a>b)?a:b faster in Java?) it was certain for me that test1 shouldn't be so much slower.

These methods were tested randomly on 8 threads in 30 seconds and every benchmark I run seems similar. I use jdk1.8.0_45.

So, why is test1 over 200 times slower than test0?

Community
  • 1
  • 1
Kamil Jarosz
  • 2,168
  • 2
  • 19
  • 30
  • 4
    Try adding some side effects to be sure the execution is not just wiped out by the compiler figuring out the code does nothing at all: ``sum += Math.max(i, m); ... System.out.println(sum);`` – Jean Logeart Jul 19 '16 at 19:38
  • 1
    Use JMH to ensure your actually measuring the performance of your code rather than a no-op. – erickson Jul 19 '16 at 20:30

4 Answers4

4

Since Math.max is a static function, the compiler may figure out that the code does nothing at all and simply optimizes the execution by not executing it!

Variable m being local to the function, assigning it does not help as it is never read.

You need to make sure that the execution somehow modifies something so it is not aggressively optimized by the compiler.

For instance, you can can simply print the value of m at the end of the test or make m a class variable that can be accessed later or even sum the result as I originally suggested in the comment.

Kamil Jarosz
  • 2,168
  • 2
  • 19
  • 30
Jean Logeart
  • 52,687
  • 11
  • 83
  • 118
2

Math.max(a,b) can be very aggressivly/obviously optimized into a native instruction for the processor.

For the ternary, the simple transformation into processor instruction would be a compare+jump, especially the jump is costly.

To optimize the ternary into the JIT(Just in time compiler) must recognize that the code expresses a max and the native instruction is best.

The JIT might eventually recognize this but until that point it will be slower.

k5_
  • 5,450
  • 2
  • 19
  • 27
  • Doesn't JIT run *once* per class bytecode? It would be a constant time shift, not for every invocation. – Kamil Jarosz Jul 19 '16 at 19:56
  • Depends on JIT. But hotspot (the oracle jvm) compiles first with a fast compile, and if it recognizes hotspots(hence the name) will recompile with more aggressive optimizations. – k5_ Jul 19 '16 at 19:58
1

Just to confirm (to some extent) what Jean Logeart said in his answer: When adding a trivial main that calls both methods, as in

class MaxOpt
{
    public static void main(String args[])
    {
        for (int i=0; i<100000; i++)
        {
            runTests();
        }
    }

    private static void runTests()
    {
        test0();
        test1();
    }

    private static final int n = 10000;

    public static void test0(){
        int m = 0;

        for(int i = 0; i < n; ++i){
            m = Math.max(i, m);
        }
    }

    public static void test1(){
        int m = 0;

        for(int i = 0; i < n; ++i){
            m = ((i >= m) ? i : m);
        }
    }
}

And running it in a hotspot disassembler VM with

-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:+PrintInlining -XX:+PrintAssembly

then (on Win7/64 with Java 1.8.0_92) the final machine code of the test0 method will be

Decoding compiled method 0x0000000002925510:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000000001bd10458} &apos;test0&apos; &apos;()V&apos; in &apos;MaxOpt&apos;
  #           [sp+0x20]  (sp of caller)
  0x0000000002925640: sub    $0x18,%rsp
  0x0000000002925647: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                        ; - MaxOpt::test0@-1 (line 20)

  0x000000000292564c: add    $0x10,%rsp
  0x0000000002925650: pop    %rbp
  0x0000000002925651: test   %eax,-0x26f5657(%rip)        # 0x0000000000230000
                        ;   {poll_return}
  0x0000000002925657: retq   
  0x0000000002925658: hlt    
  0x0000000002925659: hlt    
  0x000000000292565a: hlt    
  0x000000000292565b: hlt    
  0x000000000292565c: hlt    
  0x000000000292565d: hlt    
  0x000000000292565e: hlt    
  0x000000000292565f: hlt    
[Exception Handler]
  ...   

Yes, it basically does nothing.

Surprisingly, for test1, the JIT obviously does some odd loop unrolling, but does not seem to detect that the method is useless and side-effect-free (it could be optimized away, but it is not)

The resulting assembly is rather huge....:

Decoding compiled method 0x0000000002926290:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000000001bd103a8} &apos;runTests&apos; &apos;()V&apos; in &apos;MaxOpt&apos;
  #           [sp+0x30]  (sp of caller)
  0x00000000029263c0: mov    %eax,-0x6000(%rsp)
  0x00000000029263c7: push   %rbp
  0x00000000029263c8: sub    $0x20,%rsp         ;*synchronization entry
                        ; - MaxOpt::runTests@-1 (line 13)

  0x00000000029263cc: xor    %r8d,%r8d
  0x00000000029263cf: mov    $0x1,%r11d
  0x00000000029263d5: data32 data32 nopw 0x0(%rax,%rax,1)
                        ;*iload_1
                        ; - MaxOpt::test1@11 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029263e0: cmp    %r8d,%r11d
  0x00000000029263e3: jl     0x000000000292650c  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029263e9: mov    %r11d,%r8d
  0x00000000029263ec: inc    %r8d               ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029263ef: cmp    %r11d,%r8d
  0x00000000029263f2: jl     0x000000000292652d  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029263f8: mov    %r11d,%r9d
  0x00000000029263fb: add    $0x2,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029263ff: cmp    %r8d,%r9d
  0x0000000002926402: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926408: mov    %r11d,%r8d
  0x000000000292640b: add    $0x3,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292640f: cmp    %r9d,%r8d
  0x0000000002926412: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926418: mov    %r11d,%r9d
  0x000000000292641b: add    $0x4,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292641f: cmp    %r8d,%r9d
  0x0000000002926422: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926428: mov    %r11d,%r8d
  0x000000000292642b: add    $0x5,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292642f: cmp    %r9d,%r8d
  0x0000000002926432: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926438: mov    %r11d,%r9d
  0x000000000292643b: add    $0x6,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292643f: cmp    %r8d,%r9d
  0x0000000002926442: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926448: mov    %r11d,%r8d
  0x000000000292644b: add    $0x7,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292644f: cmp    %r9d,%r8d
  0x0000000002926452: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926458: mov    %r11d,%r9d
  0x000000000292645b: add    $0x8,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292645f: cmp    %r8d,%r9d
  0x0000000002926462: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926468: mov    %r11d,%r8d
  0x000000000292646b: add    $0x9,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292646f: cmp    %r9d,%r8d
  0x0000000002926472: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926478: mov    %r11d,%r9d
  0x000000000292647b: add    $0xa,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292647f: cmp    %r8d,%r9d
  0x0000000002926482: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926488: mov    %r11d,%r8d
  0x000000000292648b: add    $0xb,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292648f: cmp    %r9d,%r8d
  0x0000000002926492: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x0000000002926498: mov    %r11d,%r9d
  0x000000000292649b: add    $0xc,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292649f: cmp    %r8d,%r9d
  0x00000000029264a2: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264a4: mov    %r11d,%r8d
  0x00000000029264a7: add    $0xd,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264ab: cmp    %r9d,%r8d
  0x00000000029264ae: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264b0: mov    %r11d,%r9d
  0x00000000029264b3: add    $0xe,%r9d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264b7: cmp    %r8d,%r9d
  0x00000000029264ba: jl     0x000000000292650f  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264bc: mov    %r11d,%r8d
  0x00000000029264bf: add    $0xf,%r8d          ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264c3: cmp    %r9d,%r8d
  0x00000000029264c6: jl     0x0000000002926518  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264c8: add    $0x10,%r11d        ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264cc: cmp    $0x2701,%r11d
  0x00000000029264d3: jl     0x00000000029263e0  ;*if_icmpge
                        ; - MaxOpt::test1@8 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264d9: cmp    $0x2710,%r11d
  0x00000000029264e0: jge    0x0000000002926500
  0x00000000029264e2: xchg   %ax,%ax            ;*iload_1
                        ; - MaxOpt::test1@11 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264e4: cmp    %r8d,%r11d
  0x00000000029264e7: jl     0x0000000002926532  ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264e9: mov    %r11d,%r10d
  0x00000000029264ec: inc    %r10d              ;*iinc
                        ; - MaxOpt::test1@22 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264ef: cmp    $0x2710,%r10d
  0x00000000029264f6: jge    0x0000000002926500  ;*if_icmpge
                        ; - MaxOpt::test1@8 (line 30)
                        ; - MaxOpt::runTests@3 (line 14)

  0x00000000029264f8: mov    %r11d,%r8d
  0x00000000029264fb: mov    %r10d,%r11d
  0x00000000029264fe: jmp    0x00000000029264e4
  0x0000000002926500: add    $0x20,%rsp
  0x0000000002926504: pop    %rbp
  0x0000000002926505: test   %eax,-0x26f650b(%rip)        # 0x0000000000230000
                        ;   {poll_return}
  0x000000000292650b: retq   
  0x000000000292650c: mov    %r11d,%r9d
  0x000000000292650f: mov    %r9d,%r11d
  0x0000000002926512: mov    %r8d,%r9d
  0x0000000002926515: mov    %r11d,%r8d
  0x0000000002926518: mov    $0xffffff65,%edx
  0x000000000292651d: mov    %r8d,0x4(%rsp)
  0x0000000002926522: mov    %r9d,0x8(%rsp)
  0x0000000002926527: callq  0x00000000028557a0  ; OopMap{off=364}
                        ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)
                        ;   {runtime_call}
  0x000000000292652c: int3                      ;*if_icmplt
                        ; - MaxOpt::test1@13 (line 31)
                        ; - MaxOpt::runTests@3 (line 14)

  0x000000000292652d: mov    %r11d,%r9d
  0x0000000002926530: jmp    0x0000000002926518
  0x0000000002926532: mov    %r8d,%r9d
  0x0000000002926535: mov    %r11d,%r8d
  0x0000000002926538: jmp    0x0000000002926518
  0x000000000292653a: hlt    
  0x000000000292653b: hlt    
  0x000000000292653c: hlt    
  0x000000000292653d: hlt    
  0x000000000292653e: hlt    
  0x000000000292653f: hlt    
[Exception Handler]
 ...

I wonder why the method is not optimized. Maybe it's once more worth being asked as a separate question. But my gut feeling is that it is a side effect of the artificial test configuration, involving the final n and the usage of the loop variable for the inner expression. In similar (more realistic) setups, such useless, side-effect-free methods are usually eliminated rather reliably.

Community
  • 1
  • 1
Marco13
  • 53,703
  • 9
  • 80
  • 159
0

When a method runs for the first time, the JVM has not had the opportunity to re-compile the method using its just-in-time compiler. This is the problem with so many micro benchmark efforts. Assuming that main() calls test1(), make main() call test1() multiple times, and measure the time of each test1() call. You'll see that subsequent calls of test1() will run much faster !

public class Test
{
    public static void main(String[] args)
    {
        System.out.println("test0");
        for (int i = 0; i < 10; i++)
        {
            long t = System.currentTimeMillis();
            for (int j = 0; j < 100000; j++)
            {
                test0();
            }
            t = System.currentTimeMillis() - t;
            System.out.println(t);
        }
        System.out.println("test1");
        for (int i = 0; i < 10; i++)
        {
            long t = System.currentTimeMillis();
            for (int j = 0; j < 100000; j++)
            {
                test1();
            }
            t = System.currentTimeMillis() - t;
            System.out.println(t);
        }
    }

    private static final int n = 10000;
    private static int z = 10000;

    private static void test0()
    {
        int m = 0;

        for(int i = 0; i < n; ++i)
        {
            m = Math.max(i, m);
        }

        z += m;
    }

    private static void test1()
    {
        int m = 0;

        for(int i = 0; i < n; ++i)
        {
            m = ((i >= m) ? i : m);
        }

        z += m;
    }
}
Mike
  • 2,393
  • 3
  • 25
  • 37
  • I indicated in the question that every method was run 16000000+ times in a random order – Kamil Jarosz Jul 19 '16 at 22:22
  • I've updated my answer. I've added code, and made it so that the JVM doesn't end up optimizing out / bypassing the loop or call. test1() appears to run about 27% faster than test0() give or take. – Mike Jul 21 '16 at 18:25