
I was going through the book "Java Performance" by Scott Oaks and came across code in the microbenchmarking section where it is said that a Java 7 or 8 JVM is smart enough to skip the calculation inside the for loop because its result is never used.

Code mentioned in the book:

public void doTest() {
    // Main Loop
    double l;
    long then = System.currentTimeMillis();

    for (int i = 0; i < nLoops; i++) {
        l = fibImpl1(50);
    }
    long now = System.currentTimeMillis();
    System.out.println("Elapsed time: " + (now - then));
}

private double fibImpl1(int n) {
    if (n < 0) throw new IllegalArgumentException("Must be > 0");
    if (n == 0) return 0d;
    if (n == 1) return 1d;
    double d = fibImpl1(n - 2) + fibImpl1(n - 1);
    if (Double.isInfinite(d)) throw new ArithmeticException("Overflow");
    return d;
}

Further statements in the book: Because the result of the Fibonacci calculation is never used, the compiler is free to discard that calculation. A smart compiler (including current Java 7 and 8 compilers) will end up executing this code:

long then = System.currentTimeMillis();
long now = System.currentTimeMillis();
System.out.println("Elapsed time: " + (now - then));

To validate this I tried the code below, but the elapsed time it reports does not reflect the theory explained above.

My version of the code:

public class APSum {

    public static void main(String[] args) {
        long then = System.currentTimeMillis();
        ArithmeticProgression.sum(500000000);
        long now = System.currentTimeMillis();
        System.out.println("Elapsed time: " + (now - then));
    }
}

class ArithmeticProgression {
    public static double sum(int i) {
        double sum = 0;
        for (int index = 0; index <= i; index++) {
            sum = sum + (double) index;
        }
        return sum;
    }
}

So please let me know how to reproduce the scenario mentioned in the book. Or is it entirely up to the JVM whether it optimizes the call away or not?

Shashi Shankar
  • Did you try the code from the book and did it work? – Radiodef Apr 17 '15 at 06:51
  • The compiler is smart, but it is not infinitely smart. There are limits to how far it goes to see if it can throw something away because the result is not used. Your example is maybe too complex, so that the compiler doesn't throw it away. – Jesper Apr 17 '15 at 06:51
  • @Jesper - To add to your comment, the OP is not doing *micro-benchmarking* in the right way :) – TheLostMind Apr 17 '15 at 06:53
  • You are mixing up concepts. To my knowledge, the Java **compiler** that translates source code into bytecode (class files) is pretty dumb. In contrast to other compilers like gcc, it doesn't do any significant optimization. In other words: out of the many concepts that compiler constructors know to optimize, only a few are applied. If that has changed with Java 7 or 8, I must have missed it. But what happens is that the **just-in-time** compiler constantly optimizes code while it is **running**; see here http://docs.oracle.com/cd/E15289_01/doc.40/e15058/underst_jit.htm for example. – GhostCat Apr 17 '15 at 06:55
  • Also the JIT optimizations are *highly platform dependent*. So, don't expect *consistent* behaviour of same code across different platforms. You are probably looking at a *specialized* JVM optimization known as [*escape analysis*](https://docs.oracle.com/javase/8/docs/technotes/guides/vm/performance-enhancements-7.html#escapeAnalysis) – TheLostMind Apr 17 '15 at 06:58
  • It is not the compiler which will perform optimizations, it's the JIT. Javac is as dumb as you can get for a compiler. – fge Apr 17 '15 at 07:05
  • @Jesper ... I think after going through one chapter of the book, I am expecting too much from the JVM :-) – Shashi Shankar Apr 17 '15 at 07:06
  • @ShashiShankar No, you are not: the **JVM** optimizations can be extremely smart and dramatically improve execution time. But the **compiler** does not do that. – GhostCat Apr 17 '15 at 07:10

1 Answer


Modern JVMs are very complex and do all kinds of optimization. If you try to measure a small piece of code, it is really complicated to do it correctly without very, very detailed knowledge of what the JVM is doing. Dead-code elimination (DCE) is the kind of optimization that frequently causes microbenchmarks to go awry.

There are two classical mistakes (read more about common mistakes in http://shipilev.net/talks/jvmls-July2014-benchmarking.pdf and https://stackoverflow.com/a/513259/1352098):

  • since the smallest unit of compilation is a method, the benchmark has to consist of more than just a main method: the code being measured should live in its own method so that the JIT can compile it;
  • there are no warm-up iterations. Always include a warm-up phase that runs your test kernel all the way through, enough to trigger all initializations and compilations before the timing phase (the JMH sketch below takes care of both points for you).
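
Both of these points are exactly what a harness such as JMH (http://openjdk.java.net/projects/code-tools/jmh/), discussed in the Shipilev slides linked above, handles for you: it runs warm-up iterations and gives you a Blackhole to consume results so that DCE cannot remove the code under test. Here is a minimal sketch, assuming JMH is on the classpath; the class and method names (APSumBench, deadCode, returned, consumed) are only illustrative:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class APSumBench {

    // same arithmetic-progression loop as in the question
    static double sum(int i) {
        double sum = 0;
        for (int index = 0; index <= i; index++) {
            sum = sum + (double) index;
        }
        return sum;
    }

    @Benchmark
    public void deadCode() {
        // result ignored: the JIT is free to eliminate the whole computation
        sum(5000000);
    }

    @Benchmark
    public double returned() {
        // returning the value keeps it alive, so it cannot be removed as dead code
        return sum(5000000);
    }

    @Benchmark
    public void consumed(Blackhole bh) {
        // consuming the value via Blackhole has the same effect as returning it
        bh.consume(sum(5000000));
    }
}

Run through JMH, the deadCode variant will typically collapse to almost nothing because its unused result is eliminated, while returned and consumed report the real cost of the loop. The rest of this answer sticks with a hand-rolled benchmark to show what the JIT is actually doing.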

After these corrections, our benchmark looks like the following:

public class APSum {

    public static void main(String[] args) {
        for (int i = 0; i < 5000; i++) {
            test();
        }
    }

    private static void test() {
        long then = System.currentTimeMillis();
        sum(5000000);
        long now = System.currentTimeMillis();
        System.out.println("Elapsed time: " + (now - then));
    }

    public static double sum(int i) {
        double sum = 0;
        for (int index = 0; index <= i; index++) {
            sum = sum + (double) index;
        }
        return sum;
    }
}

In this example DCE happens only after inlining. Let's start with a look at the inlining tree (available with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:-BackgroundCompilation):

   @ 0   java.lang.System::currentTimeMillis (0 bytes)   intrinsic
   @ 6   edu.jvm.runtime.APSum::sum (22 bytes)   inlining prohibited by policy
   @ 10   java.lang.System::currentTimeMillis (0 bytes)   intrinsic

The compiler failed to inline the method APSum::sum due to OSR (on-stack replacement). In fact, though OSR compilation is frequently triggered in benchmarks (and particularly in microbenchmarks), it is triggered less frequently in application code. In order to get the right results we have to add more warm-up iterations:

public class APSum {

    public static void main(String[] args) {
        for (int i = 0; i < 10000; i++) {
            test();
        }
    }

    private static void test() {
        long then = System.currentTimeMillis();
        sum(5000000);
        long now = System.currentTimeMillis();
        System.out.println("Elapsed time: " + (now - then));
    }

    public static double sum(int i) {
        double sum = 0;
        for (int index = 0; index <= i; index++) {
            sum = sum + (double) index;
        }
        return sum;
    }
}

As a result, at the end of the iterations we have:

  Elapsed time: 5
  Elapsed time: 4
  Elapsed time: 5

    @ 0   java.lang.System::currentTimeMillis (0 bytes)   (intrinsic)
    @ 6   edu.jvm.runtime.APSum::sum (22 bytes)   inline (hot)
    @ 10   java.lang.System::currentTimeMillis (0 bytes)   (intrinsic)

  Elapsed time: 0
  Elapsed time: 0
  Elapsed time: 0
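
For completeness, here is a variant that keeps the result alive by writing it to a volatile field. This is a sketch for illustration, not part of the measurements above; the class name APSumConsumed and the field sink are just placeholders. With the value consumed, the JIT can no longer treat the computation as dead code, so the elapsed time should stay around the earlier ~5 ms instead of dropping to 0:

public class APSumConsumed {

    // a volatile sink keeps the computed value observable,
    // so the JIT cannot eliminate the loop as dead code
    private static volatile double sink;

    public static void main(String[] args) {
        for (int i = 0; i < 10000; i++) {
            test();
        }
    }

    private static void test() {
        long then = System.currentTimeMillis();
        sink = sum(5000000);
        long now = System.currentTimeMillis();
        System.out.println("Elapsed time: " + (now - then));
    }

    public static double sum(int i) {
        double sum = 0;
        for (int index = 0; index <= i; index++) {
            sum = sum + (double) index;
        }
        return sum;
    }
}

This is the same idea as consuming the result with JMH's Blackhole, just in hand-rolled form.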
Ivan Mamontov
  • failed to inline due to `OSR`? How come? Shouldn't inlining sort of complement OSR? – Eugene Mar 11 '18 at 19:26
  • @Eugene From the sources http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/runtime/advancedThresholdPolicy.cpp#l275 "if we're compiling a profiled method with C1 and the callee is known to have OSRed in a C2 version, don't inline it." – Ivan Mamontov Mar 11 '18 at 21:38