86

In this code when I create an Object in the main method and then call that objects method: ff.twentyDivCount(i)(runs in 16010 ms) , it runs much faster than calling it using this annotation: twentyDivCount(i)(runs in 59516 ms). Of course, when I run it without creating an object, I make the method static, so it can be called in the main.

public class ProblemFive {

    // Counts the number of numbers that the entry is evenly divisible by, as max is 20
    int twentyDivCount(int a) {    // Change to static int.... when using it directly
        int count = 0;
        for (int i = 1; i<21; i++) {

            if (a % i == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long startT = System.currentTimeMillis();;
        int start = 500000000;
        int result = start;

        ProblemFive ff = new ProblemFive();

        for (int i = start; i > 0; i--) {

            int temp = ff.twentyDivCount(i); // Faster way
                       // twentyDivCount(i) - slower

            if (temp == 20) {
                result = i;
                System.out.println(result);
            }
        }

        System.out.println(result);

        long end = System.currentTimeMillis();;
        System.out.println((end - startT) + " ms");
    }
}

EDIT: So far it seems that different machines produce different results, but using JRE 1.8.* is where the original result seems to be consistently reproduced.

Stabbz
  • 768
  • 5
  • 12
  • 4
    How are you running your benchmark? I'd bet that this is an artifact of the JVM not having enough time to optimize the code. – Patrick Collins May 26 '15 at 10:02
  • 2
    Seems that it's enough time for JVM to compile and perform an OSR for the main method as `+PrintCompilation +PrintInlining` shows – Tagir Valeev May 26 '15 at 10:03
  • but normally this warmup period should not be measured and here `currentTimeMillis` is very first call in main. – sodik May 26 '15 at 10:05
  • @sodik that would not explain a difference of 40+ seconds. You can use nanotime and run the loop twice to allow for warmup, the result is the same. – assylias May 26 '15 at 10:06
  • On my device, using the static one result was `twentyDivCount(i) -> 47937 ms` and using it through object `ff.twentyDivCount(i) -> 51410 ms` :/ – Mohammad Ersan May 26 '15 at 10:09
  • 1
    I had tried the code snippet , but i don't getting any such time difference as Stabbz said. They 56282ms (using instance) 54551ms (as static method) . – Don Chakkappan May 26 '15 at 10:10
  • 1
    @PatrickCollins Five seconds must suffice. I [rewrote it a bit](https://dl.dropboxusercontent.com/u/4971686/published/maaartin/so/ProblemFive.java) so that you can measure both (one JVM gets started per variant). I know that as a benchmark it's still flawed, but it convincing enough: 1457 ms STATIC vs 5312 ms NON_STATIC. – maaartinus May 26 '15 at 10:16
  • On my machine the times were much closer - `52135 ms` for the instance method and `52070 ms` for the static method. – Eran May 26 '15 at 10:18
  • @Eran What version of JRE are you using? – assylias May 26 '15 at 10:29
  • @Eran I'm not surprised that the hotspot compiler behaves very differently with Java 6 and 8. – assylias May 26 '15 at 10:37
  • Have you run the tests multiple times? – Ryan Dougherty May 26 '15 at 13:28
  • It would be depending on the machine. In My machine 89534 ms for instance and 70033 ms for static. More info: http://stackoverflow.com/questions/15226547/is-using-static-private-methods-really-faster-better-than-instance-private-metho – Clement Amarnath May 26 '15 at 10:33
  • Can we please use stopwatches for benchmarks? As eric lippert constantly points out, using clocks for purposes other than which they are intended is a bad time measurement. – D. Ben Knoble May 27 '15 at 02:28
  • 1
    Haven't yet investigated the question in detail, but this *might* be related: http://shipilev.net/blog/2015/black-magic-method-dispatch/ (maybe Aleksey Shipilëv can enlighten us here) – Marco13 May 27 '15 at 09:45
  • 1
    ...and you are measuring the performance of the OSR stub, not the final performance. Please benchmark properly (shameless plug: http://openjdk.java.net/projects/code-tools/jmh/) – Aleksey Shipilev May 29 '15 at 15:13
  • @AlekseyShipilev Right, but it's the same (assuming that running the main loop twice as I did now [here](https://dl.dropboxusercontent.com/u/4971686/published/maaartin/so/ProblemFive.java) suffices. Sure, JMH should be used, but what if it would happen in a real program but not in a benchmark? The difference is huge enough to justify benchmarking without a perfect harness (I took care about starting a new JVM and I don't think anything else could go wrong here), isn't it? – maaartinus Jun 01 '15 at 12:29
  • @maaartinus: Suffices for what? It seems wrong to assume that a large difference somehow validates the benchmarking. It is also wrong to assume that controlled JMH test would be farther off from "real program" than the one driven by off-hand trivial harness that measures God knows what. So you have a trivial harness wound up, what's next? How can you prove it does the right thing? For example, how would you make sure `twentyDivCount` is not short-cutted once a first `a % i == 0` fails? – Aleksey Shipilev Jun 01 '15 at 12:36
  • @AlekseyShipilev I suppose you're right because I know what you do, but I don't understand the short-cutting of `a % i == 0` at all. The method counts by how many numbers (from the set 1 to 20) `a` can be divided and I can't imagine any shortcut the JVM could come up with (sure, it can be done much smarter, but things like factoring are out of scope, aren't they?). But whatever the JVM does, the question remains: Why doesn't it do it consequently? – maaartinus Jun 01 '15 at 12:51
  • @maaartinus: Well, I can come up with a plausible optimization that says that the upper bound for `temp` is exactly 20 if every `a % i` is 0. And we only act when `temp == 0`. Therefore, if at least on `a % i` is not 0, nobody will notice, and we can just move on to the next iteration. – Aleksey Shipilev Jun 01 '15 at 12:54
  • @maaartinus: Try `-XX:-UseOnStackReplacement` with your harness. The trouble with the test is that it measures the a transient performance, likely in OSR stub. And the OSR compilation may be affected by loop unrolling options, since it's job is to compile a much narrower piece of code. But that's only transient, and the steady state performance should be the same. I would guess OP does not even care about this, and wants the steady state performance. All these little things, while interesting, have only a tiny relation to the "real world" performance. – Aleksey Shipilev Jun 01 '15 at 12:59
  • @AlekseyShipilev Thanks, now I understand the shortcutting. Without OSR everything below LoopUnrollLimit=72 is slow until the third iteration and then the original behavior restores (so OSR and the real thing produce the same results). I see that the steady state is the only important thing for servers, but maybe the OP's whole program should run for a few seconds or minutes only. – maaartinus Jun 01 '15 at 13:12

5 Answers5

72

Using JRE 1.8.0_45 I get similar results.

Investigation:

  1. running java with the -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining VM options shows that both methods get compiled and inlined
  2. Looking at the generated assembly for the methods themselves shows no significant difference
  3. Once they get inlined, however, the generated assembly within main is very different, with the instance method being more aggressively optimised, especially in terms of loop unrolling

I then ran your test again but with different loop unrolling settings to confirm the suspicion above. I ran your code with:

  • -XX:LoopUnrollLimit=0 and both methods run slowly (similar to the static method with the default options).
  • -XX:LoopUnrollLimit=100 and both methods run fast (similar to the instance method with the default options).

As a conclusion it seems that, with the default settings, the JIT of hotspot 1.8.0_45 is not able to unroll the loop when the method is static (although I'm not sure why it behaves that way). Other JVMs may yield different results.

assylias
  • 321,522
  • 82
  • 660
  • 783
  • Between 52 and 71, the original behavior is restored (at least on my machine, s. my answer). It looks like the static version was 20 units bigger, but why? This is strange. – maaartinus May 26 '15 at 11:08
  • 3
    @maaartinus I'm not even sure what that number represents exactly - the doc is fairly evasive: "*Unroll loop bodies with server compiler intermediate representation node count less than this value. The limit used by the server compiler is **a function of this value, not the actual value**. The default value varies with the platform on which the JVM is running.*"... – assylias May 26 '15 at 11:09
  • Neither I know, but my first guess was that the static methods gets just slightly bigger in whatever units and that we hit the spot where it matters. However, the difference is pretty big, so my current guess is that the static version gets some optimizations which make it quite a bit bigger. I haven't looked at the generated asm. – maaartinus May 26 '15 at 11:16
33

Just an unproved guess based an assylias' answer.

The JVM uses a threshold for loop unrolling, which is something like 70. For whatever reason, the static call is slightly bigger and doesn't get unrolled.

Update results

  • With the LoopUnrollLimit in the below 52, both versions are slow.
  • Between 52 and 71, only the static version is slow.
  • Above 71, both versions are fast.

This is strange as my guess was that the static call is just slightly bigger in the internal representation and the OP hit a strange case. But the difference seems to be about 20, which makes no sense.

 

-XX:LoopUnrollLimit=51
5400 ms NON_STATIC
5310 ms STATIC
-XX:LoopUnrollLimit=52
1456 ms NON_STATIC
5305 ms STATIC
-XX:LoopUnrollLimit=71
1459 ms NON_STATIC
5309 ms STATIC
-XX:LoopUnrollLimit=72
1457 ms NON_STATIC
1488 ms STATIC

For those willing to experiment, my version may be useful.

Community
  • 1
  • 1
maaartinus
  • 44,714
  • 32
  • 161
  • 320
0

When this is executed in debug mode, the numbers are the same for the instance and static cases. That further means that the JIT hesitates to compile the code to native code in the static case the same way as it does in the instance method case.

Why does it do so? It is hard to say; probably it would do the right thing if this were a larger application...

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Dragan Bozanovic
  • 23,102
  • 5
  • 43
  • 110
  • "Why does it do so? Hard to say, probably it would do the right thing if this were a larger app." Or you'd just have a weird performance problem that's too big to actually debug. (And it's not that hard to say. You can look at the assembly the JVM spits out like assylias did.) – tmyklebu May 26 '15 at 15:58
  • @tmyklebu Or we have a weird performance problem that's unnecessary and expensive to fully debug and there are easy workarounds. At the end, we are talking about JIT here, its authors don't know how it behaves exactly in all the situations. :) Look at the other answers, they are very good and very close to explain the issue, but still so far nobody knows why exactly this is happening. – Dragan Bozanovic May 26 '15 at 16:04
  • @DraganBozanovic: It stops being "unnecessary to fully debug" when it causes real issues in real code. – tmyklebu May 26 '15 at 18:20
0

I just tweaked the test slightly and I got the following results:

Output:

Dynamic Test:
465585120
232792560
232792560
51350 ms
Static Test:
465585120
232792560
232792560
52062 ms

NOTE

While I was testing them separately I got ~52 sec for dynamic and ~200 sec for static.

This is the program:

public class ProblemFive {

    // Counts the number of numbers that the entry is evenly divisible by, as max is 20
    int twentyDivCount(int a) {  // Change to static int.... when using it directly
        int count = 0;
        for (int i = 1; i<21; i++) {

            if (a % i == 0) {
                count++;
            }
        }
        return count;
    }

    static int twentyDivCount2(int a) {
         int count = 0;
         for (int i = 1; i<21; i++) {

             if (a % i == 0) {
                 count++;
             }
         }
         return count;
    }

    public static void main(String[] args) {
        System.out.println("Dynamic Test: " );
        dynamicTest();
        System.out.println("Static Test: " );
        staticTest();
    }

    private static void staticTest() {
        long startT = System.currentTimeMillis();;
        int start = 500000000;
        int result = start;

        for (int i = start; i > 0; i--) {

            int temp = twentyDivCount2(i);

            if (temp == 20) {
                result = i;
                System.out.println(result);
            }
        }

        System.out.println(result);

        long end = System.currentTimeMillis();;
        System.out.println((end - startT) + " ms");
    }

    private static void dynamicTest() {
        long startT = System.currentTimeMillis();;
        int start = 500000000;
        int result = start;

        ProblemFive ff = new ProblemFive();

        for (int i = start; i > 0; i--) {

            int temp = ff.twentyDivCount(i); // Faster way

            if (temp == 20) {
                result = i;
                System.out.println(result);
            }
        }

        System.out.println(result);

        long end = System.currentTimeMillis();;
        System.out.println((end - startT) + " ms");
    }
}

I also changed the order of the test to:

public static void main(String[] args) {
    System.out.println("Static Test: " );
    staticTest();
    System.out.println("Dynamic Test: " );
    dynamicTest();
}

And I got this:

Static Test:
465585120
232792560
232792560
188945 ms
Dynamic Test:
465585120
232792560
232792560
50106 ms

As you see, if dynamic is called before static the speed for static dramatically decreased.

Based on this benchmark:

I hypothesize that it all depends on the JVM optimization. thus I just recommend you to go with the rule of thumb for use of static and dynamic methods.

RULE OF THUMB:

Java: when to use static methods

Community
  • 1
  • 1
nafas
  • 5,283
  • 3
  • 29
  • 57
  • "you to go with the rule of thumb for use of static and dynamic methods." What is this rule of thumb? And who/what are you quoting from? – weston May 26 '15 at 10:49
  • @weston sorry I didn't add the link I was thinking of :). thx – nafas May 26 '15 at 10:52
0

Please try:

public class ProblemFive {
    public static ProblemFive PROBLEM_FIVE = new ProblemFive();

    public static void main(String[] args) {
        long startT = System.currentTimeMillis();
        int start = 500000000;
        int result = start;


        for (int i = start; i > 0; i--) {
            int temp = PROBLEM_FIVE.twentyDivCount(i); // faster way
            // twentyDivCount(i) - slower

            if (temp == 20) {
                result = i;
                System.out.println(result);
                System.out.println((System.currentTimeMillis() - startT) + " ms");
            }
        }

        System.out.println(result);

        long end = System.currentTimeMillis();
        System.out.println((end - startT) + " ms");
    }

    int twentyDivCount(int a) {  // change to static int.... when using it directly
        int count = 0;
        for (int i = 1; i < 21; i++) {

            if (a % i == 0) {
                count++;
            }
        }
        return count;
    }
}
chengpohi
  • 14,064
  • 1
  • 24
  • 42