Why does test1() run much faster than test2()?

Question

import java.util.Random;


public class Test{
    static int r = new Random().nextInt(2);
    static int a(){
        return r==1 ? 1 :0;
    }

    public static void test1() throws  Exception {
        //
        System.out.println(1403187139018L);
        for (int i = 0; i <   1073741824; i++) {}//*

        // Thread.sleep(20000);
        long d = 0;

        for (int j = 0; j < 10; j++) {
            long y = System.currentTimeMillis();

            for (int x = 0; x < 1073741823; x++) {
                d += r==0?1:0;
            }
            System.out.println((System.currentTimeMillis() -y));
        }
    }

    public static void test2()  throws  Exception{

        // Thread.sleep(20000);
        long d = 0;

        for (int j = 0; j < 10; j++) {
            long y = System.currentTimeMillis();

            for (int x = 0; x < 1073741824; x++) {
                d += r==0?1:0;
            }
            System.out.println((System.currentTimeMillis() -y));

            // System.out.println("time:"+ (System.currentTimeMillis() - y));
        }
    }

    public static void main(String[] args) throws  Exception{
        // Thread.sleep(20000);

        test1();
        test2();

    }

}

When I run the above code, I get this output:

Why is test1 much faster ?

There is no difference except the followings:

System.out.println(1403187139018L);
for (int i = 0; i <   1073741824; i++) {}//*

Also, the time cost for test1 is 25-35 milliseconds, which I find unbelievable. I wrote the same code in C and it needed about 4 seconds to run every for loop.

This behavior seems strange. How do I know when to add:

System.out.println(1403187139018L);
for (int i = 0; i <   1073741824; i++) {}//*

Also, if I change

r==0?1:0

to

a()

then test2() runs faster than test1().

The output I get is:

1403187139018
3726
3729
3619
3602
3797
4362
4498
3816
4143
4368
1673
1386
1388
1323
1296
1337
1294
1283
1235
1460

the original legacy code: ...

long t = System.currentTimeMillis();
MappedByteBuffer mbb = map(new File("temp.mmp"), 1024L * 1024 * 1024);

System.out.println("load " + (System.currentTimeMillis() - t));//*
for (int i = 0; i < 2014L * 1024 * 1024; i++) {}//*
int d = 0;
for (int j = 0; j < 10; j++) {
    t = System.currentTimeMillis();
    mbb.position(0);
    mbb.limit(mbb.capacity());

    for (int i = 0; i < mbb.capacity(); i++) {
        d += mbb.get();
    }

    ....
}
System.out.println(d);

This is the sort of thing where you inspect the generated code. — tmyklebu, Jun 21 '14 at 20:09

score 5 · Answer 1 · answered Jun 21 '14 at 16:16

5

The empty loop probably triggers JIT compilation in the first method. And the JIT is smart enough to realize that your code doesn't do anything useful except getting the current time and printing time difference. So it optimizes the useless code by not running it at all.

If you write actual, useful code, the JIT will do the right thing. Don't try to mess with it by adding empty loops.

answered Jun 21 '14 at 16:16

JB Nizet

678,734
91
1,224
1,255

Yep. See http://stackoverflow.com/questions/7271147/java-how-much-time-does-an-empty-loop-use – Aaron Blenkush Jun 21 '14 at 16:17
This. For example, try printing out the value od `d` after the loop. It will still not be correct, but the timings will change. – Petr Janeček Jun 21 '14 at 16:17
yeah i add the System.out.println(d); after the nested loop,the result is not changed.how do JIT run for (int x = 0; x < 1073741823; x++) { d += r==0?1:0; } to get d? – user2219372 Jun 21 '14 at 16:25
why JIT can work in first example if I add code System.out.println(1403187139018L); for (int i = 0; i < 1073741824; i++) {}//. if comments one of this line the behaivor is changed. – user2219372 Jun 21 '14 at 16:27
The JIT detects code that is run often (like your large loop doing nothing), and tries to optimize it. It's smart in doing so. So it detects that the other loop doesn't do anything except modifying a local variable that is not used anywhere in the code. So it detects that the only thing this code is doing is burn CPU cycles for nothing. So it removes this code. – JB Nizet Jun 21 '14 at 16:30
I think the fact is a little unacceptable.In test2 i need about 13 seconds to run.If a add 2 lines of wasted code, I takes not more than 1 second to get the result. – user2219372 Jun 21 '14 at 16:59
Execute test2() many times (1000 times, for example), and you'll probably have the JIT optimize it as well. But you're trying to draw conclusions from a bad benchmark doing useless things. Don't. Write your actual code as naturally as possible, and the JIT will optimize when it's a good idea to do so. – JB Nizet Jun 21 '14 at 17:03
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000005 828 820 830 831 864 854 832 816 998 1027 950 907 804 817 838 819 909 828 841 851 839 854 839 837 819 840 824 853 902 860 820 813 812 832 852 898 841 828 843 838 850 839 848 836 1014 847 840 845 832 826 848 823 888 890 846 857 833 896 830 936 949 853 901 908 797 825 853 840 967 1007 1156 828 868 808 856 913 958 838 839 828 830 812 983 951 954 839 899 831 910 987 882 ...can't add string to System.out.println((System.currentTimeMillis() -y)); if added ,test1 will changed to be slow. – user2219372 Jun 21 '14 at 17:10
I had thought because it is cpu bound and io used,the cpu scheduler does like this kind of process.But the c code does't prove this. – user2219372 Jun 21 '14 at 17:12
Stop trying to make these methods fast or slow. They don't do anything useful. Start writing useful code, and trust the JIT to do what it should. You were explained several times why what you see happens, but seem to completely ignore the responses you get. So I'm done. – JB Nizet Jun 21 '14 at 17:13
actually it's from a real code,I delete about 30 lines codes which is unrelated to the results. It take about 2 hour to filled out the code unrelated with performance. – user2219372 Jun 21 '14 at 18:05

score 1 · Accepted Answer · answered Jun 21 '14 at 17:32

There are too many factors that affect JIT compilations:

Execution statistics. While interpreter runs a method it collects different statistics: which paths are executed, which branches are taken, which class instances are seen etc. In test1 the statistics is collected inside the first (empty) loop and thus fools the JIT compiler about the real execution scenario.
Class initialization and uncommon traps. When you remove the first System.out.println from test1, not all classes related to printing are initialized. An attempt to invoke a method of an uninitialized class causes uncommon trap which leads to deoptimization and further recompilation of the method using new knowledge.
The wrong statistics collected in test1 plays the bad joke when you replace r==0?1:0 with a(). In compiled test1 the method a() has never been executed before and thus has not had a chance to be optimized. That's why it works slower than test2 which has been compiled with the knowledge of a().

Of course, it is hard to predict all factors affecting JIT-compilation when trying to write a microbenchmark from scratch. That's why the recommended way to benchmark your code is using special frameworks where most of these problems have been solved already. I personally suggest JMH.

score 0 · Answer 3 · answered Jun 21 '14 at 16:25

When code is optimised it uses the information it has about how the code has run before to optimise it. In test1() the first loop triggers the whole method to be optimised, however there is no information about how the second loop will be run so it is not optimised as well as test2()

What I expect should happen is that the method is re-optimised however the code has to detect that an assumption it made the first time is not valid.

Guessing what might be different, the test2() could have been loop unrolled whereas in test1() it has not. This could explain the difference in performance.

I had thought the reason is JIT or cpu scheduler .BUt when i change the r==0 ? 1: 0 to a(),the ouputs is completely changed. — user2219372, Jun 21 '14 at 17:04

Why does test1() run much faster than test2()?

3 Answers3

Linked