
I was confused by the following code:

 public static void test() {
        long currentTime1 = System.currentTimeMillis();

        final int iBound = 10000000;
        final int jBound = 100;
        for (int i = 1; i <= iBound; i++) {
            int a = 1;
            int tot = 10;
            for (int j = 1; j <= jBound; j++) {
                tot *= a;
            }
        }

        long updateTime1 = System.currentTimeMillis();
        System.out.println("i:" + iBound + " j:" + jBound + "\nIt costs " + (updateTime1 - currentTime1) + " ms");
    }

That's the first version; it takes 443 ms on my computer.


 public static void test() {
        long currentTime1 = System.currentTimeMillis();

        final int iBound = 100;
        final int jBound = 10000000;
        for (int i = 1; i <= iBound; i++) {
            int a = 1;
            int tot = 10;
            for (int j = 1; j <= jBound; j++) {
                tot *= a;
            }
        }

        long updateTime1 = System.currentTimeMillis();
        System.out.println("i:" + iBound + " j:" + jBound + "\nIt costs " + (updateTime1 - currentTime1) + " ms");
    }

The second version takes 832 ms. The only difference is that I simply swapped iBound and jBound.

This result is surprising. I tested the same code in C, and the difference there is not nearly as large.

Why are these two similar pieces of code so different in Java?

My JDK version is openjdk-14.0.2.

Drinter
    Try to run one of your test cases 20 times. Each time you will get a different timing, sometimes faster sometimes slower. Though, you're running always the same code. – Matteo NNZ Aug 27 '21 at 12:25
  • Try running both cases multiple times and try getting an average time taken to run both. Does that reduce the average difference between average execution times of both cases? – Shivam Puri Aug 27 '21 at 12:29
  • No, I've run it more than 20 times, and every time I get a similar result: version 1 is always about 400 ms faster than version 2. – Drinter Aug 27 '21 at 12:32

3 Answers


TL;DR - This is just a bad benchmark.

I did the following:

  • Create a Main class with a main method.

  • Copy in the two versions of the test as test1() and test2().

  • In the main method do this:

    while(true) {
        test1();
        test2();
    }
    

Here is the output I got (Java 8).

i:10000000 j:100
It costs 35 ms
i:100 j:10000000
It costs 33 ms
i:10000000 j:100
It costs 33 ms
i:100 j:10000000
It costs 25 ms
i:10000000 j:100
It costs 0 ms
i:100 j:10000000
It costs 0 ms
i:10000000 j:100
It costs 0 ms
i:100 j:10000000
It costs 0 ms
i:10000000 j:100
It costs 0 ms
i:100 j:10000000
It costs 0 ms
i:10000000 j:100
It costs 0 ms
....

So as you can see, when I run two versions of the same method alternately in the same JVM, the times for each method are roughly the same.

But more importantly, after a small number of iterations the time drops to ... zero! What has happened is that the JIT compiler has compiled the two methods and (probably) deduced that their loops can be optimized away.

It is not entirely clear why people are getting different times when the two versions are run separately. One possible explanation is that the first time run, the JVM executable is being read from disk, and the second time is already cached in RAM. Or something like that.

Another possible explanation is that JIT compilation kicks in earlier (see note 1 below) with one version of test(), so the proportion of time spent in the slower interpreted (pre-JIT) phase differs between the two versions. (It may be possible to tease this out using JIT logging options.)

But it is immaterial really ... because the performance of a Java application while the JVM is warming up (loading code, JIT compiling, growing the heap to its working size, loading caches, etc) is generally speaking not important. And for the cases where it is important, look for a JVM that can do AOT compilation; e.g. GraalVM.

1 - This could be because of the way that the interpreter gathers stats. The general idea is that the bytecode interpreter accumulates statistics on things like branches until it has "enough". Then the JVM triggers the JIT compiler to compile the bytecodes to native code. When that is done, the code typically runs 10 or more times faster. The different looping patterns might make it reach "enough" earlier in one version than in the other. NB: I am speculating here. I offer zero evidence ...


The bottom line is that you have to be careful when writing Java benchmarks because the timings can be distorted by various JVM warmup effects.

For more information read: How do I write a correct micro-benchmark in Java?
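For a quick illustration of two of those points (warm the code up before timing, and consume the loop's result so the JIT cannot discard the whole loop as dead code), here is a minimal hand-rolled sketch. The class name, warm-up count, and iteration counts are placeholders of mine; for anything serious, use JMH:

```java
public class LoopBenchmark {

    // Same loop shape as the question, but the result is returned
    // so the JIT cannot prove the loops are dead code.
    static long run(int iBound, int jBound) {
        long sink = 0;
        for (int i = 1; i <= iBound; i++) {
            int a = 1;
            int tot = 10;
            for (int j = 1; j <= jBound; j++) {
                tot *= a;
            }
            sink += tot;
        }
        return sink;
    }

    public static void main(String[] args) {
        // Warm up both shapes so both are JIT-compiled before we time them.
        for (int w = 0; w < 10; w++) {
            run(10_000_000, 100);
            run(100, 10_000_000);
        }
        long t0 = System.nanoTime();
        long r1 = run(10_000_000, 100);
        long t1 = System.nanoTime();
        long r2 = run(100, 10_000_000);
        long t2 = System.nanoTime();
        System.out.println("version 1: " + (t1 - t0) / 1_000_000 + " ms (sink " + r1 + ")");
        System.out.println("version 2: " + (t2 - t1) / 1_000_000 + " ms (sink " + r2 + ")");
    }
}
```

This still isn't a proper benchmark (single timed run, no statistical treatment), but it at least measures steady-state compiled code rather than warm-up effects.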

Stephen C

I tested it myself, and I get the same kind of difference (around 16 ms and 4 ms).

After testing, I found that:

Declaring a variable 1M times takes less time than multiplying by 1 1M times.

How ?

I wrote a loop whose body repeats this multiplication 100 times:

final int nb = 100000000;
for (int i = 1; i <= nb; i++) {
    i *= 1;
    i *= 1;
    // [... written 20 times]
    i *= 1;
    i *= 1;
}

And a loop whose body repeats a declaration 100 times:

final int nb = 100000000;
for (int i = 1; i <= nb; i++) {
    int a = 0;
    int aa = 0;
    // [... written 20 times]
    int aaaaaaaaaaaaaaaaaaaaaa = 0;
    int aaaaaaaaaaaaaaaaaaaaaaa = 0;
}

And I get 8 ms and 3 ms respectively, which seems to correspond to what you get.

You may get different results on a different processor.
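One way to repeat this experiment without copy-pasting 100 statements is to put each loop body in its own method and time them with System.nanoTime(). This is a sketch of mine (method names are placeholders, and the JIT may well eliminate both loops entirely, which is itself part of the story):

```java
public class DeclareVsMultiplyTimer {

    // Loop whose body is a multiplication by 1 (value never changes).
    static int multiplyLoop(int nb) {
        int x = 3;
        for (int i = 1; i <= nb; i++) {
            x *= 1;
        }
        return x;
    }

    // Loop whose body is a local declaration and assignment.
    static int declareLoop(int nb) {
        int last = 1;
        for (int i = 1; i <= nb; i++) {
            int a = 0;
            last = a;
        }
        return last;
    }

    public static void main(String[] args) {
        final int nb = 100_000_000;
        // Warm-up passes so both methods are JIT-compiled before timing.
        for (int w = 0; w < 5; w++) {
            multiplyLoop(nb);
            declareLoop(nb);
        }
        long t0 = System.nanoTime();
        multiplyLoop(nb);
        long t1 = System.nanoTime();
        declareLoop(nb);
        long t2 = System.nanoTime();
        System.out.println("multiply: " + (t1 - t0) + " ns, declare: " + (t2 - t1) + " ns");
    }
}
```

If both timings come out near zero after warm-up, the JIT has removed the loops, and the experiment is no longer measuring what it appears to measure.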

Elikill58
  • Thanks, but I'm still confused. Both versions do 100*10^7 multiplications, yet the 1st declares 2*10^7 variables while the 2nd declares only 2*100. Shouldn't the 2nd version be faster than the 1st, since they multiply the same number of times and the 1st declares far more often? – Drinter Aug 27 '21 at 12:59
  • That's a good point, and I think this needs more testing. I suggest you run multiple tests and use nanoseconds instead of ms (because I see only a low difference, for example). Sorry, I can't run tests right now :/ – Elikill58 Aug 27 '21 at 13:05
  • The only thing left is the multiplication itself. In the first version you do only 100 multiplications for each `i`, and then `a` and `tot` are re-initialized with the next `i` value. In the second version you do 10000000 multiplications for each `i`, resulting in a much bigger `tot` value. So does this mean that multiplication with bigger numbers is slower?! – Enak Aug 27 '21 at 13:10
  • It multiplies by 1 (since a = 1), so it doesn't get bigger – Elikill58 Aug 27 '21 at 13:14
  • No problem, I did the same at the beginning, and another answer (removed now) did the same. It's confusing – Elikill58 Aug 27 '21 at 13:16

You can find the answer in the first chapter of algorithm books:

The cost of producing and assigning a value is 1. In the first algorithm you perform the 2 declarations and assignments 10000000 times, and in the second one only 100 times, so you reduce that time ...

In the first: 5 operations in the outer loop and 3 in the inner loop -> the inner loop is 3*100 = 300, then 300 + 5 -> 305 * 10000000 = 3050000000

In the second: 3*10000000 = 30000000 -> (30000000 + 5) * 100 = 3000000500

So in theory the second one is faster, but I think it comes back to multi-core CPUs ... which can do 10000000 parallel jobs in the first but only 100 parallel jobs in the second ... so the first one became faster.
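Taking those (hypothetical) unit costs of 5 ops per outer iteration and 3 ops per inner iteration at face value, the arithmetic can be checked directly:

```java
public class OperationCounts {

    // Version 1: inner loop of 100 iterations at 3 ops each, plus 5 ops
    // per outer iteration, repeated 10000000 times.
    static long firstCount() {
        return (3L * 100 + 5) * 10_000_000L;
    }

    // Version 2: inner loop of 10000000 iterations at 3 ops each, plus 5 ops
    // per outer iteration, repeated 100 times.
    static long secondCount() {
        return (3L * 10_000_000 + 5) * 100L;
    }

    public static void main(String[] args) {
        System.out.println("first:  " + firstCount());   // 3050000000
        System.out.println("second: " + secondCount());  // 3000000500
    }
}
```

Note the totals differ by under 2%, so this simple operation-count model cannot by itself explain the roughly 2x difference the question reports.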

pouya
    Can you be more specific? His first version is faster than the second one. According to what you have written, it should be flipped. The first version needs to create `a`, `tot` and `j` 10000000 times but the second one only 100 times. – Enak Aug 27 '21 at 12:56