1

When the number of iterations is the same, I wonder if the smaller iterations of the outer 'for loop' are faster.

This is my java code.

total loops iterations is 100 billion.

public class Test {

    public static long run1() {
        final long start = System.nanoTime();
        long cnt = 0;
        for(int i = 0; i<1000000000; i++) {
            for(int j = 0 ; j<100 ; j++) {
                cnt++;
            }
        }
        final long end = System.nanoTime();

        System.out.println("run1 : "+(end-start)*1e-9+" sec");
        return start - end;
    }



    public static long run2() {
        final long start = System.nanoTime();
        long cnt = 0;
        for(int i = 0; i<100; i++) {
            for(int j = 0 ; j<1000000000 ; j++) {
                cnt++;
            }
        }
        final long end = System.nanoTime();

        System.out.println("run2 : "+(end-start)*1e-9+" sec");
        return start - end;
    }


    public static void main(String[] args) {

        for(int i = 0 ; i < 10 ; i++) {
            long test1 = run1();
            long test2 = run2();
            System.out.println("Time diff : "+(test1-test2)*1e-9+" sec\n");
        }
    }
}

and this is result

run1 : 4.5772473950000006 sec
run2 : 1.7155700900000002 sec
Time diff : -2.861677305 sec

run1 : 4.4107229420000005 sec
run2 : 1.6532413140000002 sec
Time diff : -2.7574816280000003 sec

run1 : 3.815836296 sec
run2 : 1.761153921 sec
Time diff : -2.054682375 sec

run1 : 3.913543021 sec
run2 : 1.752971717 sec
Time diff : -2.1605713040000003 sec

run1 : 3.851766485 sec
run2 : 1.7262416440000001 sec
Time diff : -2.1255248410000003 sec

run1 : 3.95305219 sec
run2 : 1.7323067330000002 sec
Time diff : -2.220745457 sec

run1 : 3.924869236 sec
run2 : 1.6953032440000002 sec
Time diff : -2.229565992 sec

run1 : 3.839871705 sec
run2 : 1.692300162 sec
Time diff : -2.147571543 sec

run1 : 3.93864626 sec
run2 : 1.704322469 sec
Time diff : -2.234323791 sec

run1 : 3.863758493 sec
run2 : 1.700392962 sec
Time diff : -2.163365531 sec

like this,

I wonder why these results are coming.

and, edited by Triple nested loops, Similar results are obtained.

김동균
  • 33
  • 1
  • 5
  • 1
    flip calls to run1 with run2, ie execute run2 first and check results – Iłya Bursov Feb 14 '20 at 19:09
  • waiting to see the result after the changes suggested by @Iłya Bursov. Nice observation though. – User_67128 Feb 14 '20 at 19:35
  • @IłyaBursov Now I'm checking, I have modified it according to your advice. but the result is same that. run2 is faster than run1. – 김동균 Feb 14 '20 at 19:46
  • As already said in the answers, your loops do nothing at all and can be fully eliminated. You're basically measuring how well a useless code gets optimized before the optimizer really kicks in (read about JIT warm up). Benchmarking Java is too hard, please learn JMH. – maaartinus Feb 15 '20 at 01:50

2 Answers2

2

The short answer is: JIT and your math is wrong.

Let's start with your run methods, which end with this print line and a return

System.out.println("run1 : "+(end-start)*1e-9+" sec");
return start - end;

The problem here is you're printing end-start, and returning start-end. Instead of doing the math in the run methods, I shifted it around so that each run is only responsible for the run and the timing:

public class loops
{

  public static long run1()
  {
    final long start = System.nanoTime();
    long cnt = 0;
    for (int i = 0; i < 1000000000; i++)
    {
      for (int j = 0; j < 100; j++)
      {
        cnt++;
      }
    }
    final long end = System.nanoTime();
    return end - start;
  }

  public static long run2()
  {
    final long start = System.nanoTime();
    long cnt = 0;
    for (int i = 0; i < 100; i++)
    {
      for (int j = 0; j < 1000000000; j++)
      {
        cnt++;
      }
    }
    final long end = System.nanoTime();
    return end - start;
  }

  public static void main(String[] args)
  {

    for (int i = 0; i < 10; i++)
    {
      long test1 = run1();
      long test2 = run2();
      System.out.println("run1 : " + test1 + " ns");
      System.out.println("run2 : " + test2 + " ns");
      System.out.println("Time diff : " + (test1 - test2) + " ns\n");
    }
  }
}

I also took out the math because.. well, why bother? what's wrong with looking at the values as nanoseconds? That's what the actually are...

Running this you can see initial large values that decrease and approach zero. In some runs they do even reach zero for both versions.

Java 8 Run

Zero nanoseconds? Yes, instant computing. Let Quantum computing suck on that! Okay, it's not really instant. Not when you check the nanoTime description:

This method provides nanosecond precision, but not necessarily nanosecond resolution (that is, how frequently the value changes)

It's just faster than is feasibly measurable becauser the JIT has probably ultimately figured out that the method doesn't actually do anything. Now why one or the other decreases faster than the other would require a deeper dive into the JIT and how it decides to optimize one method vs the another - which is more than I know.

Utilmately, there is virtually no difference. After the intial run of both methods, the largest time difference is only a few milliseconds at most. I don't believe this is anything significant to say that one version is truly faster than the other.

Interesting side notes: JDK-11 didn't seem to perform as well... Java 11 run

GraalVM didn't seem to perform as well... although, there's probably some flag that would improve it... GraalVM

Jason Warner
  • 2,469
  • 1
  • 11
  • 15
  • Thanks!! I haven't learned 'jit' yet. so I doesn't understand about your some advices.. so, the only way is to learn 'jit'? finally, your answer is each that nested for loop performance are virtually no difference. right? when I this code write, I reference this [question](https://stackoverflow.com/questions/23914350/a-big-loop-within-a-small-loop-always-faster-than-a-small-loop-within-a-big-one). – 김동균 Feb 15 '20 at 09:24
  • This may be overly simplified - the JIT is a Just In Time compiler that is part of the JVM. It will analyze and sometimes rewrite portions of code into machine code to improve the performance of frequently called portions. And yes, in my opinion both loop methods perform equally in Java. Other languages, other code, other changes will possibly exhibit other performance behaviors. But for this simple example , any difference in performance is minor (occurring only during JVM warmup) and is eventually nullified by the JIT. – Jason Warner Feb 15 '20 at 12:59
2

The performance difference you see is because of loop unrolling, JVM unrolls internal loop first, but cannot unroll outer loop.

Unrolling means that:

for (int i=0;i<MAX;i++) cnt++;

is converted into something like:

int i=0;
for (;(i+16)<=MAX; i+=16) cnt+=16;
for (i<MAX; i++) cnt++; // handle case when MAX % 16 != 0

so total number of operations is actually:

run1: 1000000000 * (100 / 16 + 4) = 10 000 000 000 operations
run2: 100 * (1000000000 / 16)     =  6 250 000 000 operations
Iłya Bursov
  • 23,342
  • 4
  • 33
  • 57
  • +1 _"Flawed microbenchmarks - is there any other kind?"_, [asked Brian Goetz](https://www.ibm.com/developerworks/library/j-jtp02225/index.html), a long time ago. – andrewJames Feb 14 '20 at 22:42