
I ran the sample code below on my PC, which has an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz (2 CPUs, ~2.7GHz):

    import java.util.stream.IntStream;

    public class FindAnyTiming {
        public static void main(String[] args) {
            // Elapsed time is printed in microseconds (nanoTime delta / 1000)
            String format = "%7s run taken %6d micro seconds %5d findAny";

            // First run
            long start = System.nanoTime();
            int rand = IntStream.range(0, 100000).parallel().findAny().getAsInt();
            long end = System.nanoTime();
            System.out.println(String.format(format, "First", ((end - start) / 1000), rand));

            // Subsequent runs
            for (int i = 0; i < 25; i++) {
                start = System.nanoTime();
                rand = IntStream.range(0, 100000).parallel().findAny().getAsInt();
                end = System.nanoTime();
                System.out.println(String.format(format, "Subseq", ((end - start) / 1000), rand));
            }
        }
    }

Its output:

      First run taken  92532 micro seconds 50000 findAny
     Subseq run taken     61 micro seconds 50000 findAny
     Subseq run taken     37 micro seconds 50000 findAny
     Subseq run taken     52 micro seconds 50000 findAny
     Subseq run taken     42 micro seconds 50000 findAny
     Subseq run taken     33 micro seconds 50000 findAny
     Subseq run taken     32 micro seconds 50000 findAny
     Subseq run taken     34 micro seconds 50000 findAny
     Subseq run taken     33 micro seconds 50000 findAny
     Subseq run taken     34 micro seconds 50000 findAny
     Subseq run taken     32 micro seconds 50000 findAny
     Subseq run taken     32 micro seconds 50000 findAny
     Subseq run taken     46 micro seconds 50000 findAny
     Subseq run taken     36 micro seconds 50000 findAny
     Subseq run taken     31 micro seconds 50000 findAny
     Subseq run taken     43 micro seconds 50000 findAny
     Subseq run taken     34 micro seconds 50000 findAny
     Subseq run taken     31 micro seconds 50000 findAny
     Subseq run taken     32 micro seconds 50000 findAny
     Subseq run taken     37 micro seconds 50000 findAny
     Subseq run taken     45 micro seconds 50000 findAny
     Subseq run taken     49 micro seconds 50000 findAny
     Subseq run taken     32 micro seconds 50000 findAny
     Subseq run taken     31 micro seconds 50000 findAny
     Subseq run taken     31 micro seconds 50000 findAny
     Subseq run taken     37 micro seconds 50000 findAny

We can see a large difference in the time taken by the first run and by the subsequent runs.

  1. Does this mean stream operations are cached? Is there an internal cache implemented for streams in Java 8?
  2. Sometimes `findAny` returns a different value, but the time taken is still about the same as in the other subsequent runs, not like the first run.

See below:

      First run taken  84099 micro seconds 50000 findAny
     Subseq run taken    163 micro seconds 25000 findAny
     Subseq run taken     46 micro seconds 50000 findAny
     Subseq run taken     52 micro seconds 25000 findAny
Bhargav Rao
Saravana



> does it mean stream operations are cached?

No. What is cached is the code generated to implement the lambdas, and the classes that have been loaded.

> Is there any internal cache implemented for streams in Java8?

There is no special cache for Streams.

> sometimes findAny returns different value but the time taken is almost equal to the subsequent runs not like the first run

Indeed. Nothing about the result is cached; the first time, you pay a penalty for loading the code.
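A minimal sketch of why both questions have the same answer: every call builds and evaluates a fresh pipeline, so the value is recomputed each time. `findAny()` on a parallel stream is explicitly allowed to return any element, which is consistent with both 25000 and 50000 appearing in the output above, while `findFirst()` respects encounter order and returns 0 for this range. The class and method names are hypothetical, chosen for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

public class FindAnyVsFindFirst {

    // Each call builds and evaluates a new pipeline; nothing is reused between calls.
    static int anyOf(int n) {
        return IntStream.range(0, n).parallel().findAny().getAsInt();
    }

    // findFirst() honours encounter order, so the result is 0 for range(0, n).
    static int firstOf(int n) {
        return IntStream.range(0, n).parallel().findFirst().getAsInt();
    }

    public static void main(String[] args) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < 100; i++) {
            seen.add(anyOf(100_000));
        }
        // May hold one or several values, depending on how the threads are scheduled.
        System.out.println("findAny results seen: " + seen);
        System.out.println("findFirst result:     " + firstOf(100_000));
    }
}
```

Which values end up in the `findAny` set depends on thread scheduling, so no particular output is implied.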

BTW, the code isn't really optimised until it has been run at least 10,000 times. I would run this test repeatedly for around 10 seconds before timing it.
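The warm-up advice can be sketched as follows, assuming a single method (`work()`, a hypothetical name) is both warmed up and timed, so the JIT compiles exactly the call site that is later measured. The roughly 10-second warm-up mirrors the suggestion above; the `blackhole` field is an ad hoc way to keep the results in use. This is still a rough measurement rather than a rigorous benchmark; a harness such as JMH handles warm-up and dead-code elimination more carefully.

```java
import java.util.stream.IntStream;

public class WarmedUpTiming {

    // Written to after every call so the results are used and the calls
    // are less likely to be optimised away.
    static volatile long blackhole;

    // The single call site that is both warmed up and timed.
    static int work() {
        return IntStream.range(0, 100_000).parallel().findAny().getAsInt();
    }

    // Calls work() repeatedly for warmupNanos, ignoring the timings,
    // then times `runs` further calls and returns each duration in microseconds.
    static long[] measure(long warmupNanos, int runs) {
        long deadline = System.nanoTime() + warmupNanos;
        while (System.nanoTime() - deadline < 0) { // overflow-safe nanoTime comparison
            blackhole += work();
        }
        long[] micros = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            blackhole += work();
            micros[i] = (System.nanoTime() - start) / 1000;
        }
        return micros;
    }

    public static void main(String[] args) {
        // Roughly 10 seconds of warm-up (10_000_000_000 ns) before timing 25 runs.
        long[] micros = measure(10_000_000_000L, 25);
        for (long t : micros) {
            System.out.printf("%7s run taken %6d micro seconds%n", "Warm", t);
        }
    }
}
```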

Peter Lawrey
  • I ran the same code with an interval of 10 seconds and the subsequent runs took 300-400 microseconds – Saravana Oct 17 '16 at 11:12
  • @Saravana note you are getting to the point where using multiple threads is more overhead than useful. I would try the same test without parallel() for comparison. – Peter Lawrey Oct 17 '16 at 11:34
  • @Saravana BTW, you need to call the same line of code, not code which is a copy of it. It doesn't optimise all code which looks the same; it has to actually be the same line. I suggest using a loop where you ignore the results for the first 20000 iterations and only print the results after that. – Peter Lawrey Oct 17 '16 at 11:36
  • I put the streaming code in a static block to have the classes preloaded; after this, the first and subsequent runs took almost the same time, so it seems class loading is the issue – Saravana Oct 17 '16 at 11:57
  • @Saravana class loading is a big portion, but not the only source of delay. – Peter Lawrey Oct 17 '16 at 11:59
  • @Peter Lawrey: you are underestimating the optimizer a bit. If you perform the same stream operation at different call sites, 99% of the executed code paths are still the same and will get optimized. Of course, for an operation as simple as `range(…).findAny()`, it is impossible to have a benefit from parallel processing, regardless of the optimization state of the JVM, as there is no work to split. So, this code is like distributing a no-op to different threads and the total execution time is the time needed to receive a completion signal from all threads… – Holger Oct 17 '16 at 13:19
  • On the other hand, the sequential execution of the operation can get optimized to a real no-op. – Holger Oct 17 '16 at 13:21
  • @Holger I have found that for simple operations, the code gets heavily inlined and optimised at the call site. Warming up one block of code and testing another in the same method can have unintended consequences, such as optimising the rest of the code in the method based on no execution information. – Peter Lawrey Oct 17 '16 at 13:22
  • Of course, this is far away from being a good benchmark. But here, the initial overhead from class loading, verification and initialization outweighs anything else. It would be a different picture if there were lambda expressions involved… – Holger Oct 17 '16 at 13:50
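The comments suggest two further checks: preloading the stream classes in a static initializer (which made the first and subsequent runs comparable for Saravana) and comparing the parallel stream against a sequential one. A rough sketch combining both follows; the class and method names are hypothetical, no JIT warm-up is done, and the timing code avoids lambdas so that lambda bootstrapping does not add its own first-call cost. Absolute numbers will vary by machine, so no output is shown.

```java
import java.util.stream.IntStream;

public class SequentialVsParallel {

    // Written to after each call so the results are used.
    static volatile long blackhole;

    // Preload: run both variants once when the class is initialized, so class
    // loading, verification and initialization of the stream classes happen
    // before any timing starts.
    static {
        blackhole += IntStream.range(0, 10).findAny().getAsInt();
        blackhole += IntStream.range(0, 10).parallel().findAny().getAsInt();
    }

    static int sequentialFindAny() {
        return IntStream.range(0, 100_000).findAny().getAsInt();
    }

    static int parallelFindAny() {
        return IntStream.range(0, 100_000).parallel().findAny().getAsInt();
    }

    // Times a single call of one variant and returns the elapsed microseconds.
    static long timeOnce(boolean useParallel) {
        long start = System.nanoTime();
        int result = useParallel ? parallelFindAny() : sequentialFindAny();
        long micros = (System.nanoTime() - start) / 1000;
        blackhole += result;
        return micros;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            long seq = timeOnce(false);
            long par = timeOnce(true);
            System.out.printf("sequential %6d us   parallel %6d us%n", seq, par);
        }
    }
}
```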