I was trying out some code to compare the time taken by stream() and parallelStream() for the same operation.

Code 1

List<String> list = new ArrayList<>();

for (int i = 1; i <= 30000; i++)
    list.add("a");
for (int i = 1; i <= 20000; i++)
    list.add("b");
for (int i = 1; i <= 10000; i++)
    list.add("c");

//part 1
long start = System.currentTimeMillis();
Map<String, Long> countListSequence = list.stream()
    .collect(Collectors.groupingBy(e -> e, Collectors.counting()));
long end = System.currentTimeMillis();

System.out.println("Time taken in by stream() " + (end - start) + " millisec data " + countListSequence);

//part 2
long start1 = System.currentTimeMillis();
Map<String, Long> countListparallel = list.parallelStream()
    .collect(Collectors.groupingBy(e -> e, Collectors.counting()));
long end1 = System.currentTimeMillis();

System.out.println("Time taken by parallelStream() " + (end1 - start1) + " millisec data " + countListparallel);

Output 1

Time taken in by stream() 109 millisec data {a=30000, b=20000, c=10000}
Time taken by parallelStream() 16 millisec data {a=30000, b=20000, c=10000}

But if I change the order, using parallelStream() first and then stream(), like this:

Code 2

//part 1
long start1 = System.currentTimeMillis();
Map<String, Long> countListparallel = list.parallelStream()
    .collect(Collectors.groupingBy(e -> e, Collectors.counting()));
long end1 = System.currentTimeMillis();

System.out.println("Time taken by parallelStream() " + (end1 - start1) + " millisec data " + countListparallel);

//part 2
long start = System.currentTimeMillis();
Map<String, Long> countListSequence = list.stream()
    .collect(Collectors.groupingBy(e -> e, Collectors.counting()));
long end = System.currentTimeMillis();

System.out.println("Time taken in by stream() " + (end - start) + " millisec data " + countListSequence);

Output 2

Time taken by parallelStream() 109 millisec data {a=30000, b=20000, c=10000}
Time taken in by stream() 15 millisec data {a=30000, b=20000, c=10000}

My question is: why does the second part, stream(), in Code 2 take less time than parallelStream(), which is the opposite of what Code 1 shows?

This is not limited to stream() versus parallelStream(). I also tried stream() followed by stream() and saw the same effect: the second stream takes less time than the first.

Code 3

//part 1
long start1 = System.currentTimeMillis();
Map<String, Long> countListparallel = list.stream()
    .collect(Collectors.groupingBy(e -> e, Collectors.counting()));
long end1 = System.currentTimeMillis();

System.out.println("Time taken by stream() 1 " + (end1 - start1) + " millisec data " + countListparallel);


//part 2    
long start = System.currentTimeMillis();
Map<String, Long> countListSequence = list.stream()
    .collect(Collectors.groupingBy(e -> e, Collectors.counting()));
long end = System.currentTimeMillis();

System.out.println("Time taken in by stream() " + (end - start) + " millisec data " + countListSequence);

Output 3

Time taken by stream() 1 107 millisec data {a=30000, b=20000, c=10000}
Time taken in by stream() 14 millisec data {a=30000, b=20000, c=10000}

The second print shows less time than the first. So does a stream reuse previously calculated data? If so, how is that possible, given that the objects created, countListSequence and countListparallel, are different? I am confused about how the second part in each code sample takes less time than the first part. Am I missing something about streams here?

Thanks

Tagir Valeev
singhakash
    That's probably due to the first invocation of lambdas involved: it takes time. Create a proper benchmark with JMH and compare the results. – Tunaki Nov 06 '15 at 18:55

1 Answer

This is not at all surprising because the JIT compiler in the JVM optimizes code that has been run a significant number of times. As a result, it's perfectly normal for code later in the program to run faster than code earlier in the program.

If you want actually useful data here, write a benchmark with a tool like JMH that accounts for JIT warmup.
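To see the warm-up effect without pulling in JMH, here is a rough, dependency-free sketch. It is not a substitute for a real JMH benchmark; it only runs both variants many times before measuring, so that first-invocation costs are not charged to whichever variant happens to run first. The class and method names (`WarmupDemo`, `buildList`, `count`, `averageMicros`) and the run counts are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the question's code.
class WarmupDemo {

    // Accumulates results so the loop body is not trivially dead code.
    static long sink;

    // Same data as in the question: 30000 "a", 20000 "b", 10000 "c".
    static List<String> buildList() {
        List<String> list = new ArrayList<>();
        list.addAll(Collections.nCopies(30000, "a"));
        list.addAll(Collections.nCopies(20000, "b"));
        list.addAll(Collections.nCopies(10000, "c"));
        return list;
    }

    // The operation under test, sequential or parallel.
    static Map<String, Long> count(List<String> list, boolean parallel) {
        Stream<String> stream = parallel ? list.parallelStream() : list.stream();
        return stream.collect(Collectors.groupingBy(e -> e, Collectors.counting()));
    }

    // Average time per call, in microseconds, over the given number of runs.
    static long averageMicros(List<String> list, boolean parallel, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += count(list, parallel).size();
        }
        return (System.nanoTime() - start) / runs / 1_000;
    }

    public static void main(String[] args) {
        List<String> list = buildList();

        // Warm-up phase: results are discarded. 200 runs is an arbitrary choice.
        averageMicros(list, false, 200);
        averageMicros(list, true, 200);

        // Measurement phase: the order of these two lines should now matter far less.
        System.out.println("stream():         " + averageMicros(list, false, 200) + " us/call");
        System.out.println("parallelStream(): " + averageMicros(list, true, 200) + " us/call");
    }
}
```

Swapping the two measurement lines should change the numbers much less than swapping the parts in Code 1 and Code 2 did. The absolute values still depend on the machine and JVM, which is why a harness like JMH remains the better choice for real comparisons.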

Louis Wasserman