I'm getting much worse performance when I use `list.parallelStream()` than when I use `list.stream()`. Why is this happening? This is Java 17, by the way, and my CPU is a desktop-class Intel Core i5.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class App {
    public static void main(String[] args) throws Exception {
        int size = 1;
        List<Integer> list = null;
        long startTimeN;
        long endTimeN;
        long startTimeP;
        long endTimeP;
        long normalStreamCheckedSize;
        long normalStreamTime;
        long parallelStreamCheckedSize;
        long parallelStreamTime;

        // Sizes 1, 10, 100, ..., 1_000_000
        for (int i = 1; i <= 1_000_000; i *= 10) {
            Random rand = new Random();
            size = i;
            // Build a list of `size` random digits 0..9
            list = Stream.generate(() -> rand.nextInt(10))
                    .limit(size)
                    .collect(Collectors.toList());

            // Time the sequential stream
            startTimeN = System.currentTimeMillis();
            normalStreamCheckedSize = list.stream().count();
            endTimeN = System.currentTimeMillis();
            normalStreamTime = endTimeN - startTimeN;

            // Time the parallel stream
            startTimeP = System.currentTimeMillis();
            parallelStreamCheckedSize = list.parallelStream().count();
            endTimeP = System.currentTimeMillis();
            parallelStreamTime = endTimeP - startTimeP;

            System.out.println("Size: " + size);
            System.out.println("Normal time:" + normalStreamTime);
            System.out.println("Parallel time:" + parallelStreamTime
                    + "\n=====");
        }
    }
}
```
  • "Why do you think this is happening?" => Do you expect better performance? Why? Let's start with that assumption first. – ernest_k May 24 '22 at 08:15
  • Calling `distinct()` is a **very** expensive operation, especially when having parallel processing. – luk2302 May 24 '22 at 08:16
  • @ernest_k .parallelStream should leverage multiple cores, shouldn't it? – Abdulaziz Almalki May 24 '22 at 08:27
  • @luk2302 but I'm calling it with both. – Abdulaziz Almalki May 24 '22 at 08:27
  • I wrote: ... **especially when having parallel processing.**. – luk2302 May 24 '22 at 08:28
  • https://stackoverflow.com/questions/23170832/java-8s-streams-why-parallel-stream-is-slower – luk2302 May 24 '22 at 08:28
  • @luk2302 thanks for the link. I don't understand the downvote though, this is just a question! – Abdulaziz Almalki May 24 '22 at 09:06
  • Benchmarking in Java is pretty complex. `currentTimeMillis()` is a bad choice for measuring time; `nanoTime()` would be the correct choice. Please read https://stackoverflow.com/a/38154825/150978 BTW: Posting code as a screenshot is also a bad choice. How do you expect someone to reproduce your timings if the code is not available (that is my reason to downvote)? – Robert May 24 '22 at 11:28
  • Considering the recognizable trend in the results, you probably shouldn’t have stopped at `1_000_000` elements. Try `10_000_000`, `100_000_000`, and `1_000_000_000`… – Holger May 25 '22 at 10:21
  • @Robert I actually agree with you on the screenshot thing, changed to snippet. – Abdulaziz Almalki May 25 '22 at 13:25
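
Following the suggestions in the comments above (`System.nanoTime()` instead of `currentTimeMillis()`, and larger sizes), here is a minimal stdlib-only sketch of a somewhat fairer comparison. It is an illustration under assumptions, not a substitute for a proper harness such as JMH: the class and method names are hypothetical, the warm-up and run counts are arbitrary, the workload is a simple sum rather than the question's `count()`, and it reports the fastest of several timed runs after discarding untimed warm-up runs.

```java
import java.util.List;
import java.util.Random;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;

public class StreamTimingSketch {

    // Results are written here so the JIT cannot treat the timed work as dead code.
    static volatile long blackhole;

    // Sequential version of a cheap per-element workload.
    public static long sumSequential(List<Integer> data) {
        return data.stream().mapToLong(Integer::longValue).sum();
    }

    // Parallel version of the same workload.
    public static long sumParallel(List<Integer> data) {
        return data.parallelStream().mapToLong(Integer::longValue).sum();
    }

    // Runs `task` `warmups` times untimed (giving the JIT a chance to compile it),
    // then `runs` times timed with System.nanoTime(), returning the fastest run in ns.
    public static long bestOfNanos(List<Integer> data, ToLongFunction<List<Integer>> task,
                                   int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            blackhole = task.applyAsLong(data);
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            blackhole = task.applyAsLong(data);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        Random rand = new Random(42); // fixed seed so repeated runs see the same data
        for (int size = 1; size <= 10_000_000; size *= 10) {
            List<Integer> list = rand.ints(size, 0, 10).boxed().collect(Collectors.toList());
            long seq = bestOfNanos(list, StreamTimingSketch::sumSequential, 20, 10);
            long par = bestOfNanos(list, StreamTimingSketch::sumParallel, 20, 10);
            System.out.printf("Size: %,d  sequential: %,d us  parallel: %,d us%n",
                    size, seq / 1_000, par / 1_000);
        }
    }
}
```

Even with these changes the per-element work is tiny, so small sizes may still favour the sequential version. For the much larger sizes Holger suggests, a primitive source such as `IntStream` would avoid the memory cost of a boxed `List<Integer>`.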

1 Answer

The assumption that a parallel computation will always be faster than a serial one is dangerously wrong. (The benchmarking methodology used in the example is also deeply problematic.)

A parallel solution will always involve more work than a serial one: in addition to the work of solving the problem, there is also task splitting, merging, dispatch, and so on. Parallelism hopes to use more cores to reach an answer faster, but it's not magic performance dust; it takes experience and understanding to know when to expect a speedup.
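
The extra splitting, merging, and dispatch work becomes visible when a parallel sum is written by hand with the fork/join framework, which is what parallel streams run on (by default, the common `ForkJoinPool`). This is a simplified sketch, not how the stream library is actually implemented; `SplitMergeSketch`, `SumTask`, and the `THRESHOLD` value are hypothetical choices.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SplitMergeSketch {

    // Hypothetical cut-off below which a chunk is summed with a plain loop.
    static final int THRESHOLD = 10_000;

    static final class SumTask extends RecursiveTask<Long> {
        private final int[] data;
        private final int from; // inclusive
        private final int to;   // exclusive

        SumTask(int[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                // Leaf: the only part that does the "real" work.
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            // Splitting: work a serial loop never has to do.
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                     // dispatch: hand one half to the pool
            long rightSum = right.compute(); // process the other half in this thread
            return left.join() + rightSum;   // merging: wait for and combine partial results
        }
    }

    public static long parallelSum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static long serialSum(int[] data) {
        long sum = 0;
        for (int x : data) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 10;
        }
        System.out.println(serialSum(data) + " " + parallelSum(data)); // prints "4500000 4500000"
    }
}
```

Everything outside the leaf loop is overhead that the plain `for` loop never pays; when the input is small or the per-element work is cheap, that overhead can outweigh the gain from using more cores.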

Certain operations parallelize better than others; `distinct()` is among the most challenging, and even more so when the stream is ordered, as yours is.
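
The `Stream.distinct()` Javadoc makes the same point: preserving encounter order (stability) for `distinct()` in a parallel pipeline is relatively expensive, and removing the ordering constraint with `unordered()` may allow significantly more efficient execution when the semantics permit. A small sketch contrasting the two (class and method names are hypothetical):

```java
import java.util.List;
import java.util.stream.Collectors;

public class DistinctOrderSketch {

    // Ordered parallel distinct(): must keep the first occurrence of each value
    // in encounter order, which requires coordination across the split chunks.
    public static List<Integer> orderedDistinct(List<Integer> list) {
        return list.parallelStream().distinct().collect(Collectors.toList());
    }

    // unordered() drops the encounter-order constraint, so distinct() may keep
    // any occurrence and the result order is not guaranteed.
    public static List<Integer> unorderedDistinct(List<Integer> list) {
        return list.parallelStream().unordered().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = List.of(3, 1, 3, 2, 1, 2);
        System.out.println(orderedDistinct(list));   // prints [3, 1, 2]
        System.out.println(unorderedDistinct(list)); // same values, order not guaranteed
    }
}
```

Use `unordered()` only when the caller genuinely does not care about the order of the result.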

This talk:

https://www.youtube.com/watch?v=NsDE7E8sIdQ

outlines some of the considerations you'll need to master in order to use parallel streams effectively, and will explain why your expectations here are unfounded.

Streams can make it easy for us to access parallelism, but they don't absolve us of the hard work of figuring out when we should use it. You should default to sequential until you have more evidence that (a) the problem you are solving actually needs it, and (b) parallelism will actually help.

There is more detail here: Java 8's streams: why parallel stream is slower? (https://stackoverflow.com/questions/23170832/java-8s-streams-why-parallel-stream-is-slower)

Brian Goetz