I wrote code using Java 8 streams and parallel streams for the same functionality, with a custom collector to perform an aggregation function. When I check CPU usage with htop, it shows all CPU cores being used for both the 'streams' and the 'parallel streams' versions. So it seems that when list.stream() is used, it also uses all CPUs. What, precisely, is the difference between parallelStream() and stream() in terms of multi-core usage?

BuZZ-dEE
Yogi Joshi
  • Non-parallel streams use just one thread to process their pipeline. That is a hard fact. Unless you do some explicit multithreading with stream processing, any given terminal operation will execute on a single core at a time. If you refer to the fact that htop shows _some_ utilization of all cores, that may just be due to the same thread migrating from core to core (not being pinned to a single core). – Marko Topolnik Aug 02 '15 at 13:31
  • It would be better if you provided the code of your program so we can reproduce the effect. As Marko said, `list.stream()` works sequentially in the same thread where the terminal operation was issued; that's 100% fact. However, we cannot explain why you observed all CPUs being utilized, because we don't see your code. – Tagir Valeev Aug 02 '15 at 16:51
  • Please find the code here - https://github.com/yogirjoshi/monitortools/blob/master/src/main/java/rithm/driver/Hypothesis2.java – Yogi Joshi Aug 02 '15 at 20:57
  • I think this answer may help you: https://stackoverflow.com/questions/23170832/java-8s-streams-why-parallel-stream-is-slower – Harry_T Feb 13 '19 at 08:16
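The comments' claim can be checked with a custom collector similar in spirit to the one in the question. This sketch is not the asker's collector: `CollectorThreads` and `recordingSum` are hypothetical names for an illustrative summing collector that records which threads run its accumulator and how often its combiner is invoked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;

// Hypothetical illustration class; not the collector from the question.
class CollectorThreads {
    // Names of threads that ran the accumulator (a concurrent set, safe for parallel use).
    static final Set<String> accumulatorThreads = ConcurrentHashMap.newKeySet();
    // Number of times the combiner merged two partial results.
    static final AtomicInteger combinerCalls = new AtomicInteger();

    // A summing collector that records where its functions run.
    static Collector<Integer, long[], Long> recordingSum() {
        return Collector.of(
                () -> new long[1],                 // supplier: one mutable accumulation cell
                (acc, x) -> {                      // accumulator: adds one element
                    accumulatorThreads.add(Thread.currentThread().getName());
                    acc[0] += x;
                },
                (a, b) -> {                        // combiner: merges two partial sums
                    combinerCalls.incrementAndGet();
                    a[0] += b[0];
                    return a;
                },
                acc -> acc[0]);                    // finisher: unwraps the result
    }

    static void reset() {
        accumulatorThreads.clear();
        combinerCalls.set(0);
    }

    public static void main(String... args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) list.add(i);

        reset();
        long seq = list.stream().collect(recordingSum());
        System.out.println("stream():         sum=" + seq + ", threads=" + accumulatorThreads
                + ", combiner calls=" + combinerCalls.get());

        reset();
        long par = list.parallelStream().collect(recordingSum());
        System.out.println("parallelStream(): sum=" + par + ", threads=" + accumulatorThreads
                + ", combiner calls=" + combinerCalls.get());
    }
}
```

With `stream()` you should see only the calling thread and zero combiner calls, because the whole pipeline accumulates into a single container. With `parallelStream()` you will typically also see threads named like `ForkJoinPool.commonPool-worker-1` and a nonzero combiner count; the exact names and counts depend on the machine and JDK.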

2 Answers

Consider the following program:

```java
import java.util.ArrayList;
import java.util.List;

public class Foo {
    public static void main(String... args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            list.add(i);
        }
        list.stream().forEach(System.out::println);
    }
}
```

You will notice that this program outputs the numbers from 0 to 999 sequentially, in the order in which they appear in the list. If we change stream() to parallelStream(), this is no longer the case (at least on my computer): all numbers are written, but in a different order. So, apparently, parallelStream() does use multiple threads.
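If the sequential order matters, the Stream API also offers `forEachOrdered`, which performs the action in encounter order even on a parallel stream (giving up part of the parallel benefit). A minimal sketch with the hypothetical class `OrderDemo`; it collects into a list instead of printing, so the two orders can be compared.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration class.
class OrderDemo {
    // forEach on a parallel stream: elements arrive in whatever order the threads process them.
    static List<Integer> viaForEach(List<Integer> input) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        input.parallelStream().forEach(seen::add);
        return seen;
    }

    // forEachOrdered: still a parallel stream, but the action sees the list's encounter order.
    static List<Integer> viaForEachOrdered(List<Integer> input) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        input.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String... args) {
        List<Integer> list = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
        System.out.println("forEach kept order:        " + viaForEach(list).equals(list));
        System.out.println("forEachOrdered kept order: " + viaForEachOrdered(list).equals(list));
    }
}
```

The `forEach` result contains the same elements but is not guaranteed to be in list order; on a single-core machine it may happen to come out ordered.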

The htop output is explained by the fact that most modern operating systems spread even single-threaded applications over multiple cores: the scheduler may move the same thread from core to core, though of course it never runs on two cores at the same time. So if you see that a process used more than one core, this does not necessarily mean that the program uses multiple threads.

Also, performance may not improve when using multiple threads: the cost of synchronization can cancel out the gains. For simple test scenarios this is often the case. For example, in the program above, System.out is synchronized, so effectively only one number can be written at a time, although multiple threads are used.
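The synchronization cost can be illustrated with a rough sketch: one parallel sum in which every element takes the same lock (much like each `println` call locking `System.out`), and one that uses the stream's own reduction. `SyncCostDemo` is a hypothetical name, and the timings are not a rigorous benchmark (there is no warm-up; a harness such as JMH would be needed for reliable numbers).

```java
import java.util.stream.LongStream;

// Hypothetical illustration class.
class SyncCostDemo {
    private static final Object LOCK = new Object();
    private static long total;   // shared state, always accessed while holding LOCK

    // Every element takes the same lock, so the worker threads mostly wait on each other.
    static long sumWithLock(long n) {
        synchronized (LOCK) {
            total = 0;
        }
        LongStream.range(0, n).parallel().forEach(x -> {
            synchronized (LOCK) {
                total += x;
            }
        });
        synchronized (LOCK) {
            return total;
        }
    }

    // Each thread sums its own chunk; partial sums are combined without a shared lock.
    static long sumWithReduction(long n) {
        return LongStream.range(0, n).parallel().sum();
    }

    public static void main(String... args) {
        long n = 10_000_000L;
        long t0 = System.nanoTime();
        long a = sumWithLock(n);
        long t1 = System.nanoTime();
        long b = sumWithReduction(n);
        long t2 = System.nanoTime();
        System.out.printf("locked:  %d in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("reduced: %d in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same sum; the locked version typically loses most of the parallel speedup because the threads serialize on `LOCK`.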

MDaniyal
Hoopje

Adding to @Hoopje's answer:

Before using parallelStream(), read this:

  1. It is multi-threaded. Just writing parallelStream() to get parallelism is almost always a bad idea in Java. There are some cases where it will work, but not always. There are other ways to achieve parallelism, and you almost always need to think carefully before adopting a multi-threaded solution.
  2. By default it runs on the JVM-wide common ForkJoinPool (ForkJoinPool.commonPool()). So if you do any blocking operation, such as a network call, inside a parallel stream, every other parallel stream in the entire Java application that relies on the same pool can get stuck. That's the biggest problem. There are other problems with task allocation as well. A simple ExecutorService with n threads can provide better performance than parallel streams.
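The ExecutorService alternative mentioned in point 2 can be sketched as follows, assuming a simple CPU-bound sum: the list is split into chunks, each chunk is submitted to a dedicated fixed-size pool, and the partial results are combined. `ExecutorSum` and `parallelSum` are hypothetical names, and whether this beats a parallel stream depends on the workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration class, not taken from the answer.
class ExecutorSum {
    // Sums `list` on a dedicated pool of nThreads, separate from the common ForkJoinPool.
    static long parallelSum(List<Integer> list, int nThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = Math.max(1, (list.size() + nThreads - 1) / nThreads); // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < list.size(); start += chunk) {
                List<Integer> slice = list.subList(start, Math.min(start + chunk, list.size()));
                Callable<Long> task = () -> {
                    long s = 0;
                    for (int x : slice) s += x;  // any blocking here occupies only this pool
                    return s;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get();               // waits for each chunk, then combines
            }
            return total;
        } finally {
            pool.shutdown();                    // let the worker threads exit
        }
    }

    public static void main(String... args) throws Exception {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) list.add(i);
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSum(list, n));   // prints 499500
    }
}
```

Because the pool belongs to this method, blocking work inside the tasks ties up only its own threads rather than the common ForkJoinPool that parallel streams use by default.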

You can also read "Java Parallel Streams Are Bad for Your Health!" (JRebel by Perforce).

HIMANSHU GOYAL