One important thing is that a `ForkJoinPool` can execute "normal" tasks (e.g. `Runnable`, `Callable`) as well, so it's not just meant to be used with recursively-created tasks.
Another important thing is that a `ForkJoinPool` has multiple task queues, one per worker thread, whereas a normal executor (e.g. `ThreadPoolExecutor`) has just a single shared queue. This has a big impact on which kinds of tasks each of them should run.
The smaller and the more numerous the tasks a normal executor has to execute, the higher the synchronization overhead of distributing them to the workers: if most tasks are small, the workers access the shared internal queue very often, and contend with each other on it.
Here's where the `ForkJoinPool` shines with its multiple queues. Every worker takes tasks from its own queue, which most of the time doesn't require blocking synchronization. Only if its own queue is empty does a worker steal a task from another worker, and it does so from the opposite end of that worker's queue, so even stealing rarely causes contention, since work-stealing is supposed to be comparatively rare anyway.
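The fork/join pattern the per-worker deques are built for can be sketched like this: a task splits itself, `fork()` pushes one half onto the current worker's own deque (where an idle worker may steal it), and the other half is computed directly. The class `SumTask` and the threshold value are illustrative choices, not anything prescribed by the API:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums an array range, splitting into subtasks that idle workers can steal.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // arbitrary cut-off for direct computation
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {           // small enough: compute directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                            // pushed onto this worker's own deque
        return right.compute() + left.join();   // another worker may steal `left` meanwhile
    }
}

public class WorkStealingDemo {
    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // sum of 1..100_000 = 5000050000
    }
}
```

Note the idiom of forking one half and computing the other directly: it keeps the current worker busy instead of parking it while both halves run elsewhere.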
Now what does that have to do with parallel streams? The streams framework is designed to be easy to use. Parallel streams are meant for cases where you want to split a computation into many concurrent tasks easily, with all the tasks being rather small and simple. That's the point where the `ForkJoinPool` is the reasonable choice: it provides better performance for huge numbers of small tasks, and it can handle longer-running tasks as well, if it has to.
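To illustrate the sweet spot described above, here is a parallel stream producing exactly that workload: a huge number of tiny, independent per-element operations, which the stream implementation splits into subtasks executed on the common `ForkJoinPool` behind the scenes:

```java
import java.util.stream.LongStream;

public class ParallelStreamDemo {
    public static void main(String[] args) {
        // A million tiny, independent map operations -- the kind of
        // workload the work-stealing pool handles well.
        long sum = LongStream.rangeClosed(1, 1_000_000)
                             .parallel()
                             .map(n -> n * n)
                             .sum();
        System.out.println(sum); // sum of the squares of 1..1_000_000
    }
}
```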