I am wondering what will happen if I execute parallel programming code inside a multi-threaded program, e.g. using Java's parallel Stream in a multi-threaded server program.
Based on my limited knowledge of the Java runtime, every program is already multi-threaded: the application entry point is the main thread, which runs alongside other runtime threads (e.g. garbage collection).
Suppose your application spawns two threads, and in one of those threads a parallelStream is created. The parallel streams API uses ForkJoinPool.commonPool(), which by default has a parallelism of availableProcessors() - 1. At this point your application may have more threads than CPUs, so if your parallelStream computation is CPU bound then you're already oversubscribed (more runnable threads than CPUs).
https://stackoverflow.com/a/21172732/594589
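You can verify the common pool's sizing yourself. A minimal sketch (the exact numbers printed depend on your machine):

```java
// Sketch: inspect the default parallelism of the pool that parallel
// streams use under the hood.
import java.util.concurrent.ForkJoinPool;

public class CommonPoolInfo {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallelism = ForkJoinPool.commonPool().getParallelism();
        System.out.println("CPUs: " + cpus
                + ", common pool parallelism: " + parallelism);
        // By default parallelism == max(1, cpus - 1): the thread that
        // invokes the parallel stream also participates in the work.
    }
}
```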
I'm not familiar with Java, but it's interesting that parallelStream shares the same thread pool. So if your program spawned another thread and started another parallelStream, the second parallelStream would share the underlying worker threads with the first!
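A small sketch of that sharing: two application threads each run a parallel stream, and the printed thread names show work landing on the same commonPool workers (exact interleaving varies by machine and run):

```java
// Sketch: two app threads each start a parallel stream; both draw on
// the single shared ForkJoinPool.commonPool().
import java.util.stream.IntStream;

public class SharedPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> IntStream.range(0, 8)
                .parallel()
                .forEach(i -> System.out.println(
                        Thread.currentThread().getName() + " handled " + i));

        Thread t1 = new Thread(task, "app-thread-1");
        Thread t2 = new Thread(task, "app-thread-2");
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Lines from both streams typically show names like
        // "ForkJoinPool.commonPool-worker-N" (plus the calling threads,
        // which also participate), evidencing one shared pool.
    }
}
```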
In my experience I've found it's important to consider:
- The type of workload your application is performing (CPU vs. IO)
- The type of concurrency primitives available (threads, processes, green threads, epoll, async I/O, etc.)
- Your system resources (i.e. # of CPUs available)
- How your application's concurrency primitives map to the underlying OS resources
- The number of concurrency primitives your application has live at any given time
Would the program actually be more efficient?
It depends entirely, and the only sure answer is to benchmark both solutions on your target architecture/system.
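One knob worth including in such a benchmark is isolating a stream in its own pool. A commonly cited workaround is to submit the stream pipeline to a dedicated ForkJoinPool so it doesn't compete for commonPool workers; note this relies on an undocumented implementation detail of streams (the pool size of 2 below is just an illustrative choice):

```java
// Sketch: run a parallel stream inside a dedicated ForkJoinPool
// instead of the shared commonPool.
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class DedicatedPoolDemo {
    public static void main(String[] args)
            throws InterruptedException, ExecutionException {
        ForkJoinPool pool = new ForkJoinPool(2); // sized for this workload
        // A task forked from inside the pool keeps its stream work there.
        int sum = pool.submit(() ->
                IntStream.rangeClosed(1, 100).parallel().sum()
        ).get();
        pool.shutdown();
        System.out.println(sum); // 5050
    }
}
```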
In my experience, reasoning about complex concurrency beyond basic patterns becomes largely a shot in the dark. I believe this is where the saying:
Make it work, make it right, make it fast.
-- Kent Beck
comes from. In this case, make sure your program is concurrency-safe (make it right) and free of deadlocks, and then begin testing, benchmarking, and running experiments.
In my limited personal experience, analysis largely falls apart beyond characterizing your application's workload (CPU vs. IO) and finding a way to model it so you can scale out to use your system's full resources in a configurable, benchmarkable way.