I'm running a simple experiment to find out what the right thread pool size is when you have a bunch of CPU-intensive tasks.
I already know that this size should be equal to the number of cores on the machine, but I want to prove that empirically. Here is the code:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Main {

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        List<Future<?>> futures = new ArrayList<>();
        ExecutorService threadPool = Executors.newFixedThreadPool(4);
        long startTime = System.currentTimeMillis();
        // Submit 100 CPU-bound tasks to the pool.
        for (int i = 0; i < 100; i++) {
            futures.add(threadPool.submit(new CpuBoundTask()));
        }
        // Block until every task has completed.
        for (Future<?> future : futures) {
            future.get();
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Time = " + (endTime - startTime));
        threadPool.shutdown();
    }

    static class CpuBoundTask implements Runnable {
        @Override
        public void run() {
            // Pure computation, no I/O or blocking.
            int a = 0;
            for (int i = 0; i < 90_000_000; i++) {
                a = (int) (a + Math.tan(a));
            }
        }
    }
}
Each task executes in about 700 milliseconds (which I think is long enough for it to be preempted by the thread scheduler at least once).
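For reference, a quick way to sanity-check that per-task figure is to time a single run on the main thread. This is just a snippet I could drop at the start of main(), reusing the CpuBoundTask class above:

// Rough single-task timing, run once on the main thread before the pool experiment.
long taskStart = System.nanoTime();
new CpuBoundTask().run();
System.out.println("Single task took " + (System.nanoTime() - taskStart) / 1_000_000 + " ms");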
I'm running this on a 2017 MacBook Pro with a 3.1 GHz Intel Core i5: 2 physical cores with hyper-threading enabled, so 4 logical CPUs.
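As a sanity check, the logical CPU count the JVM actually sees can be printed directly (it should report 4 here):

// Number of logical processors available to the JVM (expected to be 4 on this machine).
System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());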
I adjusted the size of the thread pool, ran the program multiple times for each size, and averaged the timings. Here are the results:
1 thread = 57 seconds
2 threads = 29 seconds
4 threads = 18 seconds
8 threads = 18.1 seconds
16 threads = 18.2 seconds
32 threads = 17.8 seconds
64 threads = 18.2 seconds
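To avoid changing the pool size by hand between runs, the whole sweep could also be automated. Here is a rough sketch (a hypothetical PoolSizeSweep class in the same package as Main, timing one batch per size rather than a properly averaged benchmark):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSizeSweep {

    public static void main(String[] args) throws Exception {
        // Run the same 100-task batch for each candidate pool size.
        for (int poolSize : new int[] {1, 2, 4, 8, 16, 32, 64}) {
            ExecutorService threadPool = Executors.newFixedThreadPool(poolSize);
            List<Future<?>> futures = new ArrayList<>();
            long startTime = System.currentTimeMillis();
            for (int i = 0; i < 100; i++) {
                // Same-package access to the task class defined in Main above.
                futures.add(threadPool.submit(new Main.CpuBoundTask()));
            }
            for (Future<?> future : futures) {
                future.get();
            }
            long endTime = System.currentTimeMillis();
            System.out.println(poolSize + " threads = " + (endTime - startTime) + " ms");
            threadPool.shutdown();
        }
    }
}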
I was expecting the execution time to be significantly higher once I add many more threads than the number of CPU cores, because of context-switch overhead, but that doesn't seem to happen.
I used VisualVM to monitor the program, and it looks like all the threads get created and are in the running state, as expected. The CPU also seems to be used properly (close to 95%).
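In case it helps, one more thing I could measure is per-task latency rather than only the total batch time, e.g. with a variant of the task that records its own duration. This is just a sketch meant to sit inside Main next to CpuBoundTask (TimedCpuBoundTask is a name I made up; it needs java.util.concurrent.atomic.AtomicLong imported):

// Variant of CpuBoundTask that records its own wall-clock duration,
// so per-task latency can be compared across pool sizes.
static class TimedCpuBoundTask implements Runnable {
    // Sum of individual task durations across all submitted tasks.
    static final AtomicLong totalMillis = new AtomicLong();

    @Override
    public void run() {
        long start = System.currentTimeMillis();
        int a = 0;
        for (int i = 0; i < 90_000_000; i++) {
            a = (int) (a + Math.tan(a));
        }
        totalMillis.addAndGet(System.currentTimeMillis() - start);
    }
}

// After the futures loop in main():
// System.out.println("Avg task time = " + TimedCpuBoundTask.totalMillis.get() / 100 + " ms");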
Is there something that I'm missing?