Is your load I/O bound? I/O bound means the CPU spends most of its time waiting for I/O operations to complete. Adding more threads then means sending more requests to the I/O subsystem or a remote server, etc. This may have positive effects, because requests to storage can be reordered and combined (scatter/gather), but only until you reach the maximum possible I/O bandwidth. Adding more threads may also have adverse effects, e.g. when they cause more random I/O requests on a conventional hard disk.
If your load is I/O bound, there are various approaches to optimizing the I/O operations. My first choice is to load data in larger chunks and in a more streaming manner, if possible. The next step is to use external index structures or databases if you have lots of point accesses, or to add more disks if bandwidth alone is the problem. Anyway, optimizing I/O is another broad topic...
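To illustrate the first point, here is a minimal sketch of chunked, streaming reads; the file argument and the process() hook are placeholders of mine, not part of the question:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// One large sequential pass instead of many small point reads.
static void streamInLargeChunks(Path file) throws IOException {
    try (InputStream in = new BufferedInputStream(Files.newInputStream(file), 1 << 20)) { // 1 MiB buffer
        byte[] chunk = new byte[64 * 1024]; // hand the data on in 64 KiB chunks
        int n;
        while ((n = in.read(chunk)) != -1) {
            process(chunk, n); // hypothetical consumer of each chunk
        }
    }
}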
Is your load CPU bound? This means the CPU power is the limiting factor for processing, not the I/O bandwidth. Optimizing your I/O subsystem makes no sense in this case; you need more or faster CPUs and you need to distribute the load.
In your particular case, you can load all data into memory, so your load is solely CPU bound. For CPU-bound loads it is best to use a thread count equal to the number of CPU cores in your machine. Choosing the number of CPUs as the thread count is straightforward and obvious. It is also discussed in the question Optimal number of threads per core.
In practice, to execute the tasks in your Callable objects, use an ExecutorService constructed like this:
import java.util.concurrent.*;

int maxThreadCount = Runtime.getRuntime().availableProcessors();
ExecutorService executor =
    new ThreadPoolExecutor(
        0, maxThreadCount - 1,                         // no core threads, at most cores - 1 (the caller runs tasks, too)
        1, TimeUnit.SECONDS,                           // idle threads terminate after 1 second
        new LinkedBlockingDeque<>(maxThreadCount * 2), // small bounded queue
        Executors.defaultThreadFactory(),
        new ThreadPoolExecutor.CallerRunsPolicy());    // caller executes a task itself when the queue is full
Now do the processing by submitting your tasks and wait until everything is finished:
while (moreToDo) {
    Callable<?> c = ...
    executor.submit(c);
}
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS); // throws InterruptedException
The thread pool parameters are a little tricky. Here is a detailed explanation:
By using new ThreadPoolExecutor.CallerRunsPolicy(), the task-generating thread will stall when all threads in the pool are in use. To be more precise, the calling thread will execute a task itself when the queue limit is reached.
maxThreadCount - 1: Since the caller thread executes tasks as well, the thread pool size is reduced by one.
new LinkedBlockingDeque<>(maxThreadCount * 2): A small queue size is chosen deliberately; the idea is that with some tasks in the queue, the pool threads get new jobs while the caller thread is still processing a job itself. If tasks are very irregular in running time, this is not totally perfect. The ThreadPoolExecutor should have a cleaner approach for this use case. The better approach would be to use a SynchronousQueue and to make submit wait until a thread is available. However, ThreadPoolExecutor has no "always queue" operation mode; instead, it tries to queue and calls the RejectedExecutionHandler if queuing is not possible right now. A sketch of that workaround follows below.
Either way, the CallerRunsPolicy construction above should do it in your scenario.
There may be loads where you don't know in advance whether they are CPU bound or I/O bound, and, to complicate things, the load may change its behavior during processing. My idea for tackling this is an adaptive algorithm, similar to the TCP congestion avoidance algorithm. Congestion avoidance in TCP solves exactly the same sort of problem: "I want maximum throughput, but I don't know my resources". Has anybody worked on this?
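To make the idea concrete, here is a purely illustrative AIMD sketch (all names are mine, nothing here is an established API): probe the throughput periodically, grow the pool additively while it helps, and shrink it multiplicatively when throughput regresses, just like TCP handles its congestion window.

import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.atomic.AtomicLong;

class AdaptivePoolSizer {
    private final AtomicLong completed = new AtomicLong(); // workers call taskFinished() once per task
    private int threads = 1;
    private double lastThroughput = 0;

    void taskFinished() { completed.incrementAndGet(); }

    // Call at a fixed interval, e.g. once per second.
    void adapt(ThreadPoolExecutor pool) {
        double throughput = completed.getAndSet(0);  // tasks finished since the last probe
        if (throughput >= lastThroughput) {
            threads++;                               // additive increase while it helps
        } else {
            threads = Math.max(1, threads / 2);      // multiplicative decrease on regression
        }
        lastThroughput = throughput;
        pool.setMaximumPoolSize(threads);            // ThreadPoolExecutor supports resizing at runtime
    }
}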