I've tested a few simple scenarios.
Consider an int array of length 10,000,000. Filling it:
- using a single (main) thread;
- using two worker threads and joining them until they finish; the first fills from the beginning of the array to the middle, the second from the end to the middle;
- using an ExecutorService with a fixed pool of 2 threads, calling execute and awaiting termination;
- using a ForkJoinPool with the default number of workers (the number of available processors).
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.TimeUnit;

public class PerformanceTest {

    private static final int ARRAY_LENGTH = 10_000_000;
    private static final int ITERATIONS = 10;
    private static int[] array;

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < ITERATIONS; i++) {
            array = new int[ARRAY_LENGTH];
            long millis = System.currentTimeMillis();
            singleWorkerFill();
            System.out.println("Single worker: " + (System.currentTimeMillis() - millis));

            array = new int[ARRAY_LENGTH];
            millis = System.currentTimeMillis();
            doubleWorkerFill();
            System.out.println("Double worker: " + (System.currentTimeMillis() - millis));

            array = new int[ARRAY_LENGTH];
            millis = System.currentTimeMillis();
            executorWorkersFill();
            System.out.println("Executor workers: " + (System.currentTimeMillis() - millis));

            array = new int[ARRAY_LENGTH];
            millis = System.currentTimeMillis();
            forkJoinWorkersFill();
            System.out.println("ForkJoin workers: " + (System.currentTimeMillis() - millis));

            System.out.println("---------------------------------------------");
            Thread.sleep(1000);
        }
    }

    private static void singleWorkerFill() {
        for (int i = 0, len = array.length; i < len; i++) {
            array[i] = i;
        }
    }

    private static void doubleWorkerFill() throws InterruptedException {
        Thread worker1 = new Thread(new HeadArrayFiller());
        Thread worker2 = new Thread(new TailArrayFiller());
        worker1.start();
        worker2.start();
        worker1.join();
        worker2.join();
    }

    private static void executorWorkersFill() throws InterruptedException {
        ExecutorService executorService = Executors.newFixedThreadPool(2);
        executorService.execute(new HeadArrayFiller());
        executorService.execute(new TailArrayFiller());
        executorService.shutdown();
        executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
    }

    private static void forkJoinWorkersFill() throws InterruptedException {
        ForkJoinPool pool = new ForkJoinPool();
        // execute() submits asynchronously; invoke() would block on the first
        // task and run the two fillers one after the other
        pool.execute(new HeadArrayFiller());
        pool.execute(new TailArrayFiller());
        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
    }

    // Fills the first half of the array, including the middle element
    private static class HeadArrayFiller extends RecursiveAction implements Runnable {
        @Override
        public void run() {
            for (int i = 0, middle = array.length / 2; i <= middle; i++) {
                array[i] = i;
            }
        }

        @Override
        protected void compute() {
            run();
        }
    }

    // Fills the second half of the array, from the last element down to the middle
    private static class TailArrayFiller extends RecursiveAction implements Runnable {
        @Override
        public void run() {
            // start at length - 1 (not length - 2) so the last element is filled too
            for (int i = array.length - 1, middle = array.length / 2; i > middle; i--) {
                array[i] = i;
            }
        }

        @Override
        protected void compute() {
            run();
        }
    }
}
I expected the single-threaded version to have no chance against the others, but that's not the case. Here are the test results, in milliseconds:
ITERATION #1   Single worker: 7   Double worker: 10   Executor workers: 11   ForkJoin workers: 6
ITERATION #2   Single worker: 6   Double worker: 4    Executor workers: 5    ForkJoin workers: 4
ITERATION #3   Single worker: 4   Double worker: 4    Executor workers: 5    ForkJoin workers: 4
ITERATION #4   Single worker: 5   Double worker: 5    Executor workers: 5    ForkJoin workers: 4
ITERATION #5   Single worker: 5   Double worker: 5    Executor workers: 4    ForkJoin workers: 5
ITERATION #6   Single worker: 4   Double worker: 4    Executor workers: 5    ForkJoin workers: 4
ITERATION #7   Single worker: 4   Double worker: 4    Executor workers: 4    ForkJoin workers: 5
ITERATION #8   Single worker: 4   Double worker: 4    Executor workers: 4    ForkJoin workers: 5
ITERATION #9   Single worker: 4   Double worker: 4    Executor workers: 4    ForkJoin workers: 5
ITERATION #10  Single worker: 5   Double worker: 4    Executor workers: 4    ForkJoin workers: 4
As you can see, the single-threaded model is faster than the two-threaded one at startup. The fork-join model seems to be the best, on par with the ExecutorService. I suspect some JIT compiler optimizations kick in over the iterations; all approaches are quite similar by the end of the test.
Anyway, the main question is: why does the two-threaded model perform the same as the single-threaded one (and even slower at a cold start), and how can I get the roughly twofold speedup I expected?
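One way to check the JIT warm-up suspicion (a sketch, not part of the original benchmark) is to run a number of untimed warm-up rounds before the measured one, so the hot loop is already compiled when timing starts:

```java
public class WarmupSketch {
    private static final int ARRAY_LENGTH = 10_000_000;
    private static final int[] array = new int[ARRAY_LENGTH];

    private static void fill() {
        for (int i = 0, len = array.length; i < len; i++) {
            array[i] = i;
        }
    }

    public static void main(String[] args) {
        // untimed warm-up rounds: give the JIT a chance to compile fill()
        for (int round = 0; round < 20; round++) {
            fill();
        }
        // only now measure a single steady-state run
        long start = System.nanoTime();
        fill();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Warmed-up fill took " + elapsedMs + " ms");
    }
}
```

The 20-round count is arbitrary; a harness such as JMH does this (and dead-code elimination guards) much more rigorously.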
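As an aside (not one of the original four variants), the JDK itself ships a parallel fill since Java 8: Arrays.parallelSetAll splits the index range across the common ForkJoinPool, so it could serve as one more baseline in the comparison:

```java
import java.util.Arrays;

public class ParallelSetAllDemo {
    public static void main(String[] args) {
        int[] array = new int[10_000_000];
        // partitions the index range across the common ForkJoinPool;
        // each slot ends up holding its own index
        Arrays.parallelSetAll(array, i -> i);
        System.out.println(array[5_000_000]); // prints 5000000
    }
}
```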
Thanks