I was experimented with multithreading and parallel programming. According to my current knowledge, for intensive cpu computations, the optimal thread number is equal to the number of cores, otherwise u'll have an overhead regarding context switch between threads. I am not involving hyperthreading in the game where you can double the logical processors. I tried to play with a parallel quicksort on a 4 core cpu with disabled hyperthreading.
This is a result set that made me a bit curious.
I am a bit confused reading numerous of answers why this can happen.
I can understand that 8 threads have no significant performance, but why they compute faster than 4 threads ?
CPU Cores : 4
10 tests ran with 1 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 3579
10 tests ran with 2 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 2458
10 tests ran with 4 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 1230
10 tests ran with 8 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 1030
10 tests ran with 16 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 1026
10 tests ran with 32 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 1036
10 tests ran with 64 thread(s), on an array of size : 30000000. Sorting process took on avg for each test : 1094
JMH results on 30million sample array with 10 iterations
Benchmark Mode Cnt Score Error Units
Sample.run16Threads avgt 10 1057.336 ± 38.845 ms/op
Sample.run1Thread avgt 10 3385.159 ± 15.619 ms/op
Sample.run2Threads avgt 10 1809.542 ± 23.794 ms/op
Sample.run32Threads avgt 10 1133.388 ± 86.976 ms/op
Sample.run4Threads avgt 10 1120.386 ± 113.688 ms/op
Sample.run64Threads avgt 10 1546.910 ± 488.938 ms/op
Sample.run8Threads avgt 10 1031.181 ± 52.650 ms/op
public class ParallelQuickSort extends RecursiveAction {
private final int left;
private final int right;
private final int[] arr;
public ParallelQuickSort(int left, int right, int[] arr) {
this.left = left;
this.right = right;
this.arr = arr;
}
private int partition(int[] arr, long pivot) {
int leftPtr = left - 1;
int rightPtr = right;
while ( true ) {
while ( arr[++leftPtr] < pivot ) ;
while ( rightPtr > 0 && arr[--rightPtr] > pivot ) ;
if ( leftPtr >= rightPtr ) {
break;
} else {
swap( arr, leftPtr, rightPtr );
}
}
swap( arr, leftPtr, right );
return leftPtr;
}
private void swap(int[] arr, int x, int y) {
int temp = arr[x];
arr[x] = arr[y];
arr[y] = temp;
}
@Override
protected void compute() {
if ( right - left <= 0 ) {
return;
}
long pivot = arr[right];
int partition = partition( arr, pivot );
ForkJoinTask.invokeAll( createSubTasks( partition ) );
}
private List<ParallelQuickSort> createSubTasks(int partition) {
return List.of(
new ParallelQuickSort( left, partition - 1, arr ),
new ParallelQuickSort( partition + 1, right, arr )
);
}
}
public class Main {
public static void main(String[] args) {
SecureRandom secureRandom = new SecureRandom();
int[] arr = IntStream.range( 0, 30_000_000 ).map( i -> secureRandom.nextInt( 200_000 ) ).toArray();
int tests = 10;
System.out.println( "CPU Cores : " + Runtime.getRuntime().availableProcessors() + "\n" );
for ( int threads = 1; threads <= 64; threads *= 2 ) {
long time = 0;
for ( int i = 0; i < tests; ++i ) {
int[] sample = Arrays.copyOfRange( arr, 0, arr.length );
ForkJoinPool forkJoinPool = new ForkJoinPool(threads);
long start = System.currentTimeMillis();
forkJoinPool.invoke( new ParallelQuickSort( 0, arr.length - 1, sample ) );
long end = System.currentTimeMillis();
forkJoinPool.shutdown();
time += end - start;
//System.out.println( isSorted( sample ) );
}
System.out.println( tests + " tests ran with " + threads +
" thread(s), on an array of size : " + arr.length + ". Sorting process took on avg for each test : " + time / tests );
}
}
private static boolean isSorted(int[] arr) {
boolean isSorted = true;
for ( int i = 0; i < arr.length - 1; ++i ) {
if ( arr[i] > arr[i + 1] ) {
isSorted = false;
break;
}
}
return isSorted;
}
}