I'm trying to use OpenMP to parallelize quicksort (for arrays with distinct integers). I already have a working implementation (I tested the output for multiple threads and it seems to work). The problem is that I'm not seeing any speedup, even for large arrays which leads me to believe my implementation is not correct. Since the actual implementation is correct, but the parallelism doesn't seem to produce the desired speedup, I will show only the parallelization of the code:
void parallel_randomized_quicksort(vector<int>& A, int start, int end){
if ((end-start) is too small){
run a serial sorting algorithm
}else{
pick a random pivot x and partition A around that pivot.
k = index of x in A
#pragma omp parallel
{
#pragma omp single nowait
{
parallel_randomized_quicksort(A,start,k-1);
}
#pragma omp single nowait
{
parallel_randomized_quicksort(A,k+1,end);
}
}
}
}
If this is correct, there should be some speedup, since the left and right partitions can be recursed in parallel which is the goal of my code. Why don't I notice speedup?
EDIT: Execution time is measured using:
double start_time = omp_get_wtime();
parallel_randomized_quicksort(A,0,A.size()-1);
double time = omp_get_wtime() - start_time;
The arrays are distinct integers. Sizes of the arrays vary from 100 to 1,000,000. Times are in the milliseconds for the smaller arrays and a couple seconds for the larger ones. Typically if the array is less than size 32 I will use a simple serial algorithm like insertion sort.