I am parallelizing an application in C# and am testing the performance difference between implicit and explicit threading. Both techniques use types from the System.Threading namespaces (Parallel.For lives in System.Threading.Tasks). The implicit version uses a Parallel.For loop, while the explicit version creates, starts, and joins threads itself, and also calculates chunk sizes, calls the worker function, etc.
I have found that explicit threading gives a better speedup over the original sequential version of the program (about 1.2x faster after 50 trials) on eight cores. I understand the underlying differences between these two techniques; however, I am not sure why the explicit version seems to be faster. I had expected the implicit version to be faster, since its tasks are scheduled automatically rather than through manual task and thread creation. Is there a reason (apart from an error in my results) that the explicit version would be faster?
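To make "an error in my results" less likely, the timing for both versions can go through one shared harness. The following is only a minimal sketch of such a harness, not my exact measurement code: the `Bench` class, its `Measure` and `Median` methods, and the warm-up count of 3 are hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class Bench
{
    // Runs `action` `warmup` times untimed, then `trials` times timed,
    // and returns the per-trial wall-clock times in milliseconds.
    public static double[] Measure(Action action, int trials, int warmup = 3)
    {
        // Untimed runs so JIT compilation and thread-pool growth
        // do not count against whichever version happens to run first.
        for (int i = 0; i < warmup; i++) action();

        var times = new double[trials];
        var sw = new Stopwatch();
        for (int i = 0; i < trials; i++)
        {
            sw.Restart();
            action();
            sw.Stop();
            times[i] = sw.Elapsed.TotalMilliseconds;
        }
        return times;
    }

    // Median is less sensitive to occasional outliers (e.g. a GC pause) than the mean.
    public static double Median(double[] values)
    {
        var sorted = values.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

With this, both versions would be timed the same way, e.g. `Bench.Median(Bench.Measure(() => stft_implicit(x, wSamp), 50))` and likewise for `stft_explicit`.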
For reference, a summarized version of the relevant code can be seen below.
float[][] stft_implicit(Complex[] x, int wSamp)
{
    //...
    Parallel.For(0, size, new ParallelOptions { MaxDegreeOfParallelism = MainWindow.NUM_THREADS }, ii =>
    {
        Complex[] tempFFT = IterativeFFT.FFT(all_temps[ii], twiddles, wSamp);
        fft_results[ii] = tempFFT;
    });
    //...
}

float[][] stft_explicit(Complex[] x, int wSamp)
{
    //...
    length = (int)(2 * Math.Floor((double)N / (double)wSamp) - 1);
    // Ceiling division so every window is covered by some thread
    chunk_size = (length + MainWindow.NUM_THREADS - 1) / MainWindow.NUM_THREADS;
    Thread[] threads = new Thread[MainWindow.NUM_THREADS];
    for (int i = 0; i < MainWindow.NUM_THREADS; i++)
    {
        threads[i] = new Thread(fft_worker);
        threads[i].Start(i);
    }
    for (int i = 0; i < MainWindow.NUM_THREADS; i++)
    {
        threads[i].Join();
    }
    //...
}

public void fft_worker(object thread_id)
{
    int ID = (int)thread_id;
    // Buffers are allocated once per thread, not once per window
    Complex[] temp = new Complex[wSamp];
    Complex[] tempFFT = new Complex[wSamp];
    // Each thread handles the contiguous range [start, end)
    int start = ID * chunk_size;
    int end = Math.Min(start + chunk_size, length);
    for (int ii = start; ii < end; ii++)
    {
        //...
        tempFFT = IterativeFFT.FFT(temp, twiddles, wSamp);
        //...
    }
}
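
One structural difference between the two versions is that the explicit one gives each thread a single contiguous chunk, while Parallel.For invokes the delegate once per index. To separate that effect from thread creation itself, a chunked variant of the implicit version could be compared as well. The following is only a sketch of that idea, not code from my application: `ChunkedSketch`, `Run`, and `Transform` are hypothetical names, `Transform` is a stand-in for `IterativeFFT.FFT`, and `Complex` is assumed to be `System.Numerics.Complex`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Numerics;
using System.Threading.Tasks;

static class ChunkedSketch
{
    // Hypothetical stand-in for the per-window work (IterativeFFT.FFT in the real code).
    // It only copies the input so that the sketch is self-contained.
    static Complex[] Transform(Complex[] window) => (Complex[])window.Clone();

    public static Complex[][] Run(Complex[][] windows, int numThreads)
    {
        var results = new Complex[windows.Length][];
        if (windows.Length == 0) return results; // Partitioner.Create rejects empty ranges

        // Same ceiling division as chunk_size in the explicit version.
        int chunk = (windows.Length + numThreads - 1) / numThreads;

        // Splits [0, windows.Length) into contiguous ranges of at most `chunk` indices.
        var ranges = Partitioner.Create(0, windows.Length, chunk);
        var options = new ParallelOptions { MaxDegreeOfParallelism = numThreads };

        // The delegate now runs once per range instead of once per index.
        Parallel.ForEach(ranges, options, range =>
        {
            for (int ii = range.Item1; ii < range.Item2; ii++)
            {
                results[ii] = Transform(windows[ii]);
            }
        });
        return results;
    }
}
```

If this variant closes the gap to the explicit version, the difference would point to per-iteration overhead and partitioning rather than to the threads themselves; if not, thread-pool scheduling or the measurement would be the next things to look at.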