I am trying to measure the speedup of my parallel code for different numbers of threads, where speedup is the ratio of the compute time of the sequential algorithm to the compute time of the parallel algorithm. I am using OpenMP with FFTW in C++, with omp_get_wtime() to measure the parallel time and clock() to measure the sequential time. Originally, I computed the speedup by dividing the parallel time with 1 thread by the parallel time with each other thread count, since the parallel time with 1 thread equals the sequential time. However, I noticed that the sequential time reported by clock() changes with the number of threads, and now I am not sure how to compute the speedup.
Example:
static const int nx = 128;
static const int ny = 128;
static const int nz = 128;
double start_time, run_time;
int nThreads = 1;
fftw_complex *input_array;
input_array = (fftw_complex*) fftw_malloc((nx*ny*nz) * sizeof(fftw_complex));
memcpy(input_array, Re.data(), (nx*ny*nz) * sizeof(fftw_complex));
fftw_complex *output_array;
output_array = (fftw_complex*) fftw_malloc((nx*ny*nz) * sizeof(fftw_complex));
start_time = omp_get_wtime();
clock_t start_time1 = clock();
fftw_init_threads();
fftw_plan_with_nthreads(nThreads); // pass omp_get_max_threads() instead to use all available threads
fftw_plan forward = fftw_plan_dft_3d(nx, ny, nz, input_array, output_array, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(forward);
fftw_destroy_plan(forward);
fftw_cleanup_threads(); // like fftw_cleanup(), but also releases the thread-related data
run_time = omp_get_wtime() - start_time;
clock_t end1 = clock();
cout << " Parallel Time in s: " << run_time << "s\n";
cout << "Serial Time in s: " << (double)(end1-start_time1) / CLOCKS_PER_SEC << "s\n";
memcpy(Im.data(),output_array, (nx*ny*nz) * sizeof(fftw_complex));
fftw_free(input_array);
fftw_free(output_array);
Results of the above code are the following:
For 1 thread:
Parallel Time in s: 0.0231161s
Serial Time in s: 0.023115s
This gives a speedup of 1, which makes sense.
For 2 threads (with ~ 2x speedup):
Parallel Time in s: 0.0132717s
Serial Time in s: 0.025434s
and so on. So the question is: why does the serial time increase with the number of threads? Or am I supposed to measure the speedup using only omp_get_wtime(), with the 1-thread time treated as my sequential time? I am confused about the speedup of the code above: it is either 5 to 6 times as fast (equal to the number of cores on my computer) or only twice as fast, depending on how I calculate the sequential time.