I have been studying how to use FFTW for a long time. In my program, the multithreaded FFTW (initialized with fftw_init_threads())
runs slower than the single-threaded version, so I would appreciate some help. My program is below; can you find the problem?
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <fftw3.h>

int main(void)
{
    int n1 = 1024, n2 = 1024;
    //int nthreads = omp_get_num_procs();

    /* Real input, half-complex spectrum, real output. */
    double *x = (double*) fftw_malloc(sizeof(double) * n1 * n2);
    fftw_complex *y = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * n1 * (n2 / 2 + 1));
    double *z = (double*) fftw_malloc(sizeof(double) * n1 * n2);
    fftw_plan r2c_plan, c2r_plan;
    int i, j;
    clock_t start, end;
    double cpu_time_used;

    /* Fill the input with random values in [0, 1]. */
    for (i = 0; i < n1; i++) {
        for (j = 0; j < n2; j++) {
            x[i * n2 + j] = rand() / (double) RAND_MAX;
        }
    }

    /* Thread setup must come before the plans are created. */
    fftw_init_threads();
    fftw_plan_with_nthreads(1);
    //omp_set_num_threads(2);
    //omp_init();
    r2c_plan = fftw_plan_dft_r2c_2d(n1, n2, x, y, FFTW_ESTIMATE);
    c2r_plan = fftw_plan_dft_c2r_2d(n1, n2, y, z, FFTW_ESTIMATE);

    /* Time 1000 forward/backward transform pairs. */
    start = clock();
    //#pragma omp parallel for
    for (i = 0; i < 1000; i++) {
        fftw_execute(r2c_plan);
        fftw_execute(c2r_plan);
    }
    end = clock();
    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("%f\n", cpu_time_used);

    fftw_free(x);
    fftw_free(y);
    fftw_free(z);
    fftw_destroy_plan(r2c_plan);
    fftw_destroy_plan(c2r_plan);
    fftw_cleanup_threads();
    return 0;
}
When I change the argument of fftw_plan_with_nthreads(int nthreads), should the performance of the program improve compared to the single-threaded version? In other words, does a larger thread count mean the transforms run faster?
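
One thing that may be worth ruling out is the timing method itself. On Linux and other POSIX systems, clock() reports CPU time for the whole process, added up over all of its threads, so a multithreaded run can show a larger number even when it finishes sooner in real time. Below is a minimal sketch of a wall-clock timer to compare against, assuming a POSIX system that provides clock_gettime; the helper names wall_seconds and cpu_seconds are hypothetical and are not part of FFTW.

```c
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime under strict modes */
#include <assert.h>
#include <time.h>

/* Elapsed real (wall-clock) time in seconds, read from a monotonic clock.
 * Assumption: the platform supports CLOCK_MONOTONIC. */
static double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void) rc;  /* silence unused warning when NDEBUG is defined */
    return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
}

/* Processor time in seconds as reported by clock(). On POSIX systems this
 * accumulates over every thread in the process. */
static double cpu_seconds(void)
{
    return (double) clock() / CLOCKS_PER_SEC;
}
```

Calling wall_seconds() immediately before and after the 1000-iteration loop and subtracting the two values gives the elapsed real time, which is probably the fairer number to compare across different thread counts.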