
I know a similar question has been asked before, but I'm having trouble with this. Here is the code I have written:

void fft(const double *indata_real, const double *indata_imag, double *outdata_real, double *outdata_imag, int x, int y)
{
  int size = sizeof(cufftDoubleComplex)*x*y;

  // allocate data on host
  cufftDoubleComplex* host_data = (cufftDoubleComplex*)malloc(size);
  for (int i = 0; i < x*y; ++i) {
    host_data[i].x = indata_real[i];
    host_data[i].y = indata_imag[i];
  }

  // allocate data on device
  cufftDoubleComplex* device_data;
  cudaMalloc((void**)&device_data, size);

  // copy data from host to device
  cudaMemcpy(device_data, host_data, size, cudaMemcpyHostToDevice);

  // create plan
  cufftHandle plan;
  cufftPlan2d(&plan, x, y, CUFFT_Z2Z);

  // perform transform
  cufftExecZ2Z(plan, (cufftDoubleComplex *)device_data, (cufftDoubleComplex *)device_data, CUFFT_FORWARD);

  // copy data back from device to host
  cudaMemcpy(host_data, device_data, size, cudaMemcpyDeviceToHost);

  // copy transform to outdata
  for (int i = 0; i < x*y; ++i) {
    outdata_real[i] = host_data[i].x;
    outdata_imag[i] = host_data[i].y;
  }

  // clean up
  cufftDestroy(plan);
  free(host_data);
  cudaFree(device_data);

}

The above works fine for single precision, i.e. when I replace every 'cufftDoubleComplex' with 'cufftComplex', 'CUFFT_Z2Z' with 'CUFFT_C2C', and 'cufftExecZ2Z' with 'cufftExecC2C'.

Based on what I found on that other page, I thought this would run fine with double precision. But at the moment the outdata arrays are the same as the indata arrays - it's not doing anything.

So if anyone can spot what I've done wrong that would be great!

S

  • What GPU are you running this on? Are you checking for API errors (both CUDA and CUFFT)? Are any being reported? – talonmies Sep 10 '13 at 13:41
  • There don't seem to be any API errors. Sorry, but how would I go about explicitly checking for them? I'm actually calling this function from a Python wrapper that I'm writing alongside it. The GPU is a GeForce GTS 250. – user2765038 Sep 10 '13 at 15:33

1 Answer


Your code produces no output because nothing actually runs on the GPU: the GeForce GTS 250 is a compute capability 1.1 device, and double precision floating point operations require compute capability 1.3 or later.
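The capability rule can be sketched as a small check. The helper name `supports_double_precision` is hypothetical, and the snippet takes the major/minor numbers as plain integers so it compiles without the CUDA toolkit; in a real program they would come from `cudaGetDeviceProperties`.

```c
#include <stdbool.h>

/* Hypothetical helper: true when a device of the given compute capability
   can execute double precision arithmetic (compute capability 1.3 and later). */
static bool supports_double_precision(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}

/* In a CUDA program the two numbers would come from the runtime, e.g.
 *   struct cudaDeviceProp prop;
 *   cudaGetDeviceProperties(&prop, 0);   (device 0 is an assumption)
 *   supports_double_precision(prop.major, prop.minor);
 */
```

For a compute 1.1 device such as the one in the question, the check returns false, so the caller could fall back to the single precision (C2C) path instead of silently getting the input back.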

You should be able to confirm this by examining the return status of the cufftExecZ2Z call. I would expect it to return CUFFT_EXEC_FAILED, because the double precision FFT kernels cannot launch on a GPU without double precision support.
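As a rough sketch of what checking the return status could look like: the helpers below (`cufft_status_name` and `cufft_ok`, both hypothetical names) take the status as a plain `int` so the snippet compiles without the CUDA toolkit. The numeric values are the `cufftResult` enumerators as I understand them from `cufft.h`; verify them against the header in your own toolkit.

```c
#include <stdio.h>

/* Map a cuFFT status code to its enumerator name. The values are assumed
   to match the cufftResult enum in cufft.h. */
static const char *cufft_status_name(int status)
{
    switch (status) {
    case 0: return "CUFFT_SUCCESS";
    case 1: return "CUFFT_INVALID_PLAN";
    case 2: return "CUFFT_ALLOC_FAILED";
    case 3: return "CUFFT_INVALID_TYPE";
    case 4: return "CUFFT_INVALID_VALUE";
    case 5: return "CUFFT_INTERNAL_ERROR";
    case 6: return "CUFFT_EXEC_FAILED";
    case 7: return "CUFFT_SETUP_FAILED";
    case 8: return "CUFFT_INVALID_SIZE";
    default: return "unknown cuFFT status";
    }
}

/* Report a failed call on stderr. Returns 1 on success, 0 on failure. */
static int cufft_ok(int status, const char *what)
{
    if (status != 0) {
        fprintf(stderr, "%s failed: %s (%d)\n",
                what, cufft_status_name(status), status);
        return 0;
    }
    return 1;
}

/* Intended use inside fft(), once cufft.h is included:
 *   if (!cufft_ok(cufftPlan2d(&plan, x, y, CUFFT_Z2Z), "cufftPlan2d")) { ...clean up, return... }
 *   if (!cufft_ok(cufftExecZ2Z(plan, device_data, device_data, CUFFT_FORWARD),
 *                 "cufftExecZ2Z")) { ...clean up, return... }
 */
```

The same pattern applies to the CUDA runtime calls (`cudaMalloc`, `cudaMemcpy`), which return a `cudaError_t` that can be turned into a message with `cudaGetErrorString`. Since the function is called from a Python wrapper, returning the status to the caller rather than only printing it would probably make failures easier to notice.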

talonmies