
I am testing the following code on my own local machines (both on Arch Linux and on Ubuntu 16.04, using NVIDIA driver 390 and CUDA 9.1) and on our local HPC clusters:

#include <cmath>
#include <cstdlib>
#include <iostream>
#include <cufft.h>

int main(){
    // Initializing variables
    int n = 1024;
    cufftHandle plan1d;
    double2 *h_a, *d_a;

    // Allocation / definitions
    h_a = (double2 *)malloc(sizeof(double2)*n);
    for (int i = 0; i < n; ++i){
        h_a[i].x = sin(2*M_PI*i/n);
        h_a[i].y = 0;
    }

    cudaMalloc(&d_a, sizeof(double2)*n);
    cudaMemcpy(d_a, h_a, sizeof(double2)*n, cudaMemcpyHostToDevice);
    cufftResult result = cufftPlan1d(&plan1d, n, CUFFT_Z2Z, 1);

    // ignoring full error checking for readability
    if (result == CUFFT_INVALID_DEVICE){
        std::cout << "Invalid Device Error\n";
        exit(1);
    }

    // Executing the FFT
    cufftExecZ2Z(plan1d, d_a, d_a, CUFFT_FORWARD);

    // Executing the iFFT
    cufftExecZ2Z(plan1d, d_a, d_a, CUFFT_INVERSE);

    // Copying back
    cudaMemcpy(h_a, d_a, sizeof(double2)*n, cudaMemcpyDeviceToHost);

    // Cleaning up
    cufftDestroy(plan1d);
    cudaFree(d_a);
    free(h_a);

    return 0;
}

I compile with `nvcc cuda_test.cu -lcufft`.
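For completeness, here is the kind of error checking I am omitting above (a sketch; the helper names are my own, and since cuFFT has no error-string function in these toolkit versions it just prints the numeric code):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helpers -- not part of the CUDA/cuFFT APIs
static void checkCuda(cudaError_t err, const char *msg){
    if (err != cudaSuccess){
        fprintf(stderr, "%s: %s\n", msg, cudaGetErrorString(err));
        exit(1);
    }
}

static void checkCufft(cufftResult res, const char *msg){
    if (res != CUFFT_SUCCESS){
        fprintf(stderr, "%s: cufftResult %d\n", msg, (int)res);
        exit(1);
    }
}
```

Usage would be, e.g., `checkCufft(cufftPlan1d(&plan1d, n, CUFFT_Z2Z, 1), "plan");` and similarly for every `cudaMalloc`/`cudaMemcpy`/`cufftExecZ2Z` call.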

On both of my local machines, the code works just fine; however, when I run the same code on our HPC clusters it returns the CUFFT_INVALID_DEVICE error on that hardware / configuration. Here is the hardware and driver configuration for those devices:

  • For one cluster, we have several P100s available and are using NVIDIA driver version 384.90 with CUDA version 8.0.61.
  • On the second cluster, we are using K80s with NVIDIA driver version 367.44 and CUDA version 8.0.44. As a note, when running the code with CUDA version 7.5.18 on this hardware, the above code still returns the error, but the error does not actually seem to affect execution (so far as I am aware).

According to this, those CUDA versions should be compatible with the available driver versions; however, I received a similar error when the drivers and CUDA installation on my local Ubuntu machine were mismatched.
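One way I can think of to verify the toolkit/driver pairing on each machine is to query both versions at runtime (a sketch using the CUDA runtime API):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main(){
    int driverVersion = 0, runtimeVersion = 0;
    // Highest CUDA version the installed driver supports
    cudaDriverGetVersion(&driverVersion);
    // CUDA runtime version this binary was built against
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driverVersion/1000, (driverVersion%100)/10,
           runtimeVersion/1000, (runtimeVersion%100)/10);
    // If the runtime version exceeds what the driver supports,
    // API calls can fail with misleading errors.
    return 0;
}
```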

I am completely baffled at how to continue here and can only think of a few things:

  1. There is some difference between the consumer hardware on my local machines (Titan X (Pascal) and GTX 970) and the HPC cluster hardware.
  2. There is some driver configuration problem I have not considered. I tried out what CUDA versions I could, but none of them worked, except for 7.5.18, which returned the same error but did not seem to affect execution.
  3. There is some change to cuFFT after CUDA 7.5.18 that I am not aware of.

As a note: this is just an example, but I have a larger codebase that fails to run because of this same error, and that is the issue I am ultimately trying to solve.

Thanks for reading and let me know if you have any ideas on how to proceed!

EDIT -- added a comment and fixed a typo in main code, after Rob's comment.

Leios
    1. The return type of `cufftPlan1d` is not `int`. 2. Use proper CUDA error checking. 3. run your code with `cuda-memcheck` – Robert Crovella Feb 26 '18 at 09:08
  • Hey Rob 1. I apologize for the typo, `int` and `cufftResult` are functionally identical in this case (so far as I can tell) 2. I neglected error checking for this example for readability because writing something like this: https://devtalk.nvidia.com/default/topic/542160/cufft-error-handling/?offset=2 would make the code hard to read and isn't functionally different than what I wrote 3. cuda-memcheck doesn't return any errors, cufft does. Thanks! – Leios Feb 26 '18 at 22:00
  • http://docs.nvidia.com/cuda/cufft/index.html#function-cufftplan1d -- the documentation says cufftPlan1d can't return CUFFT_INVALID_DEVICE. Either something is severely broken, you have found a bug, or your example in the question isn't actually what you are running. No way of telling which – talonmies Feb 27 '18 at 11:16
  • Just above there, it shows result 11 as CUFFT_INVALID_DEVICE: http://docs.nvidia.com/cuda/cufft/index.html#cufftresult I am definitely running the example code, no question there. I have posted this on the nvidia forums to see if anyone there knows. – Leios Feb 28 '18 at 04:03

1 Answer


I had a similar problem, and it turned out to be a conflict between the Cray compiler wrappers and the CUDA toolkit. Not loading the cudatoolkit module, enabling dynamic linking, and using the compiler-provided libraries solved the problem.
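On a Cray system that would look something like the following (the module name and environment variable follow the usual Cray PE conventions, but check your site's documentation):

```shell
# Make sure the toolkit module is not loaded
module unload cudatoolkit

# Enable dynamic linking for the Cray compiler wrappers
export CRAYPE_LINK_TYPE=dynamic

# Build through the wrapper so it supplies its own libraries
cc cuda_test.cu -lcufft
```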

PS: I am using PGI Fortran 17.5, so not an exact match.