
I am trying to use a managed variable in a CUDA program, and I get a segmentation fault when I set the managed variable on the host side. I am doing exactly what the [documentation](http://docs.nvidia.com/cuda/cuda-c-programming-guide/#managed-qualifier) describes. Why is this happening?

```cuda
#include <cuda.h>

#define THREADS_PER_BLOCK 32
#define BLOCKS_PER_SM 1

#define MB(x) ((x) << 20)

__device__ __managed__ int val = 0;

__global__ void test_kernel(char *src)
{
    src[0] = val;
}

int main(int argc, char *argv[])
{
    char *data;
    int size = 2; // 2 MB

    val = 100; //<--- seg fault gone if I comment this line

    cudaMallocManaged(&data, MB(size));
    test_kernel<<<BLOCKS_PER_SM, THREADS_PER_BLOCK>>>(data);
    cudaDeviceSynchronize();
    cudaFree(data);

    return 0;
}
```
pranith
  • Pay attention to the restrictions in the documentation you linked to. The answer lies there – talonmies Nov 15 '14 at 16:46
  • The only restriction I think is related is the runtime initialization. I tried initializing the runtime early, but it still seg faults. Could you please be more specific? – pranith Nov 15 '14 at 16:48
  • Whenever you are having trouble with CUDA code, it's a good idea to do [proper CUDA error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). If you do that, and comment out the line that is causing the seg fault, you'll likely learn something. Have you also reviewed the UM [requirements](http://docs.nvidia.com/cuda/cuda-c-programming-guide/#um-requirements)? Your code runs without a seg fault for me on a platform that properly supports UM. – Robert Crovella Nov 15 '14 at 17:44
  • The code compiles and runs fine for me (val==100, char[0]==0 at the end of the execution), using `arch=compute_30` and `code=sm_30`. You need compute capability >= 3.0 in order to use `__managed__`... but I'm sure you're aware of that. – srodrb Nov 15 '14 at 17:45
  • I checked that my system meets the minimum requirements. I also ran 0_Simple/UnifiedMemoryStreams in sample cuda code. My device is: "GeForce GTX 660 Ti" with compute capability 3.0. I also checked that I have the uvm driver installed: nvidia-340-uvm. What else can I check? – pranith Nov 15 '14 at 20:07
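
The error checking suggested in the comments above could look roughly like the sketch below. The `CUDA_CHECK` macro is a hypothetical helper, not part of the original post or of the CUDA API; it only wraps the real runtime calls `cudaGetLastError` and `cudaGetErrorString`. The host-side write to `val` is left out, as the comment suggests, so that a failing runtime call gets a chance to report itself.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (an assumption for this sketch): print a
// readable message and exit if a CUDA runtime call does not succeed.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__device__ __managed__ int val = 0;

__global__ void test_kernel(char *src)
{
    src[0] = val;
}

int main()
{
    char *data = NULL;

    CUDA_CHECK(cudaMallocManaged(&data, 2 << 20));  // 2 MB
    test_kernel<<<1, 32>>>(data);
    CUDA_CHECK(cudaGetLastError());       // reports launch failures
    CUDA_CHECK(cudaDeviceSynchronize());  // reports failures during execution
    CUDA_CHECK(cudaFree(data));

    return 0;
}
```

If the binary was built for an architecture the device cannot run, one of these checks would be expected to fail with a descriptive message instead of the program continuing silently.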

1 Answer


Sorry for the noise, guys. It was a big FAIL on my part. My device has compute capability 3.0, but I was compiling for compute_50, which the device does not support. Thanks for the suggestions!
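
A quick way to catch this kind of mismatch is to print the device's compute capability and compare it with the architecture passed to `nvcc`. The following is a minimal sketch, assuming device 0 is the GPU in question. `arch_can_run` is a hypothetical helper that encodes the simplified rule that code built for `compute_XY` needs a device of capability X.Y or higher. The file name in the build comment is also an assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (an assumption for this sketch): returns true if code
// built for compute_<build_major><build_minor> can run on a device with
// compute capability dev_major.dev_minor. Simplified rule: the device must
// be at least as new as the build target.
static bool arch_can_run(int dev_major, int dev_minor,
                         int build_major, int build_minor)
{
    return dev_major > build_major ||
           (dev_major == build_major && dev_minor >= build_minor);
}

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0 assumed
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    printf("compute_30 build usable: %s\n",
           arch_can_run(prop.major, prop.minor, 3, 0) ? "yes" : "no");
    printf("compute_50 build usable: %s\n",
           arch_can_run(prop.major, prop.minor, 5, 0) ? "yes" : "no");
    return 0;
}

// Build for a compute capability 3.0 device, using the flags quoted in the
// comments on the question (managed.cu is a hypothetical file name):
//   nvcc -gencode arch=compute_30,code=sm_30 managed.cu -o managed
```

Note that recent CUDA toolkits have dropped `sm_30` as a build target, so the build line applies to toolkits of the era discussed here.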

pranith