
I'm trying to compile a simple "hello world" example copied from here. I'm using a CentOS 6.4 environment.

// This is the REAL "hello world" for CUDA!
// It takes the string "Hello ", prints it, then passes it to CUDA with an array
// of offsets. Then the offsets are added in parallel to produce the string "World!"
// By Ingemar Ragnemalm 2010

#include <stdio.h>
#include <stdlib.h>

const int N = 16; 
const int blocksize = 16; 

__global__ 
void hello(char *a, int *b) 
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello \0\0\0\0\0\0";
    int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    char *ad;
    int *bd;
    const int csize = N*sizeof(char);
    const int isize = N*sizeof(int);

    printf("%s", a);

    cudaMalloc( (void**)&ad, csize ); 
    cudaMalloc( (void**)&bd, isize ); 
    cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice ); 
    cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice ); 

    dim3 dimBlock( blocksize, 1 );
    dim3 dimGrid( 1, 1 );
    hello<<<dimGrid, dimBlock>>>(ad, bd);
    cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost ); 
    cudaFree( ad );
    cudaFree( bd );

    printf("%s\n", a);
    return EXIT_SUCCESS;
}

Compiling it works fine:

$ nvcc hello_world.cu -o hello_world.bin

But when I run it:

$ ./hello_world.bin
Hello Hello

It doesn't print the expected 'Hello World', but 'Hello Hello' instead. Commenting code out of the __global__ function makes no difference at all, and even adding a printf inside hello() produces no output. It seems the kernel is never actually executed. What am I missing? What can I check?
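Would something like this after the kernel launch catch the failure? (A minimal sketch using the CUDA runtime API: cudaGetLastError reports launch errors, cudaDeviceSynchronize surfaces errors from the kernel's execution.)

hello<<<dimGrid, dimBlock>>>(ad, bd);

// check the launch itself, then wait for the kernel and check its execution
cudaError_t err = cudaGetLastError();
if (err == cudaSuccess) err = cudaDeviceSynchronize();
if (err != cudaSuccess)
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));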

I have also tried some other example programs that work on another box. They fail the same way here, so something isn't right on this computer.


Edit:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Wed_Jul_17_18:36:13_PDT_2013
Cuda compilation tools, release 5.5, V5.5.0
$ nvidia-smi -a
-bash: nvidia-smi: command not found
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  319.60  Wed Sep 25 14:28:26 PDT 2013
GCC version:  gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)
$ dmesg | grep NVRM
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  319.60  Wed Sep 25 14:28:26 PDT 2013
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  319.60  Wed Sep 25 14:28:26 PDT 2013
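nvidia-smi is not on my PATH; since the driver normally installs it under /usr/bin (as also noted in the comments), calling it with the full path should show whether this is merely a PATH problem:

$ ls -l /usr/bin/nvidia-smi
$ /usr/bin/nvidia-smi -a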
  • The code works fine for me. There is likely something wrong with your machine setup. Add [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) to your code, and you'll likely get an idea of what is wrong. – Robert Crovella Feb 24 '14 at 14:56
  • It seems to me that there are 11 characters in the "a" array while 16 integers in the int array.. isn't he copying from the host by accessing uninitialized memory? – Marco A. Feb 24 '14 at 15:01
  • @RobertCrovella thank you. Added that all over the place and now getting `GPUassert: CUDA driver version is insufficient for CUDA runtime version hello_world.cu` – eis Feb 24 '14 at 15:03
  • oh well.. that's a different story and settles the issue :) – Marco A. Feb 24 '14 at 15:04
  • What is the driver version and runtime version? You can get the runtime version easily enough with `nvcc --version`. You can get the driver version by running `nvidia-smi -a` – Robert Crovella Feb 24 '14 at 15:06
  • @RobertCrovella added, any other way to get the driver version? – eis Feb 24 '14 at 15:08
  • @RobertCrovella also: isn't the example incorrect? That memory is not guaranteed to be initialized to zero... – Marco A. Feb 24 '14 at 15:08
  • @DavidKernin might be, but the problem seems to be independent of the test code; I've tried a multitude of examples – eis Feb 24 '14 at 15:09
  • Okay, just wanted to make sure. nvidia-smi on a linux machine should be available if a graphics driver is installed.. did you install a driver at all? – Marco A. Feb 24 '14 at 15:10
  • @DavidKernin I have prebuilt CUDA software working fine, and the CUDA examples in the sdk work as well. – eis Feb 24 '14 at 15:12
  • Yes, there are other ways to get the driver version. But if `nvidia-smi -a` is not giving you satisfactory results, it means your driver is not installed correctly, and you may as well start diagnosing things there. Another method would be `dmesg |grep NVRM` and I'm sure there are other ways as well. – Robert Crovella Feb 24 '14 at 15:12
  • perhaps they were compiled with a previous version of CUDA.. – Marco A. Feb 24 '14 at 15:13
  • I've now added some conf output into the question, the driver seems to be 319.60. – eis Feb 24 '14 at 15:13
  • @DavidKernin not sure what you are talking about. The arrays `a` and `b` are declared with a size of N, so they are both length 16. The `a` array only has 12 of those 16 chars initialized. So what? In C, a null-terminated string only matters up to the first null, for printf purposes. I don't see a problem with the code. – Robert Crovella Feb 24 '14 at 15:14
  • It's fine for the output, although the threads are dealing with uninitialized memory, and if someone modifies this "hello world" sample, they could have some problems. By the way you're right, for the output it's all fine – Marco A. Feb 24 '14 at 15:17
  • If `nvidia-smi` is not working, your machine is configured incorrectly. The driver installs that program in `/usr/bin` so it may be something like a `PATH` issue or something more complicated. Anyway that is the source of your problem. You say that the CUDA examples work well. There must be something different about how you are executing those. – Robert Crovella Feb 24 '14 at 15:18
  • Ok, found out the issue now. I had paths in my .bash_profile set to refer to some CUDA libraries directly, even though the environment was already fully configured. Removed them from the .bash_profile and everything works. Thank you. – eis Feb 24 '14 at 15:32
  • Please explain what you did and provide it as an answer, so we can call this question answered. It's OK to answer your own question. – Robert Crovella Feb 24 '14 at 15:35
  • Added as an answer now. – eis Feb 24 '14 at 15:43

1 Answer


Thanks to advice from @RobertCrovella, I added return-value checks all over my code:

#include <stdio.h>
#include <stdlib.h>

const int N = 16;
const int blocksize = 16;

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

__global__
void hello(char *a, int *b)
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello \0\0\0\0\0\0";
    int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    char *ad;
    int *bd;
    const int csize = N*sizeof(char);
    const int isize = N*sizeof(int);

    printf("%s", a);

    gpuErrchk( cudaMalloc( (void**)&ad, csize ) );
    gpuErrchk( cudaMalloc( (void**)&bd, isize ) );
    gpuErrchk( cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice ) );
    gpuErrchk( cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice ) );

    dim3 dimBlock( blocksize, 1 );
    dim3 dimGrid( 1, 1 );
    hello<<<dimGrid, dimBlock>>>(ad, bd);
    gpuErrchk( cudaPeekAtLastError() );
    gpuErrchk( cudaDeviceSynchronize() );
    gpuErrchk( cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost ) );
    gpuErrchk( cudaFree( ad ) );
    gpuErrchk( cudaFree( bd ) );

    printf("%s\n", a);
    return EXIT_SUCCESS;
}

This led to the discovery of the following error when running the code:

$ nvcc hello_world.cu -o hello_world.bin
$ ./hello_world.bin
GPUassert: CUDA driver version is insufficient for CUDA runtime version hello_world.cu 39
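To see the two versions the error message is comparing, a small probe can be compiled with nvcc (a sketch; cudaDriverGetVersion and cudaRuntimeGetVersion are standard runtime API calls, and versions are encoded as 1000*major + 10*minor):

#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);    // highest CUDA version the installed driver supports
    cudaRuntimeGetVersion(&runtime);  // CUDA version of the runtime this binary uses
    printf("driver: %d.%d, runtime: %d.%d\n",
           driver / 1000, (driver % 100) / 10,
           runtime / 1000, (runtime % 100) / 10);
    return 0;
}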

I was running this on a cloud provider that had set up the CUDA environment, so I suspected the problem was in something I had configured afterwards. In this environment, CUDA is set up with

module load cuda55/toolkit/5.5.22

which should set up the environment fully. I did not know this at first, so before using it I had tried to set up some paths myself, which left this in my .bash_profile:

export CUDA_INSTALL_PATH=/cm/shared/apps/cuda55/toolkit/current
export PATH=$PATH:$CUDA_INSTALL_PATH/bin
export LD_LIBRARY_PATH=$CUDA_INSTALL_PATH/lib64
export PATH=$PATH:$CUDA_INSTALL_PATH/lib

Once I removed the lines I had added to my .bash_profile and logged out and back in, everything started working without issues.
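For anyone hitting the same mismatch, comparing what the module sets up against any manually exported paths is a quick sanity check (exact paths depend on your cluster setup; libcudart only appears in the ldd output if the runtime is linked dynamically):

$ module load cuda55/toolkit/5.5.22
$ which nvcc
$ echo $LD_LIBRARY_PATH
$ ldd ./hello_world.bin | grep -i cuda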
