Cuda hello_world.cu compiles but wrongly prints "Hello Hello"

Question

I have found the following hello world program for CUDA:

#include <stdio.h>

#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)


const int N = 16;
const int blocksize = 16;

__global__
void hello(char *a, int *b)
{
  a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
  char a[N] = "Hello \0\0\0\0\0\0";
  int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

  char *ad;
  int *bd;
  const int csize = N*sizeof(char);
  const int isize = N*sizeof(int);

  printf("%s", a);

  cudaMalloc( (void**)&ad, csize );
  cudaMalloc( (void**)&bd, isize );
  cudaCheckErrors("cudaMalloc fail");
  cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice );
  cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice );
  cudaCheckErrors("cudaMemcpy H2D fail");

  dim3 dimBlock( blocksize, 1 );
  dim3 dimGrid( 1, 1 );
  hello<<<dimGrid, dimBlock>>>(ad, bd);
  cudaCheckErrors("Kernel fail");
  cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost );
  cudaCheckErrors("cudaMemcpy D2H/Kernel fail");
  cudaFree( ad );
  cudaFree( bd );

  printf("%s\n", a);
  return EXIT_SUCCESS;
}

I compile it successfully with nvcc hello_world.cu -o hello, but when I run cuda-memcheck ./hello, I get:

========= CUDA-MEMCHECK
Fatal error: cudaMalloc fail (unknown error at hello_world.cu:39)
*** FAILED - ABORTING
Hello ========= ERROR SUMMARY: 0 errors

I'm a CUDA newbie. My questions are:
1) What's going on under the hood?
2) How can I fix it?

I'm running Ubuntu 13.04, x86_64, CUDA 5.5, without root access.
The top of the nvidia-smi output is:

+------------------------------------------------------+                       
| NVIDIA-SMI 337.19     Driver Version: 337.19         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:05:00.0     N/A |                  N/A |
| 26%   37C  N/A     N/A /  N/A |     53MiB /  6143MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

When I run deviceQuery, I get:

../../bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

And when I run deviceQueryDrv, I get:

../../bin/x86_64/linux/release/deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version 
cuInit(0) returned 999
-> CUDA_ERROR_UNKNOWN
Result = FAIL

When I run:

#include <cublas_v2.h>
#include <cstdio>
int main()
{
  int res;
  cublasHandle_t handle;
  res = cublasCreate(&handle);
  switch(res) {
  case CUBLAS_STATUS_SUCCESS:
    printf("the initialization succeeded\n");
    break;
  case CUBLAS_STATUS_NOT_INITIALIZED:
    printf("the CUDA Runtime initialization failed\n");
    break;
  case CUBLAS_STATUS_ALLOC_FAILED:
    printf("the resources could not be allocated\n");
    break;
  }
  return 0;
}

I get "the CUDA Runtime initialization failed".

There is nothing wrong with the program. The problem lies in your machine setup, and unfortunately these simple examples have no [cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api). Run your code with `cuda-memcheck`, and you'll get an idea that something is wrong with your machine. What is the result of running `nvidia-smi` ? Are you on windows or linux? What CUDA version is installed? — Robert Crovella, Jul 14 '14 at 23:20
what is the result of running your code with `cuda-memcheck`, i.e. `cuda-memcheck ./mycode` ? What CUDA version is installed? — Robert Crovella, Jul 14 '14 at 23:50
What CUDA version is installed? Add [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) after the kernel call. What command line are you using to compile the code? What happens when you run the `deviceQuery` sample code? — Robert Crovella, Jul 15 '14 at 00:01
@RobertCrovella would you know of a simple program with proper cuda error checking that I can just compile and run? if you could point to url, would be great — Alexandre Holden Daly, Jul 15 '14 at 00:21
[Here](http://pastebin.com/ZGAKq3YR) is the same program with error checking added. Compile/run the same way. Is your linux distro 32-bit or 64-bit? — Robert Crovella, Jul 15 '14 at 00:38
I believe you have a corrupted machine, such as mismatched libraries, nvcc version, etc. This can be rather tedious to sort out via Q+A. If possible, I would start with a clean load of the OS, followed by the latest driver for the GTX Titan, followed by CUDA 6 using the runfile installer, selecting "no" when prompted to install the driver. If you want to keep playing ping-pong, then please provide the output of `nvcc --version`, `which nvcc`, `echo $PATH`, and `echo $LD_LIBRARY_PATH` — Robert Crovella, Jul 15 '14 at 01:05
@RobertCrovella I've tried to follow your reasoning and hopefully the latest edit will allow you to be conclusive — Alexandre Holden Daly, Jul 15 '14 at 02:12
It looks like you have CUDA 4.2 and 5.5 installed on top of each other (libraries are in the same directory). That is bad; the installer won't do that unless instructed to. It might be workable/fixable but unfortunately I can't see what the symbolic links point to (`ls -l` would have been better). And fixing it may be quite tedious. How about if you 1. delete `/usr/local/cuda` 2. reinstall cuda using the [runfile installer method](http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#runfile) 3. select "no" when prompted about the driver install. That should sort it out. — Robert Crovella, Jul 15 '14 at 02:37
@RobertCrovella I ran the same code on various machines (in my university), it failed on exactly the ones which run nvidia driver 337.19, succeeded on the ones which run nvidia driver 331.20. Does this info confirm your suspicion? I don't have root access, I'd like to copy /usr/local/cuda to home dir, and compile from there. — Alexandre Holden Daly, Jul 15 '14 at 03:18
No, it does not confirm my suspicions. Perhaps there is a problem with that driver (337.19) and CUDA 5.5. If you don't have root access, this discussion seems mostly moot. You could install CUDA 6 in your user directory, and modify your PATH and LD_LIBRARY_PATH accordingly, and see if it works with 337.19. You should not need root access for that. — Robert Crovella, Jul 15 '14 at 03:31
sys admin got it working: "I ran a CUDA job as root on both machines this morning. That is referenced on-line; a bug for driver version 337." — Alexandre Holden Daly, Jul 15 '14 at 09:31

Ishamael · Answer 1 · 2015-02-01T22:50:56.603

I have had a problem exactly like yours just now. Running a sample was returning an "Unknown Error" and printing "Hello Hello ", and cublasCreate was returning CUBLAS_STATUS_NOT_INITIALIZED. I found the answer here: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#post-installation-actions

If a CUDA-capable device and the CUDA Driver are installed but deviceQuery reports that no CUDA-capable devices are present, this likely means that the /dev/nvidia* files are missing or have the wrong permissions.

Indeed, my /dev/nvidia* files were owned by root. I just ran

chown user.user /dev/nvidia*

(where user is my local user) and all the errors disappeared.

EDIT: Later I had a similar issue on another machine, and this solution did not work because one of the device files under /dev/nvidia* was missing. What did work was running a sample program once with sudo. After that, the missing device file appeared under /dev/nvidia*, and the sample code started working without sudo.

Owning everything as root isn't required. In my case I have:
zangetsu@ares ~ $ ls -la /dev/nvidia*
crw-rw-rw- 1 root root 239, 0 2. Jun 20.06 /dev/nvidia-uvm
crw-rw-rw- 1 root root 239, 1 2. Jun 20.06 /dev/nvidia-uvm-tools
crw-rw---- 1 root video 195, 0 2. Jun 19.35 /dev/nvidia0
crw-rw---- 1 root video 195, 255 2. Jun 19.35 /dev/nvidiactl
— kensai, Jun 02 '17 at 16:17
