
I am running a loop on a GPU such that after every iteration, I check whether the convergence condition is satisfied. If so, I exit the while loop.

```cuda
__device__ int converged = 0; // this line before the kernel
```

Inside the kernel:

```cuda
__global__ void convergence_kernel()
{
   if (convergence condition is true)
   {
      atomicAdd(&converged, 1);
   }
}
```

On the CPU, I am calling the kernel within the loop:

```cuda
int *convc = (int*) calloc(1, sizeof(int));
//converged = 0; //commenting as this is not correct as per Robert's suggestion
while (convc[0] < 1)
{
    foo_bar1<<<num_blocks, threads>>>(err, count);
    cudaDeviceSynchronize();
    count += 1;

    cudaMemcpyFromSymbol(convc, converged, sizeof(int));
}
```

So here, if the condition is true, `convc[0]` should be 1; however, when I print this value, I always see a random value, e.g. `conv = 3104`, `conv = 17280`, `conv = 17408`, etc.

Can someone tell me what's missing in my `cudaMemcpyFromSymbol` operation? Am I missing something? Thanks in advance.

cuda_hpc80
  • `cuda-memcheck` passes with no errors! – cuda_hpc80 May 15 '14 at 17:54
    do you ever initialize the `converged` `__device__` variable? You certainly can't do it with `converged = 0;` in CPU code. – Robert Crovella May 15 '14 at 19:35
  • What do you think `converged = 0` does in your host code? – talonmies May 17 '14 at 05:19
  • @talonmies if `converged = 0` then the kernel foo_bar1 is called again. @robert-crovella how does one initialize the device variable? – cuda_hpc80 May 19 '14 at 18:50
  • Of course you have no statement that looks like this: `if (converged = 0) ...;` which would be broken anyway, so saying "if `converged = 0` then the kernel foo_bar1 is called again" makes no sense and is inconsistent with the code you have shown. The code you have shown is pretty much incoherent. To initialize a `__device__` variable, one method is to use the [`cudaMemcpyToSymbol` function](http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g2a229a704ade54887f7784e2e2dbd895). You can also statically initialize it: `__device__ int converged = 0;` – Robert Crovella May 21 '14 at 20:43

1 Answer


My best guess as to why you are getting garbage when you read the `converged` value into `convc` is that you have not initialized `converged` anywhere. It can't be done in host code like this:

```cuda
converged = 0;
```

You could change your declaration to be like this:

```cuda
__device__ int converged = 0; // this line before the kernel
```

or you could use the `cudaMemcpyToSymbol` function, which is effectively the reverse of the `cudaMemcpyFromSymbol` function you are already using.
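Putting the two pieces together, a minimal sketch of the whole pattern might look like this (the kernel body, launch configuration, and flag names are placeholders standing in for your real code; error checking is omitted for brevity). The flag is statically initialized at file scope, and `cudaMemcpyToSymbol` is additionally used to reset it before each launch so a value left over from a previous iteration cannot leak through:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int converged = 0;   // statically initialized device flag

// Placeholder kernel: each thread that meets its convergence
// criterion bumps the flag (here, unconditionally, for the sketch).
__global__ void foo_bar1()
{
    atomicAdd(&converged, 1);
}

int main()
{
    const int num_blocks = 1;   // placeholder launch configuration
    const int threads    = 32;

    int convc = 0;
    int zero  = 0;
    int count = 0;

    while (convc < 1)
    {
        // Reset the device flag before each launch.
        cudaMemcpyToSymbol(converged, &zero, sizeof(int));

        foo_bar1<<<num_blocks, threads>>>();
        cudaDeviceSynchronize();
        count += 1;

        // Read the flag back; symbol is passed directly, not as a string.
        cudaMemcpyFromSymbol(&convc, converged, sizeof(int));
    }
    printf("converged after %d iteration(s)\n", count);
    return 0;
}
```

Note that both symbol-copy calls take the `__device__` variable itself as the symbol argument; passing its name as a string was removed from the runtime API in CUDA 5.0.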

Robert Crovella
  • I have initialized the device variable to 0 in the kernel. Still I see this error. My flags are: `NVCCFLAGS = -O3 -use_fast_math -m64 --ptxas-options=-v -lineinfo` and `GENCODE_SM20 := -gencode arch=compute_20,code=sm_20 GENCODE_SM30 := -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 GENCODE_FLAGS := $(GENCODE_SM30)` – cuda_hpc80 May 21 '14 at 21:00
  • 1
    then I suggest providing a short, complete, compilable code that reproduces the issue. You should also add [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) and run your code with `cuda-memcheck`. – Robert Crovella May 21 '14 at 21:27