
I am running a loop on a GPU such that after every iteration, I check whether the convergence condition is satisfied. If so, I exit the while loop.

```cuda
__device__ int converged = 0; // this line before the kernel
```

Inside the kernel:

```cuda
__global__ void convergence_kernel()
{
   if (convergence condition is true)
   {
      atomicAdd(&converged, 1);
   }
}
```

On the CPU, I am calling the kernel within the loop:

```cuda
int *convc = (int*) calloc(1, sizeof(int));
//converged = 0; //commenting as this is not correct as per Robert's suggestion
while (convc[0] < 1)
{
    foo_bar1<<<num_blocks, threads>>>(err, count);
    cudaDeviceSynchronize();
    count += 1;

    cudaMemcpyFromSymbol(convc, converged, sizeof(int));
}
```

So here, if the condition is true, `convc[0]` should be 1; however, when I print this value, I always see a random value, e.g. `conv = 3104`, `conv = 17280`, `conv = 17408`, etc.

Can someone tell me what's missing in my `cudaMemcpyFromSymbol` operation? Am I missing something? Thanks in advance.

cuda_hpc80
  • `cuda-memcheck` passes with no errors! – cuda_hpc80 May 15 '14 at 17:54
    do you ever initialize the `converged` `__device__` variable? You certainly can't do it with `converged = 0;` in CPU code. – Robert Crovella May 15 '14 at 19:35
  • What do you think `converged = 0` does in your host code? – talonmies May 17 '14 at 05:19
  • @talonmies if `converged = 0` then the kernel foo_bar1 is called again. @robert-crovella how does one initialize the device variable? – cuda_hpc80 May 19 '14 at 18:50
  • Of course you have no statement that looks like this: `if (converged = 0) ...;` which would be broken anyway, so saying "if `converged = 0` then the kernel foo_bar1 is called again" makes no sense and is inconsistent with the code you have shown. The code you have shown is pretty much incoherent. To initialize a `__device__` variable, one method is to use the [`cudaMemcpyToSymbol` function](http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g2a229a704ade54887f7784e2e2dbd895). You can also statically initialize it: `__device__ int converged = 0;` – Robert Crovella May 21 '14 at 20:43

1 Answer


My best guess as to why you are getting garbage when you read the `converged` value into `convc` is that you have not initialized `converged` anywhere. It can't be done in host code like this:

```cuda
converged = 0;
```

You could change your declaration to be like this:

```cuda
__device__ int converged = 0; // this line before the kernel
```

or you could use the `cudaMemcpyToSymbol` function, which is effectively the reverse of the `cudaMemcpyFromSymbol` function you are already using.
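Putting the two pieces together, a minimal sketch of the whole pattern might look like this (the kernel body, launch configuration, and flag names are placeholders standing in for your real code; error checking is omitted for brevity). The flag is statically initialized at file scope, and `cudaMemcpyToSymbol` is additionally used to reset it before each launch so a value left over from a previous iteration cannot leak through:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int converged = 0;   // statically initialized device flag

// Placeholder kernel: each thread that meets its convergence
// criterion bumps the flag (here, unconditionally, for the sketch).
__global__ void foo_bar1()
{
    atomicAdd(&converged, 1);
}

int main()
{
    const int num_blocks = 1;   // placeholder launch configuration
    const int threads    = 32;

    int convc = 0;
    int zero  = 0;
    int count = 0;

    while (convc < 1)
    {
        // Reset the device flag before each launch.
        cudaMemcpyToSymbol(converged, &zero, sizeof(int));

        foo_bar1<<<num_blocks, threads>>>();
        cudaDeviceSynchronize();
        count += 1;

        // Read the flag back; symbol is passed directly, not as a string.
        cudaMemcpyFromSymbol(&convc, converged, sizeof(int));
    }
    printf("converged after %d iteration(s)\n", count);
    return 0;
}
```

Note that both symbol-copy calls take the `__device__` variable itself as the symbol argument; passing its name as a string was removed from the runtime API in CUDA 5.0.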

Robert Crovella
  • I have initialized the device variable to 0 in the kernel. Still I see this error. My flags are: `NVCCFLAGS = -O3 -use_fast_math -m64 --ptxas-options=-v -lineinfo` and `GENCODE_SM20 := -gencode arch=compute_20,code=sm_20 GENCODE_SM30 := -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 GENCODE_FLAGS := $(GENCODE_SM30)` – cuda_hpc80 May 21 '14 at 21:00
  • 1
    then I suggest providing a short, complete, compilable code that reproduces the issue. You should also add [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) and run your code with `cuda-memcheck`. – Robert Crovella May 21 '14 at 21:27