I am running a kernel in a loop on a GPU, and after every iteration I check whether the convergence condition is satisfied. If it is, I exit the while loop.
__device__ int converged = 0; // this line before the kernel
inside the kernel:
__global__ void convergence_kernel()
{
    if (convergence condition is true)
    {
        atomicAdd(&converged, 1);
    }
}
On the CPU, I call the kernel within a loop:
int *convc = (int*) calloc(1,sizeof(int));
//converged = 0; //commenting as this is not correct as per Robert's suggestion
while (convc[0] < 1)
{
    foo_bar1<<<num_blocks, threads>>>(err, count);
    cudaDeviceSynchronize();
    count += 1;
    cudaMemcpyFromSymbol(convc, converged, sizeof(int));
}
So if the condition is true, I expect convc[0] to be 1. However, when I print this value, I always see a seemingly random value, e.g. conv = 3104, conv = 17280, conv = 17408, etc.
Can someone tell me what is wrong with my cudaMemcpyFromSymbol operation? Thanks in advance.
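One possible reading of the large values: every thread for which the condition holds executes atomicAdd, so converged ends up holding the number of threads that met the condition rather than 0 or 1. Below is a minimal, self-contained sketch of the flag pattern under that assumption. The kernel relax_and_check, the halving update, the tolerance, and the array size are all hypothetical stand-ins, not the real foo_bar1. It uses atomicExch so the flag is set rather than counted, and resets the flag from the host with cudaMemcpyToSymbol before the loop.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Device-side flag: nonzero means at least one thread saw convergence.
__device__ int converged = 0;

// Hypothetical stand-in for the real kernel: halves each error value and
// raises the flag once a value drops below tol.
__global__ void relax_and_check(float *err, int n, float tol)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        err[i] *= 0.5f;
        if (err[i] < tol) {
            atomicExch(&converged, 1);  // set the flag; do not count threads
        }
    }
}

int main()
{
    const int n = 1 << 14;              // assumed problem size
    const int threads = 256;
    const int num_blocks = (n + threads - 1) / threads;

    // Start every error value at 1.0 and copy the array to the device.
    std::vector<float> h_err(n, 1.0f);
    float *err = nullptr;
    cudaMalloc(&err, n * sizeof(float));
    cudaMemcpy(err, h_err.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // A __device__ variable cannot be assigned directly from host code,
    // so the reset goes through cudaMemcpyToSymbol.
    const int zero = 0;
    cudaMemcpyToSymbol(converged, &zero, sizeof(int));

    int convc = 0;
    int count = 0;
    while (convc < 1) {
        relax_and_check<<<num_blocks, threads>>>(err, n, 1e-3f);
        cudaError_t status = cudaDeviceSynchronize();
        if (status != cudaSuccess) {
            fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(status));
            return 1;
        }
        count += 1;
        // Copy the flag back to the host to decide whether to keep looping.
        cudaMemcpyFromSymbol(&convc, converged, sizeof(int));
    }

    printf("conv = %d after %d iterations\n", convc, count);
    cudaFree(err);
    return 0;
}
```

With the flag set by atomicExch, convc can only be 0 or 1 when the loop exits. If the number of converged threads is actually wanted, atomicAdd is the right call, but then the counter would need to be reset with cudaMemcpyToSymbol before each launch so that counts from earlier iterations do not accumulate.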