I try to do a simple uni-directional communication between a CPU and a K80 GPU using CUDA. I want to have a bool
cancel flag that resides in global memory and is polled by all running GPU/kernel threads. The flag should default to false
and can be set by a CPU/host thread to true
during ongoing computation. The GPU/kernel threads then should exit.
This is what I tried. I have simplified code. I removed error checking and application logic (including the application logic that prevents concurrent access to cancelRequested
).
On the host side, global definition (.cpp):
// Host side thread safety of this pointer is covered by application logic
volatile bool* cancelRequested = nullptr;
On the host side in the compute thread (.cpp):
initialize(&cancelRequested);
compute(cancelRequested);
finalize(&cancelRequested);
On the host side in a main thread (.cpp):
cancel(cancelRequested); // Called after init is finished
Host routines (.cu file):
void initialize(volatile bool** pCancelRequested)
{
cudaMalloc(const_cast<bool**>(pCancelRequested), sizeof(bool));
const bool aFalse = false;
cudaMemcpy(*const_cast<bool**>(pCancelRequested), &aFalse, sizeof(bool), cudaMemcpyHostToDevice);
}
void compute(volatile bool* pCancelRequested)
{
....
computeKernel<<<pBlocksPerGPU, aThreadsPerBlock>>>(pCancelRequested);
cudaDeviceSynchronize(); // Non-busy wait
....
}
void finalize(volatile bool** pCancelRequested)
{
cudaFree(*const_cast<bool**>(pCancelRequested));
*pCancelRequested = nullptr;
}
void cancel(volatile bool* pCancelRequested)
{
const bool aTrue = true;
cudaMemcpy(const_cast<bool*>(pCancelRequested), &aTrue, sizeof(bool), cudaMemcpyHostToDevice);
}
Device routines (.cu file):
__global__ void computeKernel(volatile bool* pCancelRequested)
{
while (someCondition)
{
// Computation step here
if (*pCancelRequested)
{
printf("-> Cancel requested!\n");
return;
}
}
}
The code runs fine. But it does never enter the cancel case. I read back the false
and true
values in initialize()
and cancel()
successfully and checked them using gdb. I.e. writing to the global flag works fine, at least from host side view point. However the kernels never see the cancel flag set to true
and exit normally from the outer while
loop.
Any idea why this doesn't work?