
I am running an iterative program in CUDA that runs until convergence. As explained in this SO post (Are cuda kernel calls synchronous or asynchronous), CUDA kernel launches are asynchronous from the CPU's point of view.

In my program, one of the kernels checks for convergence and writes a boolean value for the host to read. I would like to know whether I need to call

cudaDeviceSynchronize()

before reading the boolean value?

user1118148
  • You need to synchronize after the kernel launch and before reading your boolean value, because control can return to the host immediately after the kernel is launched, and the thread that writes the value may not have run yet. – Yappie Jan 30 '12 at 17:13

1 Answer


It depends on how you return the boolean value to the CPU. Are you using cudaMemcpy? If so, you don't need cudaDeviceSynchronize(): a cudaMemcpy issued after the kernel in the same (default) stream blocks the host until the kernel has finished, and only then copies the data from the GPU to the CPU.
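As a rough illustration of that pattern, here is a minimal sketch of a convergence loop that reads the flag back with a blocking cudaMemcpy and no cudaDeviceSynchronize(). The kernel `step_and_check`, the function `run_until_converged`, the halving update, and the tolerance test are all made up for this example and are not taken from the question; everything runs in the default stream, and error checking is left out to keep it short.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical iteration step: halve every element and clear the flag
// if any element is still above the tolerance.
__global__ void step_and_check(float *x, int n, float tol, bool *converged) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= 0.5f;
        // Every thread that writes stores the same value, so the race is benign.
        if (fabsf(x[i]) > tol) *converged = false;
    }
}

// Iterates until the kernel reports convergence or max_iters is reached.
// Returns the number of iterations performed.
int run_until_converged(int n, float tol, int max_iters) {
    float *d_x = nullptr;
    bool *d_converged = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_converged, sizeof(bool));

    std::vector<float> h_x(n, 1.0f);  // arbitrary starting values
    cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    bool h_converged = false;
    int iters = 0;
    while (!h_converged && iters < max_iters) {
        // Reset the device flag to "converged"; the kernel clears it if needed.
        h_converged = true;
        cudaMemcpy(d_converged, &h_converged, sizeof(bool), cudaMemcpyHostToDevice);

        // The launch returns to the host immediately.
        step_and_check<<<(n + 255) / 256, 256>>>(d_x, n, tol, d_converged);

        // Blocking copy in the same (default) stream: it waits for the kernel
        // to finish before copying, so no cudaDeviceSynchronize() is needed.
        cudaMemcpy(&h_converged, d_converged, sizeof(bool), cudaMemcpyDeviceToHost);
        ++iters;
    }

    cudaFree(d_x);
    cudaFree(d_converged);
    return iters;
}
```

Calling, say, `run_until_converged(1024, 1e-3f, 100)` from `main` exercises the loop. If the flag were instead read through cudaMemcpyAsync, or directly from mapped or managed memory, the host would need an explicit synchronization (for example cudaDeviceSynchronize() or cudaStreamSynchronize()) before trusting the value.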

scatman