I am running an iterative program in CUDA that runs until convergence. As stated in this SO post (Are cuda kernel calls synchronous or asynchronous), CUDA kernel launches are asynchronous from the CPU's point of view.
In my program, one of the kernels checks for convergence and writes a boolean value for the host to read. I want to know whether I need to call
cudaDeviceSynchronize()
before reading the boolean value?
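
To make the setup concrete, here is a minimal sketch of the kind of loop I mean. All names (`step_and_check`, `iterate_until_converged`, `run_demo`) are made up for illustration, and the "computation" is just a placeholder that halves every element. The sketch assumes the flag is copied back with a plain blocking `cudaMemcpy` on the default stream, which is issued after the kernel in that stream; the spot the question is about is marked in a comment.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for one iteration: halve every element and
// clear the flag if any element is still above the tolerance.
__global__ void step_and_check(float *x, int n, float tol, bool *converged)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= 0.5f;
        if (fabsf(x[i]) > tol) {
            *converged = false;  // every writer stores the same value
        }
    }
}

// Repeats the kernel until the device-side flag stays true.
// Returns the number of iterations performed.
int iterate_until_converged(float *d_x, int n, float tol, int max_iters)
{
    bool *d_converged = nullptr;
    cudaMalloc(&d_converged, sizeof(bool));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    bool h_converged = false;
    int iters = 0;

    while (!h_converged && iters < max_iters) {
        // Reset the flag to "converged" before each iteration.
        h_converged = true;
        cudaMemcpy(d_converged, &h_converged, sizeof(bool),
                   cudaMemcpyHostToDevice);

        step_and_check<<<blocks, threads>>>(d_x, n, tol, d_converged);

        // cudaDeviceSynchronize();   // <-- is this call needed here?

        // Blocking copy on the default stream, issued after the kernel.
        cudaMemcpy(&h_converged, d_converged, sizeof(bool),
                   cudaMemcpyDeviceToHost);
        ++iters;
    }

    cudaFree(d_converged);
    return iters;
}

// Hypothetical driver: fills n elements with `start` and runs the loop.
int run_demo(int n, float start, float tol)
{
    std::vector<float> h_x(n, start);
    float *d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int iters = iterate_until_converged(d_x, n, tol, 100);

    cudaFree(d_x);
    return iters;
}
```

Calling `run_demo(1024, 1.0f, 0.1f)` exercises the loop. If the flag were instead read through `cudaMemcpyAsync`, mapped pinned memory, or managed memory, the host read would not be a blocking copy, so that case may behave differently from this sketch.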