
As discussed in "How to reduce CUDA synchronize latency / delay":

There are two approaches to waiting for a result from the device:

  • "Polling" - burn CPU in a spin loop - to decrease latency while waiting for the result
  • "Blocking" - the thread sleeps until an interrupt occurs - to increase overall performance

For "Polling" I need to use cudaDeviceScheduleSpin.

But for "Blocking", which one do I need to use: cudaDeviceScheduleYield or cudaDeviceScheduleBlockingSync?

What is the difference between cudaDeviceScheduleBlockingSync and cudaDeviceScheduleYield?

The documentation for cudaDeviceScheduleYield (http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__DEVICE_g18074e885b4d89f5a0fe1beab589e0c8.html) says: "Instruct CUDA to yield its thread when waiting for results from the device. This can increase latency when waiting for the device, but can increase the performance of CPU threads performing work in parallel with the device." That is, it waits for the result without burning CPU in a spin loop, which sounds like "Blocking". But cudaDeviceScheduleBlockingSync also waits for the result without burning CPU in a spin loop. So what is the difference?

Alex
  • I don't understand what difference you are asking about - the difference between the two is exactly what you quoted. One uses an interrupt, the other uses a polling loop. – talonmies Mar 15 '14 at 13:39
  • @talonmies So do `cudaDeviceScheduleYield` and `cudaDeviceScheduleSpin` both use a polling loop? Then what is the difference between them? According to the docs, `cudaDeviceScheduleYield` **increases latency**, so it cannot be "Polling", can it? – Alex Mar 15 '14 at 13:48

1 Answer


As far as I understand, both approaches use polling to synchronize. In pseudo-code, cudaDeviceScheduleSpin is:

while (!IsCudaJobDone())
{
    // busy-wait: the thread never gives up the CPU
}

whereas cudaDeviceScheduleYield is:

while (!IsCudaJobDone())
{
    Thread.Yield(); // let the OS run another ready thread, then poll again
}

That is, cudaDeviceScheduleYield tells the operating system that it may suspend the polling thread and run another thread that has work to do. This improves performance for the other threads on the CPU, but it also increases latency if the CUDA job finishes at a moment when some thread other than the polling one is running.
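To make the distinction concrete, here is a minimal host-only C++ sketch of the three waiting styles. It is an analogy, not the CUDA driver's actual implementation: `FakeJob`, `wait_spin`, `wait_yield`, `wait_blocking` and `run_with` are hypothetical names, and an atomic flag set by an ordinary thread stands in for "the device has finished". The blocking variant uses a condition variable as a stand-in for the interrupt-driven wait mentioned in the comments.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for "the device has finished"; not a CUDA API.
struct FakeJob {
    std::atomic<bool> done{false};
    std::mutex m;
    std::condition_variable cv;

    // Called by the simulated "device" side when the work completes.
    void finish() {
        {
            std::lock_guard<std::mutex> lock(m);
            done.store(true, std::memory_order_release);
        }
        cv.notify_all();
    }
};

// Analogue of cudaDeviceScheduleSpin: busy-wait, never give up the core.
void wait_spin(FakeJob& job) {
    while (!job.done.load(std::memory_order_acquire)) {
    }
}

// Analogue of cudaDeviceScheduleYield: still polling, but offer the core
// to other ready threads on every pass through the loop.
void wait_yield(FakeJob& job) {
    while (!job.done.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
}

// Analogue of cudaDeviceScheduleBlockingSync: sleep until notified.
void wait_blocking(FakeJob& job) {
    std::unique_lock<std::mutex> lock(job.m);
    job.cv.wait(lock, [&job] {
        return job.done.load(std::memory_order_acquire);
    });
}

// Runs one wait strategy against a simulated job that finishes after
// roughly 10 ms; returns true if the job is done once the wait returns.
template <typename WaitFn>
bool run_with(WaitFn wait) {
    FakeJob job;
    std::thread device([&job] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        job.finish();
    });
    wait(job);
    device.join();
    return job.done.load();
}
```

Calling `run_with(wait_spin)`, `run_with(wait_yield)` or `run_with(wait_blocking)` exercises each style. All three return only after the job is done; they differ in what the waiting thread does with the CPU in the meantime, which is the trade-off between latency and CPU time left for other threads.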

kunzmi
  • Thanks! So, in ascending order from low latency to high latency: `cudaDeviceScheduleSpin`, `cudaDeviceScheduleYield`, `cudaDeviceScheduleBlockingSync`? – Alex Mar 15 '14 at 15:08
  • Yes, and it is also the order from low to high "CPU time available for other tasks". But if the polling thread happens to be the one running on the CPU at the moment the CUDA task finishes, cudaDeviceScheduleSpin and cudaDeviceScheduleYield should have the same latency. – kunzmi Mar 15 '14 at 15:41