
Just a general question about CUBLAS. For a single host thread, if there is no memory transfer from the GPU to the CPU (e.g. cublasGetVector), will the CUBLAS kernel functions (e.g. cublasDgemm) automatically be synchronized with the host?

    cublasDgemm();
    //cublasGetVector();
    host_functions();

Furthermore, what about between two adjacent kernel calls?

    cublasDgemm();
    cublasDgemm();

And what about a synchronous transfer that does not involve the global memory used in the previous kernel?

    cublasDgemm(...gA...gB...gC);
    cublasGetVector(...gD...D...);
talonmies
Hailiang Zhang

1 Answer


No, the CUBLAS API is, with the exception of a few Level 1 routines which return a scalar value, asynchronous.

Level 3 routines like cublasDgemm don't block the host. To ensure that a CUBLAS call has completed, you need to call a blocking API routine, such as a synchronous memory transfer or an explicit host-GPU synchronisation call.
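As a rough illustration of that pattern, here is a minimal sketch. It assumes the handle-based `cublas_v2.h` API rather than the legacy calls shown in the question, and the function name `dgemm_then_sync_demo`, the matrix size and the fill values are all made up for the example. It queues a `cublasDgemm`, then uses `cudaDeviceSynchronize()` as the explicit host-GPU synchronisation point before reading the result back.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical demo: returns 0 on success, 1 on any failure or wrong result.
int dgemm_then_sync_demo() {
    const int n = 256;                              // assumed square matrix size
    const size_t bytes = sizeof(double) * n * n;
    std::vector<double> hA(n * n, 1.0), hB(n * n, 2.0), hC(n * n, 0.0);

    double *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cublasHandle_t handle = nullptr;

    bool ok = cudaMalloc((void**)&dA, bytes) == cudaSuccess
           && cudaMalloc((void**)&dB, bytes) == cudaSuccess
           && cudaMalloc((void**)&dC, bytes) == cudaSuccess
           && cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS;

    // Copy the inputs to the device.
    ok = ok
      && cublasSetMatrix(n, n, sizeof(double), hA.data(), n, dA, n) == CUBLAS_STATUS_SUCCESS
      && cublasSetMatrix(n, n, sizeof(double), hB.data(), n, dB, n) == CUBLAS_STATUS_SUCCESS;

    const double alpha = 1.0, beta = 0.0;
    // C = alpha * A * B + beta * C. The call may return to the host
    // before the multiplication has actually finished on the GPU.
    ok = ok && cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                           &alpha, dA, n, dB, n, &beta, dC, n) == CUBLAS_STATUS_SUCCESS;

    // Host work that does not depend on dC could run here.

    // Explicit host-GPU synchronisation: block until all queued work is done.
    ok = ok && cudaDeviceSynchronize() == cudaSuccess;

    // Blocking copy of the result back to the host.
    ok = ok && cublasGetMatrix(n, n, sizeof(double), dC, n, hC.data(), n) == CUBLAS_STATUS_SUCCESS;

    // With A filled with 1.0 and B with 2.0, every entry of C is 2 * n.
    ok = ok && hC[0] == 2.0 * n && hC[n * n - 1] == 2.0 * n;

    if (handle) cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok ? 0 : 1;
}
```

This should build with something like `nvcc demo.cu -lcublas` once a `main` that calls the function is added. The `cudaDeviceSynchronize()` call is there to make the wait explicit; the blocking `cublasGetMatrix` that follows would also make the host wait for the result.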

talonmies
  • Thanks! Now what about between two kernels, and what about a synchronous memory transfer that does not involve the global memory used in the previous kernel? (I have updated the post above as well.) – Hailiang Zhang Dec 02 '12 at 19:48
  • Kernels are always launched asynchronously, and synchronous memory transfers are (except in one case) always synchronous. So your dgemm calls won't block and your memory transfers will block. – talonmies Dec 03 '12 at 07:09