I am using CUDA 5.0 and want to compare matrix multiplication in plain C against cuBLAS. I have already written a program in which both the C and the cuBLAS implementations give correct results.

Now I want to compare their performance. For the C implementation I used clock(). For cuBLAS, I found that cutil no longer exists in CUDA 5.0, so I used cudaEvent timers instead. Both implementations use the same matrices. In C, I timed only the matrix multiplication itself, while for cuBLAS I timed everything from creating the handle to destroying it.

I got this result:
The C version takes just 0.08 ms, while cuBLAS takes 59 ms. When I then used clock() to time cuBLAS as well, cuBLAS came out faster than C. I don't know whether my timing method is correct. Why do cudaEvent and clock() give different answers?

I used cuBLAS and cudaEvent exactly as described in NVIDIA's documentation. I'm really puzzled about how to measure the time correctly.
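For the CPU side of such a comparison, here is a minimal sketch of timing a multiply with clock() and converting the result to milliseconds explicitly, so that it is in the same unit that cudaEventElapsedTime reports. The function names (`ticks_to_ms`, `matmul`, `time_matmul_ms`) and the naive row-major loop are illustrative assumptions, not the original program. Note that clock() measures processor time used by the process rather than wall-clock time.

```c
#include <time.h>

/* Hypothetical helper: convert a clock() tick difference to milliseconds.
   Dividing ticks by CLOCKS_PER_SEC gives seconds; multiply by 1000 for ms. */
double ticks_to_ms(clock_t start, clock_t stop)
{
    return 1000.0 * (double)(stop - start) / (double)CLOCKS_PER_SEC;
}

/* Naive row-major multiply (illustrative): C (m x n) = A (m x k) * B (k x n). */
void matmul(const float *A, const float *B, float *C, int m, int k, int n)
{
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p)
                sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Time a single multiply and return the elapsed CPU time in milliseconds. */
double time_matmul_ms(const float *A, const float *B, float *C,
                      int m, int k, int n)
{
    clock_t start = clock();
    matmul(A, B, C, m, k, n);
    clock_t stop = clock();
    return ticks_to_ms(start, stop);
}
```

Calling `time_matmul_ms(A, B, C, 4536, 233, 233)` on suitably allocated arrays would return a value in milliseconds. A single short run may fall below the timer's resolution, so averaging over several repetitions is usually more reliable.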

  • Next time please spend some time to add proper punctuation, capitalization and formatting to your questions. That makes it a whole lot easier to read and follow. – Bart Oct 24 '12 at 14:29
  • How large are the matrices in question? – talonmies Oct 24 '12 at 15:07
  • Sorry, I was in a hurry. I'll take more care with my writing next time. Thank you for your advice and help. – user1492775 Oct 25 '12 at 01:33
  • I multiply a (4536, 233) matrix by a (233, 233) matrix. – user1492775 Oct 25 '12 at 02:03
  • You have a units problem. If your CPU were really taking 0.08 ms for the size you say, it would be producing around 6000 GFLOP/s, which is incredibly unlikely. Your CPU is probably taking 0.08 *seconds* to do the multiply, which would be about 6 GFLOP/s, while your GPU is giving about 8 GFLOP/s. – talonmies Oct 25 '12 at 05:19
  • When I use clock() to measure time, it has a resolution of about a millisecond, and I divide the result by CLOCKS_PER_SEC, so the result is in seconds. With cudaEvent the result is in milliseconds, with a resolution of about half a microsecond. I mistook clock()'s resolution for the unit of its final result. Thanks for your help. But can I use clock() to measure time for cuBLAS? – user1492775 Oct 25 '12 at 08:29
  • You can try inserting cudaThreadSynchronize() (deprecated in favor of cudaDeviceSynchronize()) before clock() to see if it helps. –  Oct 25 '12 at 11:10
  • Yes, most of the calls are. Only some of the routines which return a scalar value are blocking by default – talonmies Oct 25 '12 at 15:09
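
The throughput figures quoted in the units comment above can be checked with a short sketch. It assumes the conventional count of 2·m·k·n floating-point operations for an (m × k) by (k × n) multiply; the function name `gemm_gflops` is hypothetical.

```c
/* Hypothetical helper: estimated GFLOP/s for C (m x n) = A (m x k) * B (k x n),
   assuming 2*m*k*n floating-point operations (one multiply and one add per term). */
double gemm_gflops(int m, int k, int n, double seconds)
{
    double flops = 2.0 * (double)m * (double)k * (double)n;
    return flops / seconds / 1.0e9;
}
```

With m = 4536 and k = n = 233, `gemm_gflops(4536, 233, 233, 0.08)` gives roughly 6 GFLOP/s, `gemm_gflops(4536, 233, 233, 0.059)` roughly 8 GFLOP/s, and `gemm_gflops(4536, 233, 233, 0.00008)` roughly 6000 GFLOP/s, which is why 0.08 ms is implausible for the CPU.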

0 Answers