cuBLAS performance

Question

I'm using CUDA 5.0 and want to compare matrix multiplication written in plain C against cuBLAS. I have already written a program in which both the C and the cuBLAS multiplication give correct results.

Now I want to compare their performance. For the C implementation I used clock(). For cuBLAS, I found that cutil no longer exists in CUDA 5.0, so I used CUDA events (cudaEvent) instead. Both implementations use the same matrices. In C, I timed only the matrix multiplication itself, while for cuBLAS I timed everything from creating the handle to destroying it.

I got this result:
C takes just 0.08 ms, while cuBLAS takes 59 ms. When I then used clock() to time cuBLAS as well, cuBLAS came out faster than C. I don't know whether my timing method is correct. Why do cudaEvent and clock() give different answers?

I used cuBLAS and cudaEvent exactly as described in NVIDIA's documentation. I'm really puzzled about how to measure time correctly.
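For the CPU side, the clock() measurement described above can be sketched roughly as follows. This is only a minimal sketch: `matmul_naive`, `time_matmul_ms`, `ticks_to_ms`, `gemm_flops` and `gflops` are hypothetical helper names, not the code from the question, and the naive triple loop is a stand-in for whatever C implementation is actually being timed. The point is that a clock() difference divided by CLOCKS_PER_SEC is in seconds, so it has to be multiplied by 1000 before it can be compared with a value in milliseconds.

```c
#include <time.h>

/* Convert a clock() tick difference to milliseconds.
   (end - start) / CLOCKS_PER_SEC is in seconds, hence the factor 1000. */
double ticks_to_ms(clock_t start, clock_t end)
{
    return 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC;
}

/* Naive row-major n x n single-precision multiply, c = a * b.
   Hypothetical stand-in for the C implementation being measured. */
void matmul_naive(int n, const float *a, const float *b, float *c)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Time a single multiply with clock() and return the CPU time in ms. */
double time_matmul_ms(int n, const float *a, const float *b, float *c)
{
    clock_t start = clock();
    matmul_naive(n, a, b, c);
    clock_t end = clock();
    return ticks_to_ms(start, end);
}

/* Floating-point operations in an n x n matrix multiply:
   n^3 multiplications plus n^3 additions. */
double gemm_flops(int n)
{
    return 2.0 * (double)n * (double)n * (double)n;
}

/* Throughput in GFLOP/s for a given operation count and a time in ms. */
double gflops(double flops, double ms)
{
    return flops / (ms * 1.0e-3) / 1.0e9;
}
```

Calling `gflops(gemm_flops(n), time_matmul_ms(n, a, b, c))` gives a quick sanity check on the units: a single CPU core reporting thousands of GFLOP/s almost certainly means seconds and milliseconds have been mixed up. Note also that clock() measures CPU time rather than wall-clock time, and a single short run may be below its resolution, so repeating the multiply and averaging is usually safer.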

Next time please spend some time to add proper punctuation, capitalization and formatting to your questions. That makes it a whole lot easier to read and follow. — Bart, Oct 24 '12 at 14:29
Sorry, I was in a hurry. I'll take more care with my writing next time. Thank you for your advice and help. — user1492775, Oct 25 '12 at 01:33
You have a units problem. If your CPU were really taking 0.08 ms for the size you say, it would be producing around 6000 GFLOP/s, which is incredibly unlikely. Your CPU is probably taking 0.08 *seconds* to do the multiply, which would be 6 GFLOP/s, while your GPU is giving about 8 GFLOP/s. — talonmies, Oct 25 '12 at 05:19
When I use clock() to measure time, it has a resolution of about a millisecond, and I divide its result by CLOCKS_PER_SEC, so that result is in seconds. The cudaEvent result is in milliseconds and has a resolution of about half a microsecond. I mistook the resolution of clock() for the unit of its final result. Thanks for your help. But can I use clock() to measure time for cuBLAS? — user1492775, Oct 25 '12 at 08:29
You can try inserting cudaThreadSynchronize() before clock() to see if it helps — , Oct 25 '12 at 11:10
Yes, most of the cuBLAS calls are asynchronous. Only some of the routines that return a scalar value are blocking by default — talonmies, Oct 25 '12 at 15:09
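To make the comments above concrete, here is a rough sketch of timing one cublasSgemm call both with CUDA events and with clock(). It is not the code from the question: the matrix size `n = 512` is a hypothetical value, the matrices are simply zero-filled because their contents do not affect the timing, and error checking is left out for brevity. It assumes the cublas_v2.h API and uses cudaDeviceSynchronize(), which replaced the older cudaThreadSynchronize(). Handle creation and a warm-up call are kept outside the timed region so that one-time initialization cost is not counted as multiplication time.

```c
#include <stdio.h>
#include <time.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int n = 512;                       /* hypothetical matrix size */
    const float alpha = 1.0f, beta = 0.0f;
    const size_t bytes = (size_t)n * n * sizeof(float);
    float *d_a, *d_b, *d_c;
    float event_ms = 0.0f;
    cublasHandle_t handle;
    cudaEvent_t start, stop;

    /* Device buffers; zero-filled, since the values do not affect timing. */
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemset(d_a, 0, bytes);
    cudaMemset(d_b, 0, bytes);

    /* One-time setup and a warm-up call, both outside the timed region. */
    cublasCreate(&handle);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, d_a, n, d_b, n, &beta, d_c, n);
    cudaDeviceSynchronize();

    /* Method 1: CUDA events, recorded in the same stream as the GEMM. */
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, d_a, n, d_b, n, &beta, d_c, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);              /* wait until the GEMM is done */
    cudaEventElapsedTime(&event_ms, start, stop);   /* result is in ms */

    /* Method 2: host timer. The GEMM call returns before the GPU has
       finished, so synchronize before reading the clock a second time. */
    clock_t t0 = clock();
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, d_a, n, d_b, n, &beta, d_c, n);
    cudaDeviceSynchronize();
    clock_t t1 = clock();
    double host_ms = 1000.0 * (double)(t1 - t0) / (double)CLOCKS_PER_SEC;

    printf("cudaEvent: %.3f ms\n", event_ms);
    printf("clock():   %.3f ms\n", host_ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cublasDestroy(handle);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```

Without the cudaDeviceSynchronize() in the second method, clock() would mostly measure the time needed to launch the work rather than the time needed to finish it, which is one way cuBLAS can appear much faster than it is. Because clock() reports CPU time, its value may still differ from the event timing depending on how the host waits for the GPU, so the event-based figure is probably the more reliable of the two.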
