How to make comparable time measurements in CUDA and C++ code

Question

I have a CUDA and a C++ implementation of the same algorithm. In CUDA I measure the time with events:

cudaEvent_t start, stop;
float time;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);      // start time measurement

//  some cuda stuff

cudaEventRecord(stop, 0);       // stop time measurement
cudaEventSynchronize(stop);     // sync results
cudaEventElapsedTime(&time, start, stop);
printf ("Elapsed time : %f ms\n", time);

In C++ I measure with gettimeofday:

struct timeval start, end;
long  seconds, useconds; 
float mseconds;
gettimeofday(&start, NULL);

// some work to do

gettimeofday(&end, NULL);

seconds  = end.tv_sec  - start.tv_sec;
useconds = end.tv_usec - start.tv_usec;
mseconds = seconds * 1000.0 + useconds / 1000.0;   // elapsed milliseconds
printf ("Elapsed time : %f ms\n", mseconds);

Is this the correct way to get good, comparable results?

Thanks in advance!

If this is good enough for you, then it's fine for the sake of comparison (seeing your millisecond precision and no care for longish running times). If you want something in C++ standard way, C++11 and beyond, see [std::chrono](http://en.cppreference.com/w/cpp/chrono) - `steady_clock` for long durations (avoid system date adjustments in the process) or `high_resolution_clock` for as-good-a-precision-as-your-C++-standard-lib-and-OS-can-provide. — Adrian Colomitchi, Nov 14 '16 at 22:32
See also this thread: http://stackoverflow.com/questions/728068/how-to-calculate-a-time-difference-in-c — Rames, Nov 14 '16 at 22:34

score 0 · Accepted Answer · answered Nov 15 '16 at 09:24

Yes, this is a good way to get CPU-vs-GPU time comparisons.

There are multiple ways to get CPU timings, of course, ranging from high-resolution system timers to `__rdtsc` intrinsics. But for such a coarse comparison, any of them should work just fine.
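For the CPU side, one such high-resolution timer is `std::chrono::steady_clock` (C++11), which one of the comments on the question also mentions. Here is a minimal sketch, assuming a C++11 compiler. `time_ms` and `sum_of_squares` are hypothetical names made up for this illustration, not part of any library. The helper returns milliseconds as a `float`, the same unit and type that `cudaEventElapsedTime` writes:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds as a float.
template <typename Work>
float time_ms(Work work) {
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by date changes
    const Clock::time_point start = Clock::now();
    work();
    const Clock::time_point stop = Clock::now();
    return std::chrono::duration<float, std::milli>(stop - start).count();
}

// Hypothetical CPU workload standing in for "some work to do".
double sum_of_squares(long n) {
    assert(n >= 0);
    volatile double acc = 0.0;  // volatile so the loop is not optimized away
    for (long i = 0; i < n; ++i) {
        acc = acc + static_cast<double>(i) * static_cast<double>(i);
    }
    return acc;
}
```

Calling `time_ms([] { sum_of_squares(10000000); })` and printing the result with the same `printf` format as in the question should give a number you can put next to the event-based GPU timing.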

If you want to dig deeper into your GPU performance and look for potential areas of improvement, you may want to look at the command-line CUDA profiler nvprof, or at the Visual Profiler, which does the same thing but also has a GUI.

score 0 · Answer 2 · edited May 23 '17 at 10:30

If you simply want to compare the total execution time of your CUDA-related work, you can keep your C++ time measurements. Kernel launches return control to the host before the kernel has finished, so just make sure the device has completed all of its pending work before you take the end timestamp:

gettimeofday(&start, NULL);

// some work to do
cudaDeviceSynchronize();

gettimeofday(&end, NULL);

This is a simple way to compute how much time your tasks took on device side with CUDA compared to CPU side.
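To see why the synchronization matters, here is a small host-only sketch. It does not use CUDA at all: `FakeDevice` is a hypothetical stand-in whose `launch()` only queues work and whose `synchronize()` plays the role of `cudaDeviceSynchronize()`. All names are assumptions made for the sake of illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous device (NOT a CUDA API).
// launch() only queues the task and returns at once; synchronize() runs
// everything that is queued, like waiting for the device to finish.
struct FakeDevice {
    std::vector<std::function<void()>> queue;

    void launch(std::function<void()> task) { queue.push_back(std::move(task)); }

    void synchronize() {
        for (auto& task : queue) task();
        queue.clear();
    }
};

// Times one launch on the host clock. With sync_before_stop == false the
// result covers only the launch call, not the work itself.
float time_launch_ms(FakeDevice& dev, std::function<void()> task,
                     bool sync_before_stop) {
    assert(dev.queue.empty());  // only the task passed in should be timed
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    dev.launch(std::move(task));
    if (sync_before_stop) dev.synchronize();
    const Clock::time_point stop = Clock::now();
    return std::chrono::duration<float, std::milli>(stop - start).count();
}

// Hypothetical workload standing in for a kernel.
void busy_work() {
    volatile unsigned long acc = 0;  // volatile so the loop is not optimized away
    for (unsigned long i = 0; i < 5000000UL; ++i) acc = acc + i;
}
```

With `sync_before_stop` set to `false`, the measured time covers only the cost of queuing the task, which is the mistake the `cudaDeviceSynchronize()` call above avoids in real CUDA code.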

As suggested by ApoorvaJ, if you need to go deeper into CUDA performance and find where the device bottlenecks are, you can use the Visual Profiler. If you are using Visual Studio, check these steps I wrote for another SO user who wanted to check the PTX code. The Visual Profiler provides a lot of other data that is worth exploring. See also the Profiler section of the official CUDA documentation from Nvidia.
