Hello, in my kernel function I use three device functions, and I want to measure the time taken by each of them. Is there any way to time device functions inside a kernel? Please let me know. Thank you.

Chaps

1 Answer

Quoting the CUDA C Programming Guide:

clock_t clock();
long long int clock64();

when executed in device code, returns the value of a per-multiprocessor counter that is incremented every clock cycle. Sampling this counter at the beginning and at the end of a kernel, taking the difference of the two samples, and recording the result per thread provides a measure for each thread of the number of clock cycles taken by the device to completely execute the thread, but not of the number of clock cycles the device actually spent executing thread instructions. The former number is greater than the latter since threads are time sliced.

This timing works pretty much like MATLAB's tic and toc. There is a clock sample in the CUDA SDK. Basically, it works like this:

__global__ void max(..., int* time)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;

    clock_t start = clock(); 
    //device function call
    clock_t stop = clock();
    ...
    time[i] = (int)(stop - start);
}
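Since the question asks about three device functions, here is a minimal sketch of how the same idea might be extended to time each one separately. The functions `stepA`, `stepB` and `stepC`, the kernel name `timedKernel` and the layout of the `cycles` array are all hypothetical placeholders, not taken from the question. The sketch uses `clock64()` to reduce the risk of counter wrap-around and converts cycles to milliseconds with the device's reported clock rate, which is only an approximation if the GPU is not running at that rate. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the three device functions in the question.
__device__ float stepA(float x) { return x * 2.0f + 1.0f; }
__device__ float stepB(float x) { return sqrtf(x + 3.0f); }
__device__ float stepC(float x) { return x * x; }

// cycles holds three entries per thread, one per device function.
__global__ void timedKernel(const float* in, float* out, long long* cycles, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n) return;

    float x = in[i];

    long long t0 = clock64();
    float a = stepA(x);
    long long t1 = clock64();
    float b = stepB(a);
    long long t2 = clock64();
    float c = stepC(b);
    long long t3 = clock64();

    out[i] = c;  // store the result so the calls are not optimized away
    cycles[3 * i + 0] = t1 - t0;
    cycles[3 * i + 1] = t2 - t1;
    cycles[3 * i + 2] = t3 - t2;
}

int main()
{
    const int n = 256;
    float hIn[n];
    for (int i = 0; i < n; ++i) hIn[i] = (float)i;

    float *dIn, *dOut;
    long long* dCycles;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMalloc(&dCycles, 3 * n * sizeof(long long));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    timedKernel<<<(n + 127) / 128, 128>>>(dIn, dOut, dCycles, n);
    cudaDeviceSynchronize();

    long long hCycles[3 * n];
    cudaMemcpy(hCycles, dCycles, 3 * n * sizeof(long long), cudaMemcpyDeviceToHost);

    // The clock rate is reported in kHz, so cycles / kHz gives milliseconds.
    int khz = 0;
    cudaDeviceGetAttribute(&khz, cudaDevAttrClockRate, 0);

    // Report the per-function figures for thread 0 only.
    for (int f = 0; f < 3; ++f)
        printf("function %d: %lld cycles (~%.6f ms)\n",
               f, hCycles[f], (double)hCycles[f] / khz);

    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dCycles);
    return 0;
}
```

Treat the numbers as rough estimates: the compiler may inline the device functions and move instructions across the clock reads, and, as the quoted passage notes, the counts include cycles during which other threads were scheduled.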
Vitality