I searched a bit, but everything I found can only annotate CPU code. How can I measure the time spent in part of a kernel, between two __syncthreads() calls within one thread block? Is that possible?
1 Answer
One approach is to use the clock() or clock64() function, as described in the programming guide. Search the cuda tag for clock64 for additional examples of its usage.
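
As a rough illustration of that approach, here is a minimal sketch that reads clock64() right after one __syncthreads() and again right after the next, then has thread 0 of each block store the difference. The kernel name `timed_kernel`, the host helper `measure_block_cycles`, the buffer names, and the placeholder arithmetic in the measured region are all made up for this example and are not from the question. The result is in cycles of the SM the block ran on, not in seconds, and the compiler may move instructions around the clock reads, so treat the numbers as approximate.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: measures the cycles one block spends between two barriers.
__global__ void timed_kernel(float *data, long long *block_cycles, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    __syncthreads();                  // first barrier: whole block has arrived
    long long start = clock64();      // per-SM cycle counter

    // Placeholder work standing in for the region you want to time.
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 100; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }

    __syncthreads();                  // second barrier: whole block has finished the region
    long long stop = clock64();

    // One thread per block records the elapsed cycle count.
    if (threadIdx.x == 0) block_cycles[blockIdx.x] = stop - start;
}

// Hypothetical host helper: launches the kernel and returns one cycle count per block.
// Error checking is omitted to keep the sketch short.
std::vector<long long> measure_block_cycles(int blocks, int threads)
{
    int n = blocks * threads;
    float *d_data = nullptr;
    long long *d_cycles = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_cycles, blocks * sizeof(long long));
    cudaMemset(d_data, 0, n * sizeof(float));

    timed_kernel<<<blocks, threads>>>(d_data, d_cycles, n);
    cudaDeviceSynchronize();

    std::vector<long long> h_cycles(blocks);
    cudaMemcpy(h_cycles.data(), d_cycles, blocks * sizeof(long long),
               cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_cycles);
    return h_cycles;
}

int main()
{
    std::vector<long long> cycles = measure_block_cycles(4, 128);
    for (size_t b = 0; b < cycles.size(); ++b)
        printf("block %zu: %lld cycles\n", b, cycles[b]);
    return 0;
}
```

To get an approximate time, you can divide the cycle count by the SM clock rate, for example the value (in kHz) returned by cudaDeviceGetAttribute with cudaDevAttrClockRate. Keep in mind that the actual clock can vary while the kernel runs, so this is only an estimate.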

Robert Crovella