According to CUDA streams not overlapping, "the profiler will serialize streaming to get accurate timing data". Is there any way to avoid this serialization when profiling (for example, with nvvp)? I am using a Fermi M2090 and CUDA 4.0.
You could always check the NVIDIA site for the latest version of CUDA and its documentation, as well as the new features it provides. – kangshiyin Jan 23 '13 at 02:33
1 Answer
The Visual Profiler 5.0 (including nvprof and CUPTI) and Nsight Visual Studio Edition 2.0 and later (the latter released more than two years ago) support concurrent kernel trace on Fermi and Kepler devices.
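To check whether a given GPU and profiler combination actually preserves overlap, one option is to profile a tiny program whose kernels are eligible to run concurrently. The sketch below is not from the original answer; `spin_kernel`, `time_two_kernels`, `CHECK`, and the cycle count are hypothetical names and values chosen for illustration. It assumes a GPU of compute capability 2.0 or later (needed for `clock64()` and for concurrent kernels) and the legacy default-stream behavior that `nvcc` uses unless told otherwise.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Busy-waits for roughly `cycles` SM clock cycles so each kernel runs long
// enough for any overlap to be visible on a profiler timeline.
// clock64() requires compute capability 2.0 (Fermi) or later.
__global__ void spin_kernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin
    }
}

// Launches two single-thread kernels, either into two different streams
// (eligible to run concurrently) or into the same stream (always serialized),
// and returns the elapsed GPU time in milliseconds.
float time_two_kernels(bool separate_streams, long long cycles)
{
    cudaStream_t s0, s1;
    CHECK(cudaStreamCreate(&s0));
    CHECK(cudaStreamCreate(&s1));
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Events recorded in the legacy default stream (0) wait for work in the
    // other blocking streams, so `stop` completes only after both kernels do.
    CHECK(cudaEventRecord(start, 0));
    spin_kernel<<<1, 1, 0, s0>>>(cycles);
    spin_kernel<<<1, 1, 0, separate_streams ? s1 : s0>>>(cycles);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaStreamDestroy(s0));
    CHECK(cudaStreamDestroy(s1));
    return ms;
}

int main()
{
    // About 1e8 cycles is on the order of 0.1 s on a GPU clocked near 1 GHz.
    const long long cycles = 100000000LL;

    time_two_kernels(true, 1000);  // warm-up to absorb first-launch overhead

    float same_ms = time_two_kernels(false, cycles);
    float split_ms = time_two_kernels(true, cycles);
    printf("same stream:      %.1f ms\n", same_ms);
    printf("separate streams: %.1f ms\n", split_ms);
    return 0;
}
```

Build it with `nvcc`, then run it once normally and once under the profiler, e.g. `nvprof --print-gpu-trace ./a.out`. When the GPU and profiler both allow concurrent kernels, the separate-streams time should be close to a single kernel's duration and the GPU trace should show the two kernels with overlapping start and end times; a profiler that serializes kernels would instead show them back to back, at roughly twice the duration.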

Greg Smith

Eugene
Could you tell me which version supports that? I didn't find it in the manual. – Hailiang Zhang Jan 23 '13 at 02:14
Simply download the latest toolkit. I believe 4.1/4.2 should support this as well, but I'm not sure. – Eugene Jan 23 '13 at 17:09