code 1:
with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA]) as prof:
code 2 (details: https://xfxuezhang.blog.csdn.net/article/details/132600645):
torch.cuda.current_stream(self._device).synchronize()
time.time()
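For reference, the manual timing in code 2 can be sketched roughly as below. This is a minimal sketch, not the exact code from the linked post: the helper name timed is hypothetical, it uses time.perf_counter instead of time.time (a swapped-in choice, since it is a monotonic clock meant for intervals), and it skips the synchronize calls when no GPU is available.

```python
import time
import torch

def timed(fn, device=None):
    """Run fn() and return (result, elapsed seconds).

    Hypothetical helper: the current CUDA stream is synchronized before
    each clock read so that queued GPU work is included in the interval.
    """
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.current_stream(device).synchronize()
    start = time.perf_counter()
    result = fn()
    if use_cuda:
        # Wait for kernels launched by fn() before stopping the clock.
        torch.cuda.current_stream(device).synchronize()
    return result, time.perf_counter() - start

# Example with a tiny stand-in workload (an assumption, not my real model).
out, seconds = timed(lambda: torch.ones(4).sum())
print(f"elapsed: {seconds:.6f}s")
```

This measures wall-clock time for whatever fn covers, so anything outside the call (data loading, setup) is not included.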
result:
code 1:
Self CPU time total: 71.297s
Self CUDA time total: 16.971s
code 2: 52.260056s
The time difference between these two is quite large. Claude tells me:
- The profiler measures the total running time of the program, including initialization and other overhead. time usually only measures the core computation.
- The profiler introduces additional overhead like recording events, collecting performance data, etc. This adds to the running time.
- The profiler may affect program execution, e.g. causing more memory allocations or function calls, increasing running time.
- The profiler time may include synchronization time like communication between processes. time only measures the process's own running time.
- The profiler accounts for all CUDA operations like memory copies, kernel launches, etc. time only measures CPU execution time.
- The profiler has higher precision, accurate to microseconds. time only has millisecond resolution.
- If using cProfile, it counts function calls not time, so the two metrics aren't comparable.
- The profiler may measure more threads like I/O threads, while time only measures the main thread.
In summary, the profiler provides more comprehensive timing which is often longer than simple time measurements. The profiler overhead needs to be analyzed for the specifics of the program. torch.cuda.synchronize() can be used to reduce discrepancies for CUDA operations.
But I don't know whether this answer is correct.
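At least the claim about clock resolution can be checked directly on the machine in question. A small sketch using only the standard library (the helper name clock_report is hypothetical); the reported values depend on the operating system, so no particular output is assumed here:

```python
import time

def clock_report(names=("time", "perf_counter")):
    """Return the resolution and monotonic flag Python reports for each clock."""
    report = {}
    for name in names:
        info = time.get_clock_info(name)
        report[name] = {
            "resolution_s": info.resolution,  # resolution in seconds
            "monotonic": info.monotonic,      # True if the clock cannot go backwards
        }
    return report

for name, details in clock_report().items():
    print(name, details)
```

Note also that time.time() reads the wall clock, so an interval taken with it covers everything that happens in between, not only CPU execution in one thread.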
Besides, I use with torch.profiler.record_function("train") after with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA]) as prof, and I find that if I comment out record_function, the final CPU time decreases (by ~20s). I don't know why its overhead is so high.
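One way to narrow this down is to profile the same workload with and without the record_function range and compare where the self CPU time is attributed. The sketch below is only an illustration under assumptions: workload is a hypothetical CPU-only stand-in for my training loop, and the total is computed by summing self_cpu_time_total over prof.key_averages(). It may be that the "train" range simply picks up time between operators that is otherwise not attributed to any event, rather than adding real overhead, but that is a hypothesis to check, not a confirmed explanation.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

def workload(steps=20):
    # Hypothetical stand-in for the training loop: small CPU matmuls.
    x = torch.randn(256, 256)
    for _ in range(steps):
        x = torch.tanh(x @ x)
    return x

def profiled_self_cpu_us(use_label):
    """Profile workload(); return (summed self CPU time in us, event names)."""
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        if use_label:
            with record_function("train"):
                workload()
        else:
            workload()
    events = prof.key_averages()
    total_us = sum(e.self_cpu_time_total for e in events)
    return total_us, {e.key for e in events}

with_label_us, with_names = profiled_self_cpu_us(use_label=True)
no_label_us, no_names = profiled_self_cpu_us(use_label=False)
print(f"with record_function:    {with_label_us / 1e6:.4f}s")
print(f"without record_function: {no_label_us / 1e6:.4f}s")
```

Printing prof.key_averages().table(sort_by="self_cpu_time_total") in the labelled run would show how much of the total sits in the "train" row itself.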