I wrote a shared library in CUDA C that I wrapped with Cython and that is called from within a larger Python project. I would like to get some information on what is going on in the GPU while the shared library runs, such as achieved occupancy, memory throughput, etc.
What I have in mind is either to start and stop profiling from within the CUDA or Python code, or to start some continuous GPU monitor (similar to top, for instance) before running the code.
It seems that I cannot use nvprof or nvvp for this.
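For the first option, here is a minimal sketch of what I am imagining on the CUDA side, using the profiler control API from `cuda_profiler_api.h` (`cudaProfilerStart`/`cudaProfilerStop`) to limit capture to the region of interest; the kernel and the function name `run_profiled_section` are just placeholders for my library code:

```cuda
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

// Placeholder kernel standing in for the real work in my library.
__global__ void dummy_kernel(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Exposed to Cython; only the code between the profiler calls
// should be captured when the profiler is launched with
// data collection initially disabled.
extern "C" void run_profiled_section(float *d_data, int n)
{
    cudaProfilerStart();                              // begin capture
    dummy_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();
    cudaProfilerStop();                               // end capture
}
```

My understanding (which may be wrong) is that this only marks the capture range and still needs an external profiler attached, e.g. something like `nsys profile --capture-range=cudaProfilerApi python my_script.py` or `ncu --profile-from-start off python my_script.py`. For the second option, I am aware of `nvidia-smi dmon` as a top-like monitor, but I am not sure it reports metrics such as achieved occupancy.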