I am using a remote machine with 2 GPUs to run a Python script that contains CUDA code. To find where I can improve the performance of my code, I am trying to use nvprof.
In my code I have specified that I only want to use one of the 2 GPUs on the remote machine. However, when I call nvprof --profile-child-processes ./myscript.py, a process with the same ID is started on each of the GPUs.
Is there any argument I can give nvprof so that only one GPU is used for the profiling?
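
For illustration, one common way to restrict a Python script to a single GPU is the CUDA_VISIBLE_DEVICES environment variable, which the CUDA runtime reads when it initializes. The sketch below shows that general technique only; it is not necessarily how the GPU is selected in the script described above, and the helper name select_single_gpu is hypothetical.

```python
import os


def select_single_gpu(device_index: int = 0) -> str:
    """Hypothetical helper: expose only one GPU to the CUDA runtime.

    Assumption: this is called before any CUDA library is imported or
    initialized, since the variable is read when the runtime starts up.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


if __name__ == "__main__":
    visible = select_single_gpu(0)
    # Report which device index the rest of the script will be able to see.
    print(f"CUDA_VISIBLE_DEVICES={visible}")
```

The same variable can also be set in the shell for the whole command, for example CUDA_VISIBLE_DEVICES=0 nvprof --profile-child-processes ./myscript.py. In that case nvprof and its child processes should only see that one device, although whether this removes the second process in this particular setup is an assumption that would need checking with nvidia-smi.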