
I have a server with Ubuntu 16.04 installed. It has a K80 GPU. Multiple processes are using the GPU.

Some processes have unpredictable GPU usage, and I want to reliably monitor their GPU usage.

I know that you can query GPU usage via `nvidia-smi`, but that only gives you the usage at the moment of the query.

Currently I query the information every 100 ms, but that is only sampling the GPU usage and can miss short-lived peaks.
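
Roughly, that sampling loop looks like the sketch below (a simplified illustration rather than my exact code; it assumes the `nvidia-smi --query-compute-apps=pid,used_memory` query is available with the installed driver, and the PID is passed as an argument):

```c
/* sample_peak.c — polls nvidia-smi every 100 ms and remembers the largest
 * per-process memory figure seen for one PID. This is still only sampling,
 * so it can miss short-lived peaks, which is exactly the problem.
 *
 * Build: gcc -o sample_peak sample_peak.c
 * Run:   ./sample_peak <pid>   (stop with Ctrl-C)
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    long target_pid = atol(argv[1]);
    long peak_mib = 0;

    for (;;) {
        /* One line per compute process: "<pid>, <used_memory in MiB>" */
        FILE *p = popen("nvidia-smi --query-compute-apps=pid,used_memory "
                        "--format=csv,noheader,nounits", "r");
        if (!p)
            return 1;

        long pid, mib;
        while (fscanf(p, "%ld, %ld", &pid, &mib) == 2) {
            if (pid == target_pid && mib > peak_mib) {
                peak_mib = mib;
                fprintf(stderr, "new peak: %ld MiB\n", peak_mib);
            }
        }
        pclose(p);

        usleep(100 * 1000);  /* the 100 ms sampling interval mentioned above */
    }
}
```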

Is there a reliable way to get the maximum GPU memory usage of a process with a given PID?

talonmies
  • You probably want to intercept calls to [`cuMemAlloc()`](http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gb82d2a09844a58dd9e744dc31e8aa467), [`cuMemFree()`](http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g89b3f154e17cc89b6eea277dbdf5c93a) and related functions via a library inserted with the [LD_PRELOAD trick](http://stackoverflow.com/questions/426230/what-is-the-ld-preload-trick). Then you can record the peak/sustained/any-metric-you-want memory use in any way you want. – tera Mar 14 '17 at 19:27
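
To make tera's suggestion concrete, below is a minimal sketch of such an `LD_PRELOAD` interposer that tracks the peak device memory allocated through `cuMemAlloc()`/`cuMemFree()`. It assumes the application resolves these driver-API symbols dynamically (recent `cuda.h` headers actually map `cuMemAlloc` to `cuMemAlloc_v2`, so the `_v2` symbols may need the same treatment), and it uses a small fixed-size table with no locking purely for brevity:

```c
/* gpu_mem_shim.c — LD_PRELOAD interposer sketch that records the peak
 * device memory allocated via cuMemAlloc()/cuMemFree().
 *
 * Build: gcc -shared -fPIC -o gpu_mem_shim.so gpu_mem_shim.c -ldl
 * Run:   LD_PRELOAD=./gpu_mem_shim.so ./my_cuda_app
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef unsigned long long CUdeviceptr;  /* 64-bit device pointer */
typedef int CUresult;                    /* 0 == CUDA_SUCCESS */

static CUresult (*real_cuMemAlloc)(CUdeviceptr *, size_t);
static CUresult (*real_cuMemFree)(CUdeviceptr);

static size_t current_bytes, peak_bytes;

/* Tiny pointer->size table so cuMemFree() knows how much to subtract. */
#define MAX_ALLOCS 4096
static struct { CUdeviceptr ptr; size_t size; } allocs[MAX_ALLOCS];

static void report_peak(void)
{
    fprintf(stderr, "[gpu_mem_shim] peak device memory: %zu bytes\n", peak_bytes);
}

CUresult cuMemAlloc(CUdeviceptr *dptr, size_t bytesize)
{
    if (!real_cuMemAlloc) {
        real_cuMemAlloc = (CUresult (*)(CUdeviceptr *, size_t))
                          dlsym(RTLD_NEXT, "cuMemAlloc");
        atexit(report_peak);  /* print the peak when the process exits */
    }
    CUresult rc = real_cuMemAlloc(dptr, bytesize);
    if (rc == 0) {
        for (int i = 0; i < MAX_ALLOCS; i++)
            if (allocs[i].ptr == 0) {
                allocs[i].ptr = *dptr;
                allocs[i].size = bytesize;
                break;
            }
        current_bytes += bytesize;
        if (current_bytes > peak_bytes)
            peak_bytes = current_bytes;
    }
    return rc;
}

CUresult cuMemFree(CUdeviceptr dptr)
{
    if (!real_cuMemFree)
        real_cuMemFree = (CUresult (*)(CUdeviceptr))dlsym(RTLD_NEXT, "cuMemFree");
    for (int i = 0; i < MAX_ALLOCS; i++)
        if (allocs[i].ptr == dptr) {
            current_bytes -= allocs[i].size;
            allocs[i].ptr = 0;
            break;
        }
    return real_cuMemFree(dptr);
}
```

Build it as a shared library, launch the target process with `LD_PRELOAD` pointing at it, and the peak is printed to stderr when the process exits.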

1 Answer


Try using the NVIDIA Visual Profiler. I am not sure how accurate it is, but it gives you a graph of device memory usage over time while your program is running.

KuroNeko