4

I have some PyCUDA code that keeps the GPU at 100% usage and seems to hog it to the point that my screen only updates every second or so.
Changing the block and grid sizes doesn't help.
Each thread in the grid runs a loop about 1.3 million times, and there are only around 6 blocks of 16 threads. With a small loop there is no problem, but unfortunately the loop has to be that long, and I see no good way to distribute the work across more blocks.
Is there a way to limit the GPU usage of my program, or maybe change the priority of the screen?
GTX 1060 on Windows.
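
Roughly, the structure is like the sketch below (the kernel body and names are placeholders for illustration, not my actual code):

```python
import numpy as np
import pycuda.autoinit            # creates a context on the default device
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void long_kernel(float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float x = 0.0f;
    for (int i = 0; i < 1300000; ++i)   // ~1.3 million iterations per thread
        x += i * 1e-7f;                 // stand-in for the real per-iteration work
    out[tid] = x;
}
""")
long_kernel = mod.get_function("long_kernel")

out = np.zeros(6 * 16, dtype=np.float32)
# only 6 blocks of 16 threads; the GPU is tied up for the entire launch
long_kernel(drv.Out(out), block=(16, 1, 1), grid=(6, 1))
```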

Frobot
  • "there are only around 6 blocks of 16 threads" That is very inefficient use of the GPU. You don't mention whether this is windows or linux. There is no way currently to force a long-running kernel on a windows WDDM GPU (or a linux display GPU) to be pre-empted to allow display tasks to proceed. You'll need to push on one of those things you say you can't do. Alternatively, get another GPU, one that can be placed in TCC mode on windows (or any GPU should be OK on linux) and run your display on the GPU that isn't processing CUDA tasks. – Robert Crovella Mar 11 '18 at 00:00
  • I'm pretty sure that as long as the GPU usage is at 100%, splitting the work up more won't make things faster; slower, if anything. I'm using Windows 10 with a GTX 1060. – Frobot Mar 11 '18 at 00:04
  • 2
    You can get 100% usage with a kernel [using a single thread](https://stackoverflow.com/questions/40937894/nvidia-smi-volatile-gpu-utilization-explanation/40938696#40938696). That indicates nothing about how efficiently the GPU is being used by the kernel that is running. An inefficient code turned into an efficient code can certainly do the same work in less time. And 96 total threads will not efficiently saturate any GPU. That is about 2 orders of magnitude below the threshold for giving the GPU enough parallel work to hide latency. – Robert Crovella Mar 11 '18 at 01:57
  • It's about 300 times faster than the exact same code on the CPU, and I also don't have a way to break it up. So I guess I can deal with that. – Frobot Mar 12 '18 at 17:10
  • 1
    You could always get a second GPU card... or use the integrated GPU that all modern CPUs have for display. – JHBonarius Mar 15 '18 at 13:12
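
To put the comment above in numbers, a quick device query (standard `pycuda.driver` attributes; illustrative, not code from the thread) shows how far 6 × 16 = 96 threads falls short of what the card can keep resident:

```python
import pycuda.autoinit
import pycuda.driver as drv

dev = pycuda.autoinit.device
sms = dev.get_attribute(drv.device_attribute.MULTIPROCESSOR_COUNT)
threads_per_sm = dev.get_attribute(drv.device_attribute.MAX_THREADS_PER_MULTIPROCESSOR)

# A GTX 1060 reports roughly 10 SMs x 2048 threads ~= 20,000 resident threads,
# so 96 launched threads leaves almost the whole machine idle even at "100% usage".
print("Resident thread capacity:", sms * threads_per_sm)
print("Threads launched in the question:", 6 * 16)
```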

1 Answer

5

> Is there a way to limit the GPU usage of my program, or maybe change the priority of the screen?

In a word, no.

The GPU cannot simultaneously run compute jobs and refresh the display. There is no concept of priority. If you have long-running compute code, it will block the display from refreshing, and the duration of that block is determined by the compute code. The driver has only one preemption mechanism: the watchdog timer, which will kill a long-running compute activity on a display device.

If you need screen responsiveness during compute operations, either vastly decrease the run time of an individual kernel launch, or get a second GPU and have one dedicated to the compute work and one for display.
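
A rough sketch of the first option in PyCUDA (illustrative only; `step_kernel`, the loop body, and the chunk size are placeholders, not your actual code): split the ~1.3 million iterations across many short launches and carry the per-thread state between them.

```python
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void step_kernel(float *state, int start, int iters)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float x = state[tid];
    for (int i = start; i < start + iters; ++i)
        x = x * 0.999f + i * 1e-7f;   // stand-in for the real per-iteration work
    state[tid] = x;                   // persist progress between launches
}
""")
step_kernel = mod.get_function("step_kernel")

threads = 6 * 16
state = np.zeros(threads, dtype=np.float32)
state_gpu = drv.mem_alloc(state.nbytes)
drv.memcpy_htod(state_gpu, state)

total_iters = 1_300_000
chunk = 50_000                        # tune so one launch stays well under ~200 ms
for start in range(0, total_iters, chunk):
    iters = min(chunk, total_iters - start)
    step_kernel(state_gpu, np.int32(start), np.int32(iters),
                block=(16, 1, 1), grid=(6, 1))
    drv.Context.synchronize()         # give the display a chance to refresh

drv.memcpy_dtoh(state, state_gpu)
```

Each launch then finishes well before the watchdog limit, and the display can be serviced between kernels; the extra launch overhead is usually small compared to the kernel time.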

talonmies
  • 1
    Re "vastly decrease the run time of an individual kernel launch": Try to scale work per kernel such that kernels execute in 200 ms or less on the slowest GPUs used. That way the GUI should feel responsive. – njuffa Mar 11 '18 at 16:50