24

Sometimes, bugs in my CUDA programs cause the desktop graphics to break (in Windows). Typically, the screen remains somewhat readable, but when graphics change, such as when dragging a window, lots of semi-random colored pixels and small blocks appear.

I have tried to reset the GPU and driver by changing the desktop resolution, but that doesn't help. The only fix I have found is to reboot the computer.

Is there a program out there or some trick I can use to get the driver and GPU to reset without rebooting?

Roger Dahl
  • You mean when you have a long-running CUDA program and the driver crashes? – Tudor Jun 03 '12 at 15:13
  • @Tudor: No, I don't think the time it takes to run the kernel factors into it. It's not related to the watchdog timer. – Roger Dahl Jun 03 '12 at 15:18
  • 2
    This really should not happen, so you should test your board for hardware problems. First try swapping the board and running the same error-causing programs to see if you can reproduce it (ideally an instance of the same model board and a different board). If it reproduces, it is not likely a hardware problem. You could also try a memory checker like [this](https://simtk.org/home/memtest/) (not sure if it is up-to-date). – harrism Jun 03 '12 at 23:25

6 Answers

33

The same problem sometimes occurs on Unix, and Google sent me to this thread, so I hope this helps somebody else.

On Ubuntu, unloading and reloading the nvidia_uvm kernel module solved the problem for me:

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
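
Note that `rmmod` refuses to unload a module that is still in use, so it can help to check the use count first. Here is a minimal sketch, assuming the usual `lsmod` column layout (name, size, use count). `module_use_count` and `reload_uvm` are hypothetical helper names, not part of any NVIDIA tool:

```shell
# Print the use count of a module, read from lsmod-style input on stdin.
# Prints "absent" when the module is not listed.
module_use_count() {
    awk -v mod="$1" '$1 == mod { print $3; found = 1 } END { if (!found) print "absent" }'
}

# Reload nvidia_uvm only when nothing is holding it.
reload_uvm() {
    count=$(lsmod | module_use_count nvidia_uvm)
    case "$count" in
        absent)
            echo "nvidia_uvm is not loaded; loading it"
            sudo modprobe nvidia_uvm
            ;;
        0)
            sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm
            ;;
        *)
            echo "nvidia_uvm is in use (count: $count); stop CUDA processes first" >&2
            return 1
            ;;
    esac
}
```

Calling `reload_uvm` performs the reload. If it reports the module as in use, some process is probably still holding the GPU.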
fraank
16

Edit:

If you are on Tesla hardware on Linux and can run nvidia-smi, then you can reset the GPU using

nvidia-smi -r

or

nvidia-smi --gpu-reset

Here is the man output for this switch:

Resets GPU state. Can be used to clear double bit ECC errors or recover hung GPU. Requires -i switch to target specific device. Available on Linux only.
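
Since the switch requires `-i`, a full invocation names the device index. A small sketch, assuming you look up the index first with `nvidia-smi -L`; `gpu_reset_cmd` is a hypothetical helper that only prints the command, so you can review it before running it with sufficient privileges:

```shell
# Sketch only: build the reset command for a single GPU index.
# gpu_reset_cmd is a hypothetical helper; it prints the command instead of running it.
gpu_reset_cmd() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: gpu_reset_cmd <gpu-index>" >&2
            return 1
            ;;
    esac
    printf 'nvidia-smi --gpu-reset -i %s\n' "$1"
}

# Example (not executed here):
#   nvidia-smi -L        # list GPUs and their indices
#   gpu_reset_cmd 0      # print the reset command for GPU 0
```

The reset is likely to be refused while any process is still using the GPU, so stop CUDA applications first.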

Otherwise...


The way to truly reset the hardware is to reboot.

What you describe shouldn't happen. I recommend testing with different hardware and letting us know whether it still occurs.

harrism
  • Thank you for the answer. I added some background information. I wonder if the people that marked this as a favorite question have seen the issue. – Roger Dahl Jun 19 '12 at 05:03
  • What you added is really imprecise. You need a precise repro case in order to file a bug. – harrism Jun 19 '12 at 07:06
  • Why not just undo your bug fix that made the problem go away and then simplify the program to make a simple test. – harrism Jun 19 '12 at 07:07
  • Yes, I'm sorry about the imprecise description. I wrongly assumed that this issue was something CUDA programmers were familiar with and didn't consider that a repro might be wanted. I moved on with the app and the specific bug that triggered it this time is long lost. I will try to play around with things similar to what I did back then. – Roger Dahl Jun 19 '12 at 15:20
  • @RogerDahl if you are on Linux+Tesla, see my edit regarding nvidia-smi. Consider accepting/upvoting so we can get this marked answered? – harrism Sep 18 '12 at 01:28
  • Reboot it is. If you get a consistent repro, please file a bug with your registered dev account. – harrism Sep 18 '12 at 01:39
  • 1
    @harrism, what if I get the following after issuing nvidia-smi --gpu-reset -i 5: One or more incomplete sets of NVLink GPUs were specified. GPU Reset couldn't run because the specified GPUs could not be validated for NVLink reset. I'm running a DGX-1 server. – João Paulo Navarro Mar 05 '20 at 12:36
4

To reset the graphics stack in Windows, press Win+Ctrl+Shift+B.

Matija Grcic
2

I have a GeForce GTX 260 with the NVIDIA GPU SDK 4.2 and I am experiencing the same problems. While developing, I sometimes have bugs in my programs, and these cause the screen to show the random colored pixels described in this post.

As stated here, changing the resolution does not make them disappear. However, if I change only the colour depth from 32 to 16 bits, the random colored pixels disappear, but going back to 32 bits (without rebooting) makes them appear again. The last bug that caused this behaviour was using __constant__ memory but passing it as a pointer:

test<<<grid, threadsPerBlock>>>( cuda_malloc_data, cuda_constant_data );

If I do not pass cuda_constant_data, then there is no bug (and consequently, the random coloured pixels do not appear).

talonmies
jorge
0
  1. In Device Manager, under Display adapters, find the driver.
  2. Disable it.
  3. Press Win+Ctrl+Shift+B (the monitor will blink).
  4. Re-enable the driver.

There you go.

0
  1. Run ps -ef.
  2. Find the offending process, for example: root 4066644 1 99 08:56 ? 04:32:25 /opt/conda/bin/python /data/
  3. Kill it by its PID (the second column): kill 4066644
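
The lookup can be scripted. This is a rough sketch, assuming the standard `ps -ef` layout with the PID in the second column. `pids_matching` is a hypothetical helper name, and the pattern is whatever identifies your runaway job:

```shell
# Print the PID (column 2) of every ps -ef line that contains the given text.
# Reads ps -ef style output on stdin and skips the header line.
pids_matching() {
    awk -v pat="$1" 'NR > 1 && index($0, pat) { print $2 }'
}

# Example (not executed here). The listing is captured first so that the
# awk process started by pids_matching cannot match itself:
#   listing=$(ps -ef)
#   printf '%s\n' "$listing" | pids_matching /opt/conda/bin/python
#   kill <pid>    # sends SIGTERM; use kill -9 only if the process ignores it
```

Printing the PIDs before killing anything gives you a chance to confirm that only the intended process matched.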
debrises