
I'm running Ubuntu 18.04 and recently (about a month ago) installed CUDA 10.2 with the runfile installer, after A LOT of trouble with the recommended .deb installation. Everything was fine: nvidia-smi showed GPU stats and I was able to run my parallel code on the GPU. Today I started my machine and the software center suggested some updates. It all seemed like ordinary stuff apart from this libnvidia-compute-440 package, but I didn't pay too much attention and installed all the updates. After that, my CUDA code didn't work. I tried nvidia-smi and got:

Failed to initialize NVML: Driver/library version mismatch

I rebooted the system but still got the error. I spent the afternoon googling possible solutions, and I think I found the core of the problem: running `dmesg | tail -4` gives:

NVRM: API mismatch: the client has the version 440.59, but
NVRM: this kernel module has the version 440.33.01.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
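The two version numbers in that message are what matter: the userspace side ("client") reports 440.59 while the loaded kernel module reports 440.33.01. As a minimal sketch (the helper name `nvrm_mismatch` is hypothetical, and it assumes the message wording shown above), the pair can be pulled out of `dmesg` text like this:

```python
import re
from typing import Optional, Tuple

# Version numbers such as "440.59" or "440.33.01"; the pattern stops before a
# trailing sentence period like the one in "440.33.01.  Please".
_VERSION = r"(\d+(?:\.\d+)+)"


def nvrm_mismatch(dmesg_text: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (client_version, kernel_module_version)
    from an NVRM 'API mismatch' report, or None if there is no such report.
    Assumes the wording shown in the dmesg output above."""
    # The report is split across several "NVRM:" lines, so join them first.
    flat = " ".join(line.replace("NVRM:", "").strip()
                    for line in dmesg_text.splitlines())
    client = re.search(r"client has the version " + _VERSION, flat)
    module = re.search(r"kernel module has the version " + _VERSION, flat)
    if client is None or module is None:
        return None
    return client.group(1), module.group(1)


sample = """NVRM: API mismatch: the client has the version 440.59, but
NVRM: this kernel module has the version 440.33.01.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version."""

print(nvrm_mismatch(sample))  # ('440.59', '440.33.01')
```

A differing pair, as here, means the userspace libraries and the kernel module come from different driver releases.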

The libnvidia-compute-440 package I have is indeed at version 440.59 now, but my driver version is 440.33.01, as `dmesg | grep nvidia` (and similar commands I've tried) shows (see the third line):

[   16.462737] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   16.463235] nvidia 0000:03:00.0: enabling device (0006 -> 0007)
[   16.785628] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 440.33.01  Tue Nov 12 23:43:11 UTC 2019
[   16.916202] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[   16.916205] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:03:00.0 on minor 1

In fact, by looking at /var/log/apt/history.log I could see that libnvidia-compute-440 was updated from 440.33.01-0ubuntu1 to 440.59-0ubuntu0.18.04.1. All this evidence led me to conclude that I could try going back to the previous version of that library; however, `apt-get install libnvidia-compute-440=440.33.01` (which I think is the right syntax) gave me `E: Version '440.33.01' for 'libnvidia-compute-440' was not found`.
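One likely reason for that error is that apt matches the version after `=` against the full Debian version string, so the upstream number 440.33.01 on its own does not name any package version; the string recorded in history.log (440.33.01-0ubuntu1) is what apt would expect. Even then, the install only works if that older version is still offered by a configured repository (`apt-cache policy libnvidia-compute-440` lists the versions apt can see). The sketch below assumes history.log records upgrades as `Upgrade: pkg:arch (old, new), ...`, uses hypothetical helper names, and only recovers the old version and builds the `pkg=version` argument; it does not run apt.

```python
import re
from typing import Dict, Tuple

# Assumed format of an apt history.log upgrade entry:
#   Upgrade: pkg:arch (old_version, new_version), pkg2:arch (old, new)
_ENTRY = re.compile(r"([^\s,()]+) \(([^,()]+), ([^,()]+)\)")


def parse_upgrade_line(line: str) -> Dict[str, Tuple[str, str]]:
    """Hypothetical helper: map package name (architecture suffix dropped)
    to its (old, new) versions; returns {} for non-Upgrade lines."""
    if not line.startswith("Upgrade:"):
        return {}
    result = {}
    for name, old, new in _ENTRY.findall(line[len("Upgrade:"):]):
        result[name.split(":")[0]] = (old.strip(), new.strip())
    return result


def downgrade_spec(package: str, versions: Tuple[str, str]) -> str:
    """Hypothetical helper: build the pkg=version argument from the full
    old Debian version (not just the upstream number)."""
    old, _new = versions
    return f"{package}={old}"


# Illustrative line in the assumed format, using the versions from the question.
line = ("Upgrade: libnvidia-compute-440:amd64 "
        "(440.33.01-0ubuntu1, 440.59-0ubuntu0.18.04.1)")
upgrades = parse_upgrade_line(line)
print(downgrade_spec("libnvidia-compute-440", upgrades["libnvidia-compute-440"]))
# libnvidia-compute-440=440.33.01-0ubuntu1
```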

I really hope there is a solution that doesn't involve uninstalling CUDA, since installing it took me a weekend and, as I said, was quite a PITA.

talonmies
user199710
  • Try removing the updated package with dpkg. I've had similar issues and had a good fight fixing my installation. If I'm not mistaken, the runfile only copies files and does not register any package. Because of bad experiences with the runfile, I always stick to the .deb installation. It works great as long as you set the nouveau driver and reboot the PC before trying to install/uninstall CUDA. And since the libraries must match a specific driver version, never update anything from the Update Manager. That's how I do it, at least. – Damien LEFEVRE Jun 07 '20 at 19:38
  • Why the downvotes? Can someone explain? By the way, thank you @Damien LEFEVRE for your answer; I'll try it as soon as possible. – user199710 Jun 07 '20 at 20:46
  • 3
    "Help software updates broke my computer!" isn't really an on topic question for [SO]. You would be better served asking on askubuntu or the NVIDIA support forums. – talonmies Jun 08 '20 at 04:32

1 Answer


UPDATE: RESOLVED

I was being very cautious, fearing that I would mess up my CUDA installation. However, I took courage and updated my NVIDIA driver with:

sudo apt install nvidia-driver-440

It successfully updated the drivers, which now match the libnvidia-compute version 440.59.

After rebooting everything works fine, just as before.
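To confirm after the reboot that the fix took effect, the version of the loaded kernel module (e.g. from the `nvidia-modeset` line in `dmesg`) should equal the upstream part of the library package version, that is, the Debian version with any epoch and packaging revision removed. A minimal sketch with hypothetical helper names, taking both version strings as input rather than querying the system:

```python
def upstream_version(deb_version: str) -> str:
    """Hypothetical helper: strip an optional Debian epoch ('N:') and the
    packaging revision (everything after the last '-'),
    e.g. '440.59-0ubuntu0.18.04.1' -> '440.59'."""
    if ":" in deb_version:
        deb_version = deb_version.split(":", 1)[1]
    return deb_version.rsplit("-", 1)[0]


def driver_matches_library(module_version: str, library_deb_version: str) -> bool:
    """Hypothetical helper: True when the loaded kernel module and the
    userspace library package carry the same upstream driver version."""
    return module_version == upstream_version(library_deb_version)


# Before the fix: module 440.33.01, library package 440.59
print(driver_matches_library("440.33.01", "440.59-0ubuntu0.18.04.1"))  # False
# After installing nvidia-driver-440 and rebooting
print(driver_matches_library("440.59", "440.59-0ubuntu0.18.04.1"))  # True
```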

user199710
  • Thanks so much for your solution. I was facing the same problem and am posting my solution here. In my case the NVRM version was 440.100 and the driver version was 460.32.03. My driver had been updated when I installed caffe-cuda; I didn't notice at the time, but I confirmed it in history.log. Going by my NVRM version, I just ran `sudo apt install nvidia-driver-440`, but it installed `450.102`; I don't know why it installed a different version. Anyhow, after rebooting my PC everything works fine now, and CUDA still works after reinstalling the driver. One question: how can I stop the NVIDIA driver from being updated? – Erric Mar 30 '21 at 02:14
  • 1
    Because the issue was closed, I can't post the solution here. But for me the fix was different. See here: https://stackoverflow.com/a/71672261/10554033 – ladar Mar 30 '22 at 06:28