518

When I run nvidia-smi, I get the following message:

Failed to initialize NVML: Driver/library version mismatch

An hour ago I received the same message, so I uninstalled my CUDA library; after that I was able to run nvidia-smi, getting the following result:

nvidia-smi-result

After this I downloaded cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb from the official NVIDIA page and then simply:

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

Now I have CUDA installed, but I get the mentioned mismatch error.


Some potentially useful information:

Running cat /proc/driver/nvidia/version I get:

NVRM version: NVIDIA UNIX x86_64 Kernel Module  378.13  Tue Feb  7 20:10:06 PST 2017
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)

I'm running Ubuntu 16.04.2 LTS (Xenial Xerus).

The kernel release is 4.4.0-66-generic.

etal
  • 28
    You have probably mixed a previous runfile install with your (current) package manager install (apt-get). Follow the instructions in the [cuda linux install guide](http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#handle-uninstallation) to remove all previous NVIDIA driver and CUDA files, and then reinstall after you have cleaned that up. Before starting your reinstall, you may want to read the entire linux install guide doc I linked. The conflict almost certainly arises out of your attempt to install the CUDA 8 GA2 package on top of your existing 378.13 driver install. – Robert Crovella Mar 25 '17 at 23:00
  • 29
    @talonmies Where would be a good place to ask GPU related questions, if not on Stackoverflow? – bug_spray Nov 29 '20 at 10:18
  • 2
    I am using Ubuntu and I think error occurs after Nvidia driver is updated on Linux. Maybe auto-remove and reboot is required after updating Nvidia driver. – lechat Jan 16 '21 at 06:12
  • 7
    Running `sudo reboot` solved this problem for me. – mikey Jan 24 '22 at 14:49
  • 1
    sudo reboot worked for me – shivam Jun 29 '22 at 01:12
  • sudo reboot worked for me as well. Thanks to @mikey and shivam. – Fardo Aug 04 '22 at 09:59
  • 5
    Another overzealous set of 'close' votes on a question similar in nature to thousands of others that aren't closed, for a question directly relevant to the lives of thousands of programmers, that has nothing to do with 'opinion about frameworks' and everything to do with an actual developer issue, by people who probably don't spend much time actually working with either NVidia or CUDA. Again, the single greatest failing of SO is not to scale up the difficulty of closing a question at the same time as the number of close-voters scales up. – Dan Nissenbaum Nov 02 '22 at 05:38
  • One important thing many folks are missing here is to make sure you also clean up any junk that dkms might build automatically. Make sure you clear out /var/lib/dkms/nvidia of any old module sources! – David Cahill May 14 '23 at 07:35

19 Answers

741

Surprise surprise, rebooting solved the issue (I thought I had already tried that).

The solution Robert Crovella mentioned in the comments may also be useful to someone else, since it's pretty similar to what I did to solve the issue the first time I had it.
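
As several comments below point out, an unattended update of the driver libraries is a common trigger. A quick way to confirm that before (or after) rebooting, sketched for Ubuntu:

grep -i nvidia /var/log/apt/history.log | tail -n 20   # Recent automatic driver/library updates
cat /proc/driver/nvidia/version                        # Version of the kernel module still loaded
sudo reboot                                            # Reload the module that matches the updated libraries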

etal
  • 36
    I was sceptical about this working after a reboot, but nonetheless I gave it a try, and IT WORKED!! Thanks! – Abhishek Potnis Oct 25 '17 at 06:59
  • 42
    @AbhishekPotnis If you're wondering why reboot worked, it may be because of this: checking `/var/log/apt/history.log` on Ubuntu revealed that the system has automatically updated libcuda, which presumably required a restart to continue functioning correctly. I've since disabled those updates in the hopes that I won't see it again. – ProgrammingLlama Oct 26 '17 at 01:59
  • 3
    @john had same problem, reboot worked, and verified that in fact there was an automatic update, as recorded in the file you mention. thanks! would you mind sharing how to disable such updates? also, it might make sense to add this info to the current answer or a new answer. – Coffee_Table Jan 10 '18 at 17:26
  • had same problem, reboot worked! but I can't find ``/var/log/apt/history.log``. I'm using centOS, where is this file? – youkaichao Jul 31 '18 at 03:08
  • 3
    Unfortunately this is not a permanent solution. The problem might reappear. The solution is to install an newer version of the nvidia package (`nvidia-390`). See my answer below – Stefan Horning Aug 09 '18 at 09:40
  • Wanted to add this worked for me on NVIDIA-SMI 418.39, Driver Version: 418.39, CUDA Version: 10.1 (though we load cuda 10 libraries for TF). We will eventually need to upgrade this driver/fw combo to the latest once Tensorflow can support cuda 10.1 properly, but for now rebooting definitely still works. – TheDailyToast Apr 09 '19 at 18:23
  • 4
    This also worked for me. Some instructions include `sudo reboot now` and others don't. – rjurney Jul 05 '19 at 23:16
  • I rebooted and it upgraded cuda from 10.1 to 10.2 also with Ubuntu 18 my Intel graphics card is non functional. – gaurus Jan 27 '20 at 13:12
  • Cool! I had the same problem, I think related to nvidia driver auto update. – Knight of Ni Jun 26 '20 at 19:35
  • A reboot is an unnecessary drastic measure. You just need to unload the old nvidia kernel module (and it's dependencies): `sudo rmmod nvidia_drm nvidia_uvm nvidia_modeset nvidia`. See answer by [Comzyh](https://stackoverflow.com/a/45319156/7558731). – Sethos II Jul 10 '20 at 09:13
  • 1
    Don't do this! I can't log into GUI anymore after reboot! – 陈家胜 Jul 30 '20 at 09:16
  • 1
    Just restart, it works for me on Ubuntu 20.04.1 & NVIDIA-SMI 450.66 & CUDA Version: 11.0 & nvidia-driver-450 – Anh-Thi DINH Sep 25 '20 at 08:08
  • Can testify too, nvidia-440.100 driver and CUDA-10.1 in /usr/local/cuda – Ruthvik Vaila Jan 12 '21 at 16:53
  • this doesn't work on ubuntu 16.04 lts with nvidia-460 driver – noone Mar 15 '21 at 10:15
  • Restart can temporarily solve the problem, but it will appear again in my machine – Qinsheng Zhang May 21 '21 at 02:20
  • For a *permanent* solution, see my answer below. Basically you need to disable the autoupdate of the `nvidia` packages. – Long Jul 08 '21 at 05:07
  • I ran into this problem after `apt upgrade` and a reboot as described here fixed the issue. – StockB Dec 31 '21 at 15:15
  • Same for me. Rebooting solved the problem. – Tai Christian Jan 25 '22 at 08:20
  • It's August, 2022, and, yes, rebooting solved my problem. – truth Aug 04 '22 at 13:31
  • For Ubuntu 22.04, simple ```sudo update```, ```sudo full-upgrade```, and restart solve the issue. None of the other solutions work as intended. – kabraxis Nov 10 '22 at 22:06
  • https://www.dell.com/support/kbdoc/es-es/000133480/uefi-and-secure-boot-faqs#:~:text=Step%201%3A%20Tap%20F2%20or,Boot%20item%20to%20%22Disabled%22. – Ahmad AlMughrabi Nov 12 '22 at 15:57
  • I don't know why. But this worked for me. – Lucylalalala Mar 29 '23 at 09:06
  • In my case (archlinux), this issue appeared after updating `nvidia-open` and `nvidia-utils`. I had to run `nvidia-modprobe` and also enable nvidia resume service with `systemctl enable --now nvidia-resume.service`. After running these and rebooting, my installation went back to normal. – Ian Letourneau Aug 24 '23 at 21:28
  • Rebooting works but can't install `nvtop` – Toonia Aug 26 '23 at 15:29
461

As etal said, rebooting can solve this problem, but I think a procedure without rebooting will help.

For Chinese readers, see my blog -> Chinese version

The error message

NVML: Driver/library version mismatch

tells us that the loaded Nvidia driver kernel module (kmod) has the wrong version, so we should unload this driver and then load the correct version of the kmod.

How can we do that?

First, we should know which drivers are loaded.

lsmod | grep nvidia

You may get

nvidia_uvm            634880  8
nvidia_drm             53248  0
nvidia_modeset        790528  1 nvidia_drm
nvidia              12312576  86 nvidia_modeset,nvidia_uvm

Our final goal is to unload the nvidia module, so we should first unload the modules that depend on nvidia:

sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm

Then, unload nvidia

sudo rmmod nvidia

Troubleshooting

If you get an error like rmmod: ERROR: Module nvidia is in use, which indicates that the kernel module is in use, you should find the processes that are using the kmod:

sudo lsof /dev/nvidia*

and then kill those processes, then continue unloading the kmods.
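
For example, a minimal sketch (be aware that force-killing everything holding the device open will also take down a running display manager or compute jobs):

sudo lsof -t /dev/nvidia* | sort -u | xargs -r sudo kill -9     # Kill every process holding an nvidia device node
sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia          # Then retry unloading the modules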

Test

Confirm you have successfully unloaded those kmods:

lsmod | grep nvidia

You should get nothing. Then confirm you can load the correct driver:

nvidia-smi

You should get the correct output.

Comzyh
  • 4
    @suraj it's not just linked. the answer is well written. the only issue is he didn't disclosed his affiliation and you did it. – Sagar V Jul 26 '17 at 06:38
  • 1
    This really works. The question is, if the wrong stuff is loaded at first, why does nvidia-smi load the correct ones after that? I mean, how does it know which are the correct ones? – KiralyCraft Jan 10 '18 at 12:26
  • 6
@KiralyCraft The wrong one no longer exists on disk, but is still in memory. nvidia-smi just triggers a new loading procedure, I think. – Comzyh Jan 12 '18 at 04:47
  • 3
    Brilliant! Had no idea this was what caused the problem. So rebooting does the same thing? – alys Jan 23 '18 at 13:48
  • 2
    @alys Obviously, rebooting will unload and then reload all the module. – Comzyh Feb 06 '18 at 06:29
  • 4
    worked but rebooting brings the problem back .. and my resolution is not right as well. It's not a clean installation at all.. – Kevin He Oct 19 '18 at 08:39
  • 16
    Not worked for me. `sudo rmmod nvidia_drm sudo rmmod nvidia_modeset sudo rmmod nvidia_uvm sudo rmmod nvidia` `lsmod | grep nvidia` gives empty output but there is still same error: `nvidia-smi Failed to initialize NVML: Driver/library version mismatch` After error `lsmod | grep nvidia` gives same output as at start. – mrgloom Jan 29 '20 at 10:54
  • 2
    It didn't work for me either, even after `sudo lsof /dev/nvidia*` I got `rmmod: ERROR: Module nvidia is in use` – Zabir Al Nazi Jul 27 '20 at 07:36
  • 2
This solution works like a charm when there is no graphical interface used in parallel; if there is, then you need to reboot, otherwise graphical sessions will block and fail. – Gabriel Cretin Nov 26 '20 at 11:04
  • Worked for me on Fedora 33 after a reboot didn't solve the issue. – RobbieTheK Jan 10 '21 at 01:36
  • After `rmmod` I expect that you should `modprobe` to get those modules back. Otherwise the drivers are not loaded. But as i know there is no easy way to do so with `nvidia`. `rmmod` resulted in all `nvidia` drivers uninstalled and i had to reinstall them. I would expect a reboot to fix the problem and change apt repo to avoid automatic update - this seems like the most plausible solution to me. – Long Apr 15 '21 at 08:15
  • 1
    Great answer, worked fine for me. Thank you very much. However, do you know what could be the root cause of that error? – iwita Apr 16 '21 at 11:54
  • 1
    This worked for me and avoided a disruptive reboot. I shut down the X server with `sudo init 3` in order to get around the `rmmod: ERROR: Module nvidia is in use` problem which is a nice option if you're working remotely. – Keith Apr 26 '21 at 18:51
  • It happens to my servers occasionally, but I can't reboot my production machines. I used this instructions (rmmod), but I had to kill all processes using nvidia drivers (lsof and kill), then reload all drivers (modprobe), then restart gdm3 and gdm services (Ubuntu 20.04). It finally worked. Thanks! – qba-dev Jul 11 '21 at 17:27
  • For some reason which I haven't understood yet this didn't work for me. First, I stopped my display-manager from using nvidia, then I rmmod the modules and finally when I tried to modprobe back the modules I got `modprobe: FATAL: Module nvidia_drm not found in directory /lib/modules/` – Kirk Walla Jul 21 '21 at 13:36
  • Worked for me with my server device (Ubuntu 16.04). Thank you. – tolgayan Mar 24 '22 at 06:24
  • Unloading previous and loading new module after install worked. – kenorb Jun 25 '22 at 12:59
  • When I try to do `sudo rmmod nvidia_drm` I get `rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'nvidia_drm': Operation not permitted rmmod: ERROR: could not remove module nvidia_drm: Operation not permitted ` – sh37211 Jun 28 '22 at 21:55
  • Simply rebooting did not solve my issue, but this answer did. Thank you! – Shawn Jun 29 '22 at 15:35
  • Not working for me: `$ sudo rmmod nvidia_drm rmmod: ERROR: Module nvidia_drm is in use`. Check for which process: `sudo lsof /dev/nvidia*`: Empty output, i.e. nothing listed. `$ sudo rmmod nvidia_uvm rmmod: ERROR: ../libkmod/libkmod-module.c:799 kmod_module_remove_module() could not remove 'nvidia_uvm': Operation not permitted`. What *did* work was to go back to look at the last time new nvidia drivers were installed and uninstall them, e.g. `sudo apt remove libnvidia-compute-418 libnvidia-compute-430 libnvidia-compute-520`. Boom. Fixed instantly, without reboot. – sh37211 Nov 17 '22 at 23:30
  • Answer is not full, got `sudo rmmod nvidia` `rmmod: ERROR: Module nvidia is in use by: nv_peer_mem` – MosQuan Feb 10 '23 at 19:14
  • Well, this crashed my box, but hey, that means I rebooted so... – Dodger Apr 17 '23 at 19:27
42

Why does the version mismatch happen and how can we prevent it from happening again?

You may find that the versions of nvidia-* are different in these locations:

  1. dpkg -l | grep nvidia (look at nvidia-utils-xxx package version), and
  2. cat /proc/driver/nvidia/version (look at the version of Kernel Module, 460.56 - for example)

A restart should work, but you may also want to prevent the automatic update of this package, either by modifying the files under /etc/apt/sources.list.d/ or simply by holding the package with apt-mark hold nvidia-utils-version_number.
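
For example, a minimal sketch (the package name follows the Ubuntu convention and the 460 example above; it may differ on your system):

dpkg -l | grep nvidia-utils            # Userspace driver/NVML package version
cat /proc/driver/nvidia/version        # Loaded kernel module version
sudo apt-mark hold nvidia-utils-460    # Pin the package so unattended upgrades can't bump it
# Later, when you are ready to upgrade both sides together:
sudo apt-mark unhold nvidia-utils-460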

P.S.: Some content was inspired by this (the original instruction was in Chinese, so I referenced the translated version instead)

Long
  • 1
    Thank you for the explanation. I still don't get why it happens though. I start from a state where everything works, I don't update any package and still get the error after a seemingly random amount of time. Would you happen to know what exactly causes the mismatch to happen? – ClonedOne Feb 28 '22 at 16:06
  • @ClonedOne let’s assume that you didn’t update your packages. Have you used the commands (1 and 2) above to check if the versions are the same? If they are not the same then obviously one or both of them got updated somehow. And that you can try my solution (apt mark hold) or create new question with more detail if mine doesn’t help. – Long Mar 01 '22 at 05:29
  • 1
    @Antonio that's a correct way to do, please make sure to `mark` the version you want to use to avoid automatic update, and `unmark` it when needed. Thanks for a very informative reference. I learned a thing or two! – Long Aug 16 '22 at 13:17
  • 1
    @Long I just realized I forgot a "not" in my comment, so I am republishing it here: Thank you, for me what worked was running `dpkg -l | grep nvidia` and purge all the packages that were _not_ from the newer versions of the driver. Some useful links askubuntu.com/questions/18804/… askubuntu.com/questions/151941/…) – Antonio Aug 16 '22 at 13:50
  • @Antonio I see, in fact removing both older and newer packages should make it work with the right version, you can double check that the currently remaining package version would match the one in `/proc/driver/nvidia/version`. Please note that in the future your package may get auto updated and that's when `hold` command comes in handy. The [LINK](https://askubuntu.com/questions/151941) you copied above seems to be broken. For embedded link in comment, check [this](https://meta.stackoverflow.com/questions/291214/). Have a good day! – Long Aug 17 '22 at 02:53
  • 1
    @Long Yep, I killed the links while copy-pasting :/ https://askubuntu.com/questions/151941/how-can-you-completely-remove-a-package https://askubuntu.com/questions/18804/what-do-the-various-dpkg-flags-like-ii-rc-mean – Antonio Aug 17 '22 at 08:12
  • 1
    Thankyou so much. I was able to fix it. i accidentally installed a different version of nvidia-utils-xxx. It helped me fix my problem. – user1953366 Aug 31 '22 at 05:08
38

I was having this problem, and none of the other remedies worked. The error message was opaque, but checking the output of dmesg was the key:

[   10.118255] NVRM: API mismatch: the client has the version 410.79, but
           NVRM: this kernel module has the version 384.130.  Please
           NVRM: make sure that this kernel module and all NVIDIA driver
           NVRM: components have the same version.

However, I had completely removed the 384 version, and removed any remaining kernel drivers nvidia-384*. But even after a reboot, I was still getting this. Seeing this meant that the initramfs for my kernel still contained the 384 modules, while only 410 was installed. So I rebuilt the initramfs:

uname -a # Find the kernel it's using

Linux blah 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


update-initramfs -c -k 4.13.0-43-generic # Rebuild the initramfs for this kernel
reboot

And then it worked.

After removing 384, I still had 384 files in:

/var/lib/dkms/nvidia-XXX/XXX.YY/4.13.0-43-generic/x86_64/module
/lib/modules/4.13.0-43-generic/kernel/drivers

I recommend using the locate command (not installed by default) rather than searching the filesystem every time.
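
For example (a sketch; updatedb refreshes the locate database first, and the grep pattern is only an illustration for the 384 leftovers mentioned above):

sudo updatedb                          # Refresh the locate database
locate nvidia | grep -E '384|dkms'     # Look for leftover files from the old driver version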

user9170
  • Thanks a lot! It's a good idea to use `locate nvidia-smi`. I used the command figuring out that another driver was installed. – hao Jun 04 '19 at 16:13
  • sudo update-initramfs -c -k `uname -r` Not helped me. – mrgloom Jan 29 '20 at 11:09
  • `dmesg` output: `NVRM: API mismatch: the client has the version 418.67, but NVRM: this kernel module has the version 430.26. Please NVRM: make sure that this kernel module and all NVIDIA driver NVRM: components have the same version.` – mrgloom Jan 29 '20 at 11:12
  • If `nvidia-smi` is showing *Failed to initialize NVML* despite successfully installing nvidia drivers and CUDA toolkit, issue could be that an older (and compressed) kernel with older Nvidia modules loaded at the reboot instead of an kernel with updated nvidia modules. https://stackoverflow.com/a/71672261/1243763 has more clarity to this issue and resolved issue for me (Cent OS 7, nvidia/460.32.03, 3.10.0-957.21.3.el7.x86_64 with CUDA 11.2) – Samir Aug 08 '22 at 03:39
  • Also worth looking into [this post by Andrew Laidlaw](https://andrewlaidlawpower.medium.com/troubleshooting-nvidia-gpu-driver-issues-624ecff9852b) on correctly building kernel-specific nvidia modules. – Samir Aug 08 '22 at 03:43
34

The top two answers didn't solve my problem. I found a solution on the official Nvidia forum that did.

The error below may be caused by installing two different versions of the driver through different approaches, for example, installing the Nvidia driver with APT and also with the official installer.

Failed to initialize NVML: Driver/library version mismatch

To solve this problem, you only need to execute one of the following two commands.

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
HZ-VUW
18

I had the issue too (I'm running Ubuntu 18.04 (Bionic Beaver)).

What I did:

dpkg -l | grep -i nvidia

Then sudo apt-get remove --purge nvidia-381 (and every duplicate version, in my case I had 381, 384 and 387)

Then sudo ubuntu-drivers devices to list what's available.

And I chose sudo apt install nvidia-driver-430.

After that, nvidia-smi gave the correct output (no need to reboot). But I suppose you can reboot when in doubt.

I also followed this installation to reinstall cuda+cudnn.

Benjamin Crouzier
  • 2
    I don't know why this was marked down (-1). I incremented it to 0. The command "dpkg -l | grep -i nvidia" is valid and shows what is not deleted. – gerardg Nov 27 '19 at 16:18
  • I particularly liked selective purging and then listing of available drivers. – mirekphd Aug 02 '20 at 10:54
15

Reboot.

If the problem still exists:

sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia
nvidia-smi

For CentOS and Red Hat Enterprise Linux (RHEL):

cd /boot
mv initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut -vf initramfs-$(uname -r).img $(uname -r)

Then

reboot

For Debian/Ubuntu:

update-initramfs -u

If the problem persists:

apt install -y dkms && dkms install -m nvidia -v 440.82

Change 440.82 to your actual version.

Tip: get the Nvidia driver version:

ls /usr/src

You will find the Nvidia driver directory, such as nvidia-440.82.


Also, you can remove all Nvidia packages and reinstall the driver again:

apt purge nvidia*
apt purge *cuda*

# Check
apt list --installed | grep nvidia
apt list --installed | grep cuda
levinit
6

These answers did not work for me:

dmesg

NVRM: API mismatch: the client has the version 418.67, but
NVRM: this kernel module has the version 430.26.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.

Uninstall old driver 418.67 and install new driver 430.26 (download NVIDIA-Linux-x86_64-430.26.run):

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
chmod +x NVIDIA-Linux-x86_64-430.26.run
sudo ./NVIDIA-Linux-x86_64-430.26.run
[ignore abort]

cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module  430.26  Tue Jun  4 17:40:52 CDT 2019
GCC version:  gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
mrgloom
5

This also happened to me on Ubuntu 16.04 using the nvidia-348 package (latest Nvidia version on Ubuntu 16.04).

However I could resolve the problem by installing nvidia-390 through the Proprietary GPU Drivers PPA.

So a solution to the described problem on Ubuntu 16.04 is doing this:

  • sudo add-apt-repository ppa:graphics-drivers/ppa
  • sudo apt-get update
  • sudo apt-get install nvidia-390

Note: This guide assumes a clean Ubuntu install. If you have previous drivers installed a reboot might be needed to reload all the kernel modules.

Stefan Horning
5

In most cases, a reboot fixes the issue on Ubuntu 18.04 (Bionic Beaver).

The "Failed to initialize NVML: Driver/library version mismatch" error generally means the CUDA driver is still running an older release that is incompatible with the CUDA toolkit version currently in use. Rebooting the compute nodes will generally resolve this issue.
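
Before rebooting, you can compare the loaded kernel module with the installed userspace packages (a sketch for Ubuntu; nvidia-smi itself fails while the mismatch persists, so the userspace side is read from dpkg):

cat /proc/driver/nvidia/version              # Version of the kernel module currently loaded
dpkg -l | grep -E 'nvidia-(driver|utils)'    # Version of the installed userspace driver packages
sudo reboot                                  # Reload the matching kernel module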

Sushena
3

Rebooting or unloading the driver didn't work for me. I solved the problem by updating my Nvidia driver from 440.33.01 to 450.80.2.

sudo apt-get install nvidia-driver-450

sudo reboot

I'm running Ubuntu 20.04 LTS (Focal Fossa), which is a remote server.

ququuy
1

I experienced this problem after a normal kernel update on a CentOS machine. Since all CUDA and Nvidia drivers and libraries had been installed via YUM repositories, I managed to solve the issue using the following steps:

sudo yum remove nvidia-driver-*
sudo reboot
sudo yum install nvidia-driver-cuda nvidia-modprobe
sudo modprobe nvidia # Or just reboot

It made sure my kernel and my Nvidia driver were consistent. I reckon that just rebooting may result in the wrong version of the kernel module being loaded.
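
One way to verify that consistency after such a reinstall (a sketch; the dkms check only applies if the nvidia module is built through DKMS on your system):

dkms status                         # The nvidia module should show as installed for the running kernel
modinfo nvidia | grep ^version      # Version of the module file the kernel will load
cat /proc/driver/nvidia/version     # Version of the module actually loaded right now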

scrutari
0

I reinstalled the Nvidia driver. Run these commands in root mode:

  1. systemctl isolate multi-user.target

  2. modprobe -r nvidia-drm

  3. Reinstall the Nvidia driver: chmod +x NVIDIA-Linux-x86_64-410.57.run, then run ./NVIDIA-Linux-x86_64-410.57.run

  4. systemctl start graphical.target

And finally check nvidia-smi


BarzanHayati
0

I committed the container into a Docker image. Then I recreated another container using this Docker image and the problem was gone.
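
If you want to follow the same approach, a rough sketch looks like this (the container and image names are made up for illustration; --gpus requires the NVIDIA Container Toolkit on the host):

docker commit my_container my_image:snapshot                        # Save the container's state as an image
docker rm -f my_container                                           # Remove the old container
docker run --gpus all -it --name my_container my_image:snapshot     # Recreate it from the snapshot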

Berat
0

I had to restart my kernel and remove all the packages that I had installed previously (during the first installation). Please make sure to delete all the packages; even after removing packages with the command below:

sudo apt-get --purge remove "*nvidia*"

some packages, like libtinfo6:i386, don't get removed.

I'm using Ubuntu 20.04 (Focal Fossa) and nvidia-driver-440. For that, you have to remove all the packages shown in the image below.

List of all the packages that need to be removed:

img

As shown in the image, make sure that the package you're installing is of the correct size, that is, 207 MB for nvidia-driver-440. If it's less, it means you haven't removed all the packages.
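
To double-check that nothing is left behind, something like this can help (a sketch; <leftover-package-name> is a placeholder, so review the list and purge each remaining package by its exact name):

dpkg -l | grep -i nvidia                               # List anything still installed, including :i386 packages
sudo apt-get --purge remove <leftover-package-name>    # Purge each leftover explicitly by name
sudo apt install nvidia-driver-440                     # Reinstall the driver once the list is clean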

  • 1
    Please review *[Should we edit a question to transcribe code from an image to text?](https://meta.stackoverflow.com/questions/415040)* and *[Why not upload images of code/errors when asking a question?](https://meta.stackoverflow.com/questions/285551/)* (e.g., *"Images should only be used to illustrate problems that* ***can't be made clear in any other way,*** *such as to provide screenshots of a user interface."*) and take the appropriate [action](https://stackoverflow.com/posts/62485069/edit) (it covers answers as well). Thanks in advance. – Peter Mortensen Apr 23 '22 at 00:31
0

For completeness, I ran into this issue as well. In my case it turned out that because I had set Clang as my default compiler (using update-alternatives), nvidia-driver-440 failed to compile (check /var/crash/) even though apt didn't post any warnings. For me, the solution was to apt purge nvidia-*, set cc back to use gcc, reboot, and reinstall nvidia-driver-440.
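
A sketch of that recovery sequence (assuming the cc alternatives group exists on your system and gcc lives at /usr/bin/gcc):

sudo update-alternatives --set cc /usr/bin/gcc     # Point cc back at gcc so the kernel module can build
sudo apt purge 'nvidia-*'                          # Remove the broken driver install
sudo reboot
sudo apt install nvidia-driver-440                 # Reinstall; DKMS now compiles the module with gcc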

Tom
0

First I installed the Nvidia driver.

Next I installed CUDA.

After that, I got the "Driver/library version mismatch" error, but I could see the CUDA version, so I purged the Nvidia driver and reinstalled it.

Then it worked correctly.

0

There is an easier solution that worked for me. On Fedora 33, try the following:

rpm -qa | grep -i nvidia | grep f32

You should have two packages listed from the previous version of Fedora for OpenGL. Remove those and reboot.

Deleting and reinstalling the entire Nvidia package set is overkill.
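
For example (a sketch; double-check the package list that the inner command prints before removing anything):

rpm -qa | grep -i nvidia | grep f32                        # The stale f32 OpenGL packages
sudo dnf remove $(rpm -qa | grep -i nvidia | grep f32)     # Remove just those
sudo reboot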

-2

I was facing the same problem and I'm posting my solution here.

In my case, the NVRM version was 440.100 and the driver version was 460.32.03. My driver was updated by sudo apt install caffe-cuda and I didn't notice at that time, but I checked it from /var/log/apt/history.log.

To match my NVRM version, I just used sudo apt install nvidia-driver-440, but it installed 450.102. I don't know why it installed another version, and nvidia-smi is showing 450.102.04.

Anyhow, after rebooting my PC, everything is working fine now. Even after reinstalling the driver, my CUDA is still working fine.

I didn't remove/purge anything related to the Nvidia driver. Version 460.32.03 was uninstalled automatically by running sudo apt install nvidia-driver-440.

Erric
  • Because the issue was closed, I can't post the solution here. But for me the fix was different. See here: stackoverflow.com/a/71672261/10554033 – ladar Mar 30 '22 at 06:28