According to this document from the Bright Computing knowledge base, the "Failed to initialize NVML: Driver/library version mismatch" error generally means that the NVIDIA driver loaded in the kernel is still an older release that is incompatible with the CUDA toolkit and driver libraries currently installed.
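A quick way to confirm the mismatch (a diagnostic sketch, not part of the knowledge-base procedure) is to compare the driver version loaded in the kernel with the version installed on disk:
cat /proc/driver/nvidia/version    # version of the NVIDIA module currently loaded in the kernel
modinfo nvidia | grep ^version     # version of the NVIDIA module installed on disk
nvidia-smi                         # keeps reporting the NVML error while the two versions differ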
Rebooting the VM is the easiest way to fix the issue, since a reboot ensures
that the drivers are properly initialized after the upgrade.
If you do not wish to reboot the VM, you will need to unload the
existing NVIDIA kernel modules and load the new ones.
On the VM:
Remove the existing NVIDIA kernel modules:
modprobe -r nvidia nvidia_uvm
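If the unload fails with a "module is in use" error, display-related modules such as nvidia_drm and nvidia_modeset (only present when the graphics stack is active) have to be unloaded first; this is an extra check, not part of the original procedure:
lsmod | grep nvidia                      # see which NVIDIA modules are still loaded
modprobe -r nvidia_drm nvidia_modeset    # unload the display modules before nvidia itself, if they are listed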
Reload the systemd units:
systemctl daemon-reload
Build and load the new kernel module:
systemctl restart cuda-driver
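At this point it is worth verifying that the freshly built module is the one now loaded (a sanity check added here, not part of the knowledge-base steps):
nvidia-smi                         # should now run without the NVML error
cat /proc/driver/nvidia/version    # should report the newly installed driver version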
If the old NVIDIA kernel module is still being loaded, you may need to
delete the module files from the software image and the node. You can
check for them with:
find /lib/modules | grep nvidia
find /cm/images/default-image/lib/modules | grep nvidia
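If stale module files show up in either location, a minimal cleanup sketch is to delete the files reported by the find commands and then regenerate the module dependency maps; the paths below are examples only, and <kernel-version> is a placeholder for the kernel tree inside the image:
rm -f /lib/modules/$(uname -r)/extra/nvidia*.ko                                # example path on the node; use what find reported
depmod -a                                                                      # rebuild module dependencies on the node
rm -f /cm/images/default-image/lib/modules/<kernel-version>/extra/nvidia*.ko   # example path in the software image
depmod -a -b /cm/images/default-image <kernel-version>                         # rebuild dependencies inside the image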
To get rid of all previous CUDA and NVIDIA driver files, refer to this official document, the CUDA Linux Installation Guide, follow its removal steps, and then reinstall.
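The exact removal commands depend on the distribution and on how the driver and toolkit were installed; as a rough illustration only (not taken from the guide), a package-manager cleanup on an RPM-based system could look like:
yum remove "cuda*" "nvidia*"    # package names vary per distribution and install method; follow the guide for yours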