I use Singularity and need to install an NVIDIA driver in my Singularity container to do some deep learning with a GTX 1080. The Singularity image was created from the NVIDIA Docker image found here: https://ngc.nvidia.com/catalog/containers/nvidia:kaldi and then converted to a Singularity container. I think it contained no NVIDIA driver, because nvidia-smi was not found before I installed the driver.
I ran the following commands:
add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
apt install nvidia-418
After that, I wanted to check whether the driver was installed correctly, so I ran:
nvidia-smi
which returns: Failed to initialize NVML: Driver/library version mismatch
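As far as I understand it, this message means that the version of the loaded NVIDIA kernel module differs from the version of the user-space NVML library that nvidia-smi loads. Here is a minimal sketch of how the two versions could be compared. It assumes a Linux system where the kernel module exposes /proc/driver/nvidia/version and where libnvidia-ml.so.1 is registered with ldconfig. The helper name extract_version is hypothetical.

```shell
#!/bin/sh
# Pull the first dotted version number (e.g. 418.56) out of the text on stdin.
# extract_version is a hypothetical helper written for this sketch.
extract_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Kernel side: version of the module that is currently loaded.
# Assumption: the module exposes /proc/driver/nvidia/version.
if [ -r /proc/driver/nvidia/version ]; then
    kernel_ver=$(head -n 1 /proc/driver/nvidia/version | extract_version)
else
    kernel_ver=""
fi

# User-space side: resolve libnvidia-ml.so.1 to its versioned file name.
# Assumption: the library is known to ldconfig in the current environment.
lib_path=$(ldconfig -p 2>/dev/null | awk '/libnvidia-ml\.so\.1/ {print $NF; exit}')
if [ -n "$lib_path" ]; then
    lib_ver=$(readlink -f "$lib_path" | sed 's/.*\.so\.//')
else
    lib_ver=""
fi

echo "kernel module version: ${kernel_ver:-not found}"
echo "NVML library version:  ${lib_ver:-not found}"
```

Run inside the container, the library version is the one the container sees, while the kernel module version always comes from the host kernel, so different numbers on the two lines would be consistent with the error above.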
I searched for how to solve this error and found this topic: NVIDIA NVML Driver/library version mismatch
One answer says to run:
lsmod | grep nvidia
and then to rmmod each listed module except nvidia, and finally to rmmod nvidia itself.
rmmod drm
But when I do this, as the topic anticipated, I get the error: rmmod: ERROR: Module nvidia is in use.
The topic says to run lsof /dev/nvidia* and to kill the processes that use the module, but I see nothing with drm in its name, and killing those processes (Xorg, gnome-she) seems like a very bad idea.
Here is the output of the command lsof /dev/nvidia*, followed by the command lsmod | grep nvidia, and then rmmod drm
Rebooting the computer didn't help either.
What should I do to get nvidia-smi working and be able to use my GPU from inside the Singularity container?
Thank you