I had a `libcudnn.so.7` error while trying to install MXNet on Google Colab. What ultimately solved it for me was installing `libcudnn7` after some other steps; here are all the major steps I performed. I hope this helps anyone else who is slogging through this kind of mess like I did.
My specific need was to downgrade CUDA in Google Colab; at the time of writing it comes with 11.8, but MXNet only supports older versions. I was following this tutorial: https://aconcaguasci.blogspot.com/2019/12/setting-up-cuda-100-for-mxnet-on-google.html
I followed the majority of it, including:
#Uninstall the current CUDA version
!apt-get --purge remove cuda nvidia* libnvidia-*
!dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 dpkg --purge
!apt-get remove cuda-*
!apt autoremove
!apt-get update
#Download CUDA 10.0
!wget --no-clobber https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
#install CUDA kit dpkg
# Note: I piped yes to answer the config file prompt with installing new version
!yes | dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
!sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
!apt-get update
!apt-get install cuda-10-0
# Although I did not encounter a `libcurand.so.10` error yet, I still ran this part too:
#Solve libcurand.so.10 error
!wget --no-clobber http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
#-nc, --no-clobber: skip downloads that would download to existing files.
!apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!apt-get update
This was the point where I could `pip install mxnet-cu100`, but it would fail on `import mxnet as mx` with "OSError: libcudnn.so.7: cannot open shared object file: No such file or directory".
`!find / -iname libcudnn` would only return two folders: `/var/lib/dpkg/alternatives/libcudnn` and `/etc/alternatives/libcudnn`.
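A quicker check than searching the filesystem is to ask the dynamic loader directly, since the import error comes from the loader failing to open the library. This is only a minimal sketch using Python's standard `ctypes` module; the helper name `can_load` is my own and not part of any library.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Same failure class as "cannot open shared object file".
        return False


# libcudnn.so.7 is the file named in the MXNet import error.
print("libcudnn.so.7 loadable:", can_load("libcudnn.so.7"))
```

If this prints `False` before installing cuDNN and `True` afterwards, the import error should be gone as well.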
The step that fixed it for me: I dug into the NVIDIA cuDNN Archive and found these instructions:
For example, for Cuda 9.0 and cuDNN 7.4.1:
$ sudo apt-get install libcudnn7=7.4.1.5-1+cuda9.0
$ sudo apt-get install libcudnn7-devel=7.4.1.5-1+cuda9.0
I swapped the CUDA version `cuda9.0` for `cuda10.0` and ran:
!sudo apt-get install libcudnn7=7.4.1.5-1+cuda10.0
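I could not install `libcudnn7-devel`; apt reported "Unable to locate package" for it. NVIDIA's apt repository appears to name the development package `libcudnn7-dev`, which may be why.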
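The version swap above is just string substitution in the apt pin, which can be sketched as a small helper. The function name `cudnn_pin` is hypothetical, and the `-1` package revision is an assumption copied from NVIDIA's example; other releases may use a different one.

```python
def cudnn_pin(cudnn_version, cuda_version, package="libcudnn7"):
    """Build an apt version pin such as libcudnn7=7.4.1.5-1+cuda10.0."""
    # The "-1" revision is taken from NVIDIA's example and is an assumption
    # for any other cuDNN release.
    return f"{package}={cudnn_version}-1+cuda{cuda_version}"


print(cudnn_pin("7.4.1.5", "10.0"))  # libcudnn7=7.4.1.5-1+cuda10.0
```

Running `!apt-cache madison libcudnn7` first should list the versions the configured repositories actually offer, so the pin can be checked before installing.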
After this, I could `pip install mxnet-cu100==1.9.0` (MXNet for CUDA 10.0). And of course `nvcc --version` would report CUDA 10.0. I was finally able to run `import mxnet as mx` without getting any "cannot open shared object file: ..." errors.
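To avoid mismatching the wheel and the toolkit, the CUDA release reported by `nvcc --version` can be mapped to the MXNet package suffix. This is a rough sketch: the helper `mxnet_package_for` is hypothetical, and it assumes the `mxnet-cu<major><minor>` naming seen in `mxnet-cu100`; not every CUDA release has a matching wheel.

```python
import re


def mxnet_package_for(nvcc_output):
    """Map `nvcc --version` output to an MXNet wheel name like mxnet-cu100."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no CUDA release found in nvcc output")
    major, minor = match.groups()
    return f"mxnet-cu{major}{minor}"


# Sample line in the format nvcc prints; in Colab the real output can be
# captured with subprocess.run(["nvcc", "--version"], capture_output=True, text=True).
sample = "Cuda compilation tools, release 10.0, V10.0.130"
print(mxnet_package_for(sample))  # mxnet-cu100
```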
I validated it successfully with:
import mxnet as mx
print(mx.context.num_gpus())
a = mx.nd.ones((2, 3), mx.gpu())
b = a * 2 + 1
print(b.asnumpy())
Outputting:
1
[[3. 3. 3.]
 [3. 3. 3.]]
I realize this was with MXNet and not TensorFlow, but it was a `libcudnn.so.7` error and I hope it helps anyone else coming across this, at least with Google Colab. I could not find much recent support for this, which is why I followed the tutorial I mentioned at the top.