How does one fix it when torch can't find CUDA? Error: version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

Question

I get this error when importing PyTorch with `python -c "import torch"`:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/__init__.py", line 13, in <module>
    import torch
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

How does one fix it?

In case you need cpu-only version of pytorch follow [this](https://stackoverflow.com/a/75485213/7991462) answer — Constantine, Jul 13 '23 at 08:42

score 83 · Answer 1 · edited Feb 22 '23 at 02:09

As eval said, this happens because PyTorch 1.13 automatically installs nvidia_cublas_cu11, nvidia_cuda_nvrtc_cu11, nvidia_cuda_runtime_cu11, and nvidia_cudnn_cu11. I already had my own CUDA toolkit installed and hit the same problem.

In my case, running `pip uninstall nvidia_cublas_cu11` solved the problem. I think the PyTorch team should address this issue, since users often have their own CUDA toolkit installed.
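Before uninstalling anything, it may help to see which of these NVIDIA wheels pip actually installed in the current environment. The following is a minimal sketch, assuming the packages follow the `nvidia-*-cu11` naming mentioned above; the helper names `normalize` and `nvidia_cu11_packages` are made up for illustration.

```python
from importlib import metadata


def normalize(name):
    """Lowercase a distribution name and treat '_' and '-' as equivalent."""
    return name.lower().replace("_", "-")


def nvidia_cu11_packages(names):
    """Return the names that look like pip-installed NVIDIA CUDA 11 wheels.

    Assumption: the wheels are named nvidia-<component>-cu11.
    """
    found = []
    for name in names:
        norm = normalize(name)
        if norm.startswith("nvidia-") and norm.endswith("-cu11"):
            found.append(norm)
    return sorted(found)


if __name__ == "__main__":
    # Collect the names of all distributions visible to this interpreter.
    installed = [dist.metadata["Name"] for dist in metadata.distributions()]
    installed = [name for name in installed if name]  # skip broken metadata
    for pkg in nvidia_cu11_packages(installed):
        print(pkg)
```

If this prints nvidia-cublas-cu11 in an environment that also has a system CUDA toolkit, that is the package the answer above removes.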

edited Feb 22 '23 at 02:09

Ynjxsjmh

answered Jan 12 '23 at 11:12

lenin

This is really the answer if you installed your own nvidia drivers/CUDA toolkit – Alex Feb 16 '23 at 19:07
Seconded, this is what worked for me. – Johnny Wales Feb 19 '23 at 17:48
whoooa +1000, def solved my issue on my vm that had cuda 11.3 pre-installed – habitats Apr 26 '23 at 12:50
thanks, it solved my issue with using pytorch in docker image gcr.io/deeplearning-platform-release/base-cu113 from https://cloud.google.com/deep-learning-containers/docs/choosing-container – Matěj Račinský Jun 14 '23 at 14:53

score 31 · Answer 2 · answered Dec 16 '22 at 18:32

The error comes from dlopen loading libcublas.so.11 from .../python3.9/site-packages/nvidia/cublas/lib/ (written as torch/lib/../../nvidia/cublas/lib/ in the traceback), which is where the pip package nvidia-cublas-cu11 is installed.

libcublas.so.11 is dynamically linked against libcublasLt.so.11. The problem is that when you have a different CUDA runtime installation (usually in /usr/local/cuda), dlopen probably picks up the wrong libcublasLt.so.11. You can run ldd .../python3.9/site-packages/nvidia/cublas/lib/libcublas.so.11 to check the actual path of libcublasLt.so.11, which is supposed to be the one under .../python3.9/site-packages/nvidia/cublas/lib/.

Workarounds:

Set LD_LIBRARY_PATH=.../python3.9/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH when launching Python, so that dlopen looks for .so files in that directory first.
Use an older torch. Starting with 1.13.0, the torch pip install uses the pip nvidia-* packages; before that, the CUDA libraries were statically linked. That is why older torch pip installs have no problem even if you have an existing CUDA installation.
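The first workaround can be sketched in Python as a small helper that works out the directory and prints the export line to run in the shell. This is only an illustration: it assumes the nvidia wheels live under the interpreter's site-packages in nvidia/cublas/lib, and the function names are hypothetical. The variable has to be set before Python starts, so the script prints the command rather than changing its own environment.

```python
import os
import sysconfig


def cublas_lib_dir(site_packages):
    """Directory where the pip cublas wheel is assumed to place its .so files."""
    return os.path.join(site_packages, "nvidia", "cublas", "lib")


def prepend_ld_library_path(lib_dir, current=""):
    """Put lib_dir in front of an existing LD_LIBRARY_PATH value."""
    return f"{lib_dir}:{current}" if current else lib_dir


if __name__ == "__main__":
    # "platlib" is the site-packages directory for compiled packages.
    site_packages = sysconfig.get_paths()["platlib"]
    lib_dir = cublas_lib_dir(site_packages)
    new_value = prepend_ld_library_path(
        lib_dir, os.environ.get("LD_LIBRARY_PATH", "")
    )
    # Run the printed line in the shell before launching python.
    print(f"export LD_LIBRARY_PATH={new_value}")
```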

This solution worked for me like a charm. To summarize, you will find libcublasLt.so.11 linked to `/usr/local/cuda/lib64/`; setting `LD_LIBRARY_PATH=/opt/conda/lib/python3.7/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH` fixes the issue. – ayush thakur Jan 15 '23 at 08:34
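The ldd check described in this answer can also be scripted. The sketch below assumes the usual ldd line format, `name => path (address)`; the helper names and the sample text in the usage note are illustrative, not taken from a real run.

```python
import subprocess


def resolved_path(ldd_output, library):
    """Return the path ldd reports for `library`, or None if absent or not found."""
    for line in ldd_output.splitlines():
        parts = line.split()
        # Typical line: "libfoo.so.1 => /path/to/libfoo.so.1 (0x...)"
        if len(parts) >= 3 and parts[0] == library and parts[1] == "=>":
            if parts[2:4] == ["not", "found"]:
                return None
            return parts[2]
    return None


def run_ldd(shared_object):
    """Run ldd on a shared object and return its text output (Linux only)."""
    result = subprocess.run(
        ["ldd", shared_object], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `resolved_path(run_ldd(path_to_libcublas), "libcublasLt.so.11")` with your own path to libcublas.so.11 shows which copy the loader would pick. If the result points into /usr/local/cuda/lib64/ rather than the nvidia/cublas/lib/ directory, that matches the mismatch described above.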

score 12 · Answer 3 · answered Nov 10 '22 at 20:00

I don't know why this works but this worked for me:

source cuda11.1
# To see Cuda version in use
nvcc -V
pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html

If you look through the git issue, these might also work:

conda install -y -c pytorch -c conda-forge cudatoolkit=11.1 pytorch torchvision torchaudio

pip3 install torch+cu111 torchvision torchaudio -f https://download.pytorch.org/whl/torch_stable.html

I think the conda one looks the most robust because you can specify exactly the cudatoolkit version you need, so I'd recommend that one.
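The pip commands above pair a CUDA version such as 11.1 with a wheel tag such as cu111. A minimal sketch of that naming pattern, assuming the tag is simply "cu" followed by the major and minor digits (the helper names are hypothetical, and nothing here checks that a wheel for the given combination actually exists):

```python
def cuda_wheel_tag(cuda_version):
    """Turn a CUDA version like '11.1' into a wheel tag like 'cu111'."""
    major, minor = cuda_version.strip().split(".")[:2]
    return f"cu{int(major)}{int(minor)}"


def pinned_requirement(package, version, cuda_version):
    """Build a pip requirement string such as 'torch==1.9.1+cu111'."""
    return f"{package}=={version}+{cuda_wheel_tag(cuda_version)}"


if __name__ == "__main__":
    # Versions taken from the command in the answer above.
    print(pinned_requirement("torch", "1.9.1", "11.1"))
    print(pinned_requirement("torchvision", "0.10.1", "11.1"))
```

Note that torchaudio is pinned without a CUDA tag in the command above, so this helper would not apply to it.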

answered Nov 10 '22 at 20:00

Charlie Parker

Can anyone explain why this works? And is there a solution with pip requirements? – Mattia Surricchio Dec 11 '22 at 22:30
I already have nvcc==11.2, which I installed to run tensorflow models (with no issues). What happens if I run these commands? Will my existing CUDA toolkit be messed up, or is it safe to run? – Leevo Feb 14 '23 at 18:16

Answer 4 · answered Dec 23 '22 at 08:09 · Isaac Nicholaus

I wanted to work on an image detection problem using yolov7 and installed the default dependencies from yolov7's requirements file, https://github.com/WongKinYiu/yolov7/blob/main/requirements.txt, but even when I just tried to display the help text I got this error:

OSError: .../yolov7_env/lib/python3.8/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

I then installed different versions of the torch packages with the following command, which solved the problem: pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
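After reinstalling, a quick check of what torch reports about itself can confirm that the import now succeeds. This is a sketch rather than a definitive diagnostic; `describe_torch_build` is a hypothetical helper, and it catches OSError as well as ImportError because the failure in this question surfaces as an OSError during import.

```python
def describe_torch_build():
    """Report the installed torch build, or the error raised on import."""
    try:
        import torch
    except (ImportError, OSError) as exc:
        return {"import_error": str(exc)}
    return {
        "torch_version": torch.__version__,          # e.g. a "+cu113" suffix
        "built_for_cuda": torch.version.cuda,        # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```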

score 0 · Answer 5 · answered Feb 22 '23 at 22:36

In my case, I was running on a CPU-only machine, and the issue was solved by installing a CPU version of PyTorch. For example: http://download.pytorch.org/whl/cpu/torch-1.13.0%2Bcpu-cp39-cp39-linux_x86_64.whl

answered Feb 22 '23 at 22:36

leoschet

score 0 · Answer 6 · answered May 06 '23 at 17:39

In my case the error occurred in a conda env because the conda env could not find the CUDA installation. I solved it by uninstalling the system-wide CUDA toolkit so that the conda env uses its own toolkit.

answered May 06 '23 at 17:39

Meilism Zhang
