31

I get this error when importing PyTorch with python -c "import torch":

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/__init__.py", line 13, in <module>
    import torch
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

How does one fix it?

Charlie Parker
  • In case you need a CPU-only version of PyTorch, follow [this](https://stackoverflow.com/a/75485213/7991462) answer – Constantine Jul 13 '23 at 08:42

6 Answers

83

As eval said, this happens because PyTorch 1.13 automatically installs nvidia_cublas_cu11, nvidia_cuda_nvrtc_cu11, nvidia_cuda_runtime_cu11 and nvidia_cudnn_cu11. I already had my own CUDA toolkit installed and ran into the same problem.

In my case, running pip uninstall nvidia_cublas_cu11 solved the problem. I think the PyTorch team should fix this, since users often have their own CUDA toolkit installed.
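To see which of these pip-managed NVIDIA packages are actually present before uninstalling one, you can list the installed distributions whose names start with `nvidia-`. This is a minimal sketch using only the standard-library `importlib.metadata`; the helper name `nvidia_cuda_dists` is hypothetical. It deliberately does not import torch, since that import is what fails.

```python
import re
from importlib import metadata


def nvidia_cuda_dists(names):
    """Return sorted, de-duplicated names that look like pip-installed
    NVIDIA CUDA libraries (e.g. nvidia-cublas-cu11)."""
    found = set()
    for name in names:
        if not name:
            continue  # skip distributions with missing metadata
        # PEP 503 normalization: runs of '-', '_', '.' become '-', then lowercase.
        norm = re.sub(r"[-_.]+", "-", name).lower()
        if norm.startswith("nvidia-"):
            found.add(norm)
    return sorted(found)


if __name__ == "__main__":
    installed = [dist.metadata.get("Name") for dist in metadata.distributions()]
    for name in nvidia_cuda_dists(installed):
        print(name)
```

Run it with the environment's own interpreter (e.g. the conda env's python) so it inspects the same site-packages that `pip uninstall nvidia_cublas_cu11` acts on.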

Ynjxsjmh
lenin
31

The error comes from dlopen loading libcublas.so.11 from .../python3.9/site-packages/nvidia/cublas/lib/, which is where the pip package nvidia-cublas-cu11 installs it.

libcublas.so.11 is dynamically linked against libcublasLt.so.11. The problem is that when you have a separate CUDA installation (usually in /usr/local/cuda), the loader probably resolves libcublasLt.so.11 to the wrong copy. You can run ldd .../python3.9/site-packages/nvidia/cublas/lib/libcublas.so.11 to check the actual path of libcublasLt.so.11, which is supposed to be the one under .../python3.9/site-packages/nvidia/cublas/lib/.
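The ldd check described above can be scripted so the result is unambiguous. A minimal sketch, assuming a Linux system with `ldd` on the PATH and assuming the pip layout seen in the traceback (`<site-packages>/nvidia/cublas/lib/libcublas.so.11`); `parse_ldd` and `cublaslt_is_from` are hypothetical helper names, and `sysconfig.get_paths()["purelib"]` points at the site-packages of whichever interpreter runs the script.

```python
import os
import subprocess
import sysconfig


def parse_ldd(output):
    """Map each library name in `ldd` output to the path it resolves to
    (None when ldd prints 'not found' or no path)."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. linux-vdso.so.1 or the dynamic loader itself
        name, target = (part.strip() for part in line.split("=>", 1))
        tokens = target.split()
        if not tokens or tokens[0] == "not" or tokens[0].startswith("("):
            resolved[name] = None
        else:
            resolved[name] = tokens[0]
    return resolved


def cublaslt_is_from(ldd_output, expected_dir):
    """True if libcublasLt.so.11 resolves to a file directly inside expected_dir."""
    path = parse_ldd(ldd_output).get("libcublasLt.so.11")
    if path is None:
        return False
    return os.path.dirname(os.path.realpath(path)) == os.path.realpath(expected_dir)


if __name__ == "__main__":
    # Assumption: same pip layout as in the traceback.
    lib_dir = os.path.join(sysconfig.get_paths()["purelib"], "nvidia", "cublas", "lib")
    libcublas = os.path.join(lib_dir, "libcublas.so.11")
    if not os.path.exists(libcublas):
        print("not found:", libcublas)
    else:
        out = subprocess.run(["ldd", libcublas], capture_output=True, text=True).stdout
        print("libcublasLt.so.11 ->", parse_ldd(out).get("libcublasLt.so.11"))
        print("loaded from the pip directory:", cublaslt_is_from(out, lib_dir))
```

If the second line prints False and the first shows something like /usr/local/cuda/lib64/..., you are in the situation this answer describes.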

Workarounds:

  1. Set LD_LIBRARY_PATH=.../python3.9/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH when launching Python, so that the loader searches that directory for .so files first.

  2. Use an older torch. Since 1.13.0, the torch pip wheels depend on the pip nvidia-* packages; before that, the CUDA libraries were statically linked. That's why older torch pip installs work even if you have an existing CUDA installation.
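Workaround 1 only works if the variable is set before the interpreter starts: the dynamic loader reads LD_LIBRARY_PATH at process start-up, so assigning `os.environ["LD_LIBRARY_PATH"]` inside an already running Python does not change where dlopen looks. A minimal sketch of a launcher that prepends the directory and starts a child interpreter, assuming the same pip layout as in the traceback; `with_prepended_ld_path` is a hypothetical helper name.

```python
import os
import subprocess
import sys
import sysconfig


def with_prepended_ld_path(env, directory):
    """Return a copy of env with directory placed first on LD_LIBRARY_PATH."""
    new_env = dict(env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    # Only add ':' when there is something to append; glibc treats an
    # empty entry as the current directory.
    new_env["LD_LIBRARY_PATH"] = directory + (":" + current if current else "")
    return new_env


if __name__ == "__main__":
    # Assumption: pip layout from the traceback, <site-packages>/nvidia/cublas/lib.
    cublas_dir = os.path.join(sysconfig.get_paths()["purelib"], "nvidia", "cublas", "lib")
    env = with_prepended_ld_path(os.environ, cublas_dir)
    # The variable must be in place when the process starts, hence a child process.
    child = subprocess.run([sys.executable, "-c", "import torch; print(torch.__version__)"], env=env)
    print("child exit code:", child.returncode)
```

From a shell, the env-var prefix shown in workaround 1 does the same thing; the helper just avoids the trailing ':' you get when LD_LIBRARY_PATH was previously empty.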

eval
  • This solution worked for me like a charm. To summarize, you will find libcublasLt.so.11 resolved to `/usr/local/cuda/lib64/`; setting LD_LIBRARY_PATH=`/opt/conda/lib/python3.7/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH` fixes the issue. – ayush thakur Jan 15 '23 at 08:34
12

I don't know why this works, but it worked for me:

source cuda11.1
# Check which CUDA version is in use
nvcc -V
pip3 install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html

If you look through the GitHub issue, these might also work:

conda install -y -c pytorch -c conda-forge cudatoolkit=11.1 pytorch torchvision torchaudio

pip3 install torch+cu111 torchvision torchaudio -f https://download.pytorch.org/whl/torch_stable.html

I think the conda command is the most robust, because you can specify exactly the cudatoolkit version you need, so I'd recommend that one.
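The pip route above boils down to checking the toolkit with `nvcc -V` and installing the wheel whose `+cu111` label matches it. A minimal sketch of that comparison, assuming the `+cuXY` local-version convention used in the wheel names above and that `nvcc -V` prints a "release X.Y" line; the helper names are hypothetical. It reads torch's version from package metadata rather than importing torch, and it only reports both versions without deciding whether a mismatch is harmful.

```python
import re
import shutil
import subprocess
from importlib import metadata


def wheel_cuda_version(torch_version):
    """(major, minor) from a local version label such as '1.9.1+cu111';
    None for CPU builds or versions without a +cuXY label."""
    m = re.search(r"\+cu(\d+)(\d)(?!\d)", torch_version)
    return (int(m.group(1)), int(m.group(2))) if m else None


def nvcc_cuda_version(nvcc_output):
    """(major, minor) from `nvcc -V` output, e.g. '... release 11.1, V11.1.105'."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


if __name__ == "__main__":
    try:
        # Package metadata works even when `import torch` itself fails.
        wheel = wheel_cuda_version(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        wheel = None
    nvcc = shutil.which("nvcc")
    toolkit = None
    if nvcc:
        out = subprocess.run([nvcc, "-V"], capture_output=True, text=True).stdout
        toolkit = nvcc_cuda_version(out)
    print("torch wheel built for CUDA:", wheel)
    print("nvcc reports CUDA:", toolkit)
```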

Charlie Parker
  • Can anyone explain why this works? And is there a solution with pip requirements? – Mattia Surricchio Dec 11 '22 at 22:30
  • I already have nvcc 11.2, which I installed to run TensorFlow models (with no issues). What happens if I run these commands? Will my existing CUDA toolkit be messed up, or is it safe to run? – Leevo Feb 14 '23 at 18:16
3

I wanted to work on an image detection problem using yolov7, and I installed the default dependencies provided by yolov7 (https://github.com/WongKinYiu/yolov7/blob/main/requirements.txt), but when I tried just to view the help message I got this error:

OSError: .../yolov7_env/lib/python3.8/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

Then I installed other versions of these dependencies with the following command, which solved the problem:

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113

0

In my case, I was running on a CPU-only machine, and the issue was solved by installing a CPU build of PyTorch. For example: http://download.pytorch.org/whl/cpu/torch-1.13.0%2Bcpu-cp39-cp39-linux_x86_64.whl
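Instead of pinning one wheel file, you could point pip at the CPU wheel index on the same host. A minimal sketch, assuming that https://download.pytorch.org/whl/cpu (the directory the wheel above is served from) works as a pip index, and using the presence of `nvidia-smi` on the PATH as a rough heuristic for an NVIDIA driver; `cpu_install_command` is a hypothetical helper name.

```python
import shutil

# Assumption: the directory serving the CPU wheel linked above also works
# as a pip index for CPU-only builds.
CPU_INDEX_URL = "https://download.pytorch.org/whl/cpu"


def cpu_install_command(has_nvidia_driver):
    """Return a pip command for a CPU-only torch build, or None when an
    NVIDIA driver seems to be present."""
    if has_nvidia_driver:
        return None
    # CPU wheels from this index do not depend on the nvidia-* pip packages.
    return ["pip", "install", "torch", "--index-url", CPU_INDEX_URL]


if __name__ == "__main__":
    # Heuristic assumption: nvidia-smi on PATH means an NVIDIA driver is installed.
    cmd = cpu_install_command(shutil.which("nvidia-smi") is not None)
    print(" ".join(cmd) if cmd else "NVIDIA driver found; not suggesting a CPU-only build")
```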

leoschet
0

In my case, the error occurred in a conda env because the conda env could not find the CUDA installation. I actually solved it by uninstalling the CUDA toolkit from the system, so the conda env could use its own toolkit.
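Before uninstalling a system toolkit, you can look for this kind of conflict without changing anything by checking whether a system CUDA directory sits on LD_LIBRARY_PATH. A minimal sketch, assuming an activated conda env (which sets CONDA_PREFIX) and using "the path contains 'cuda'" as a heuristic for a CUDA install; `system_cuda_entries` is a hypothetical helper name. It ignores /etc/ld.so.conf and RPATHs, so an empty result does not rule out a conflict.

```python
import os
import posixpath


def system_cuda_entries(ld_library_path, conda_prefix):
    """Heuristic: LD_LIBRARY_PATH entries mentioning 'cuda' that lie outside
    conda_prefix, i.e. places where a system CUDA toolkit could shadow the env's own."""
    prefix = posixpath.normpath(conda_prefix) if conda_prefix else None
    hits = []
    for entry in ld_library_path.split(":"):
        if not entry or "cuda" not in entry.lower():
            continue
        norm = posixpath.normpath(entry)
        inside = prefix is not None and (norm == prefix or norm.startswith(prefix + "/"))
        if not inside:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    for entry in system_cuda_entries(ld_path, os.environ.get("CONDA_PREFIX")):
        print("system CUDA directory on LD_LIBRARY_PATH:", entry)
```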