
I am trying to build TensorFlow from source, but the build asks for libcusolver.so.11 and I only have libcusolver.so.10. Can someone tell me what I am doing wrong?

Here are my Ubuntu, NVIDIA, and CUDA versions:

$ uname -a
Linux *****-dev-01 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ nvidia-smi --query-gpu=gpu_name --format=csv | tail -n 1
GeForce GTX 1650

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:38_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0

Here is how I am building TensorFlow:

$ git clone https://github.com/tensorflow/tensorflow.git
$ cd ./tensorflow
$ git checkout tags/v2.2.0
$ ./configure
$ bazel build --config=v2 --config=cuda --config=monolithic --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.1 --copt=-msse4.2 --copt=-Wno-sign-compare //tensorflow:libtensorflow_cc.so

Here is the error I am receiving:

ERROR: An error occurred during the fetch of repository 'local_config_cuda':
    Traceback (most recent call last):
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 1210
         _create_local_cuda_repository(<1 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 934, in _create_local_cuda_repository
         _find_libs(repository_ctx, <2 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 577, in _find_libs
         _check_cuda_libs(repository_ctx, <2 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 479, in _check_cuda_libs
         execute(repository_ctx, <1 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
         fail(<1 more arguments>)
 Repository command failed
 No library found under: /usr/local/cuda/lib64/libcusolver.so.11
 ERROR: Skipping '//tensorflow:libtensorflow_cc.so': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 1210
         _create_local_cuda_repository(<1 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 934, in _create_local_cuda_repository
         _find_libs(repository_ctx, <2 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 577, in _find_libs
         _check_cuda_libs(repository_ctx, <2 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 479, in _check_cuda_libs
         execute(repository_ctx, <1 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
         fail(<1 more arguments>)
 Repository command failed
 No library found under: /usr/local/cuda/lib64/libcusolver.so.11
 WARNING: Target pattern parsing failed.
 ERROR: no such package '@local_config_cuda//cuda': Traceback (most recent call last):
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 1210
         _create_local_cuda_repository(<1 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 934, in _create_local_cuda_repository
         _find_libs(repository_ctx, <2 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 577, in _find_libs
         _check_cuda_libs(repository_ctx, <2 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 479, in _check_cuda_libs
         execute(repository_ctx, <1 more arguments>)
     File "/home/********/Documents/foo/.temp_install_dir/tensorflow/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
         fail(<1 more arguments>)
 Repository command failed
 No library found under: /usr/local/cuda/lib64/libcusolver.so.11
 INFO: Elapsed time: 1.998s
 INFO: 0 processes.
 FAILED: Build did NOT complete successfully (0 packages loaded)
     currently loading: tensorflow
puk
  • Which OS exactly? – CherryDT Jul 31 '20 at 21:17
  • @CherryDT 20.04 (details updated in question) – puk Jul 31 '20 at 21:23
  • There is no libcusolver.so.11, currently, from NVIDIA. The latest/currently available CUDA 11 linux install will actually install `libcusolver.so`, `libcusolver.so.10`, and `libcusolver.so.10.5.0.218` in `/usr/local/cuda/lib64`. This, in spite of the fact that e.g. the `libcudart` installed there is `libcudart.so.11` and the `libcublas` is `libcublas.so.11` (whereas `libcufft` is also `libcufft.so.10`). So this is rather unusual and may be tripping up your build process. I'm not really familiar with how bazel does this, but if it is attempting to link against `libcusolver.so.11`, that is broken. – Robert Crovella Jul 31 '20 at 23:36
  • see [here](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions) for documented confirmation. And when I say "broken" I mean if bazel is looking for libcusolver.so.11, then either bazel is broken, or something you fed to bazel by way of configuration broke it. As a workaround/alternative you might want to switch to CUDA 10.2 since there are certainly TF that have been built against CUDA 10.2. – Robert Crovella Jul 31 '20 at 23:39
  • another alternative would be to switch to the latest [ngc TF container](https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow) which has [TF utilizing CUDA 11](https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel_20-07.html#rel_20-07). Or perhaps you need to update to a newer TF branch and newer bazel to pick up some fixes for this. – Robert Crovella Jul 31 '20 at 23:46
  • @CherryDT I am facing a similar error: "Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory". What's the problem here? – MSI Oct 14 '21 at 18:56
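To see which major version each library in a CUDA install actually carries (the libcusolver.so.10 next to libcudart.so.11 / libcublas.so.11 mismatch described in the comments above), a small sketch like the one below can help. `list_cuda_sonames` is a hypothetical helper, not part of CUDA; it only inspects file names, not the soname recorded inside each ELF file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<library> <major>" for every lib*.so.<N>... file
# in a directory, e.g. /usr/local/cuda/lib64. Based on file names only.
list_cuda_sonames() {
  local libdir="$1" f base
  # libNAME.so.MAJOR[.anything]; captures NAME and MAJOR.
  local re='^(lib[A-Za-z0-9_+-]+)\.so\.([0-9]+)'
  for f in "$libdir"/lib*.so.*; do
    [ -e "$f" ] || continue          # glob matched nothing (or dangling link)
    base=${f##*/}
    if [[ $base =~ $re ]]; then
      echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
    fi
  done | sort -u                     # libX.so.10 and libX.so.10.5.0.218 collapse to one line
}
```

Usage: `list_cuda_sonames /usr/local/cuda/lib64`. If you need the soname embedded in a specific file, `objdump -p <file> | grep SONAME` shows it.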

3 Answers


If you want a concrete solution, find libcusolver.so.10 on your machine and create a symbolic link named libcusolver.so.11 that points to it. The following command solved the issue for me:

sudo ln -s /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcusolver.so.10 /usr/local/cuda-11.0/targets/x86_64-linux/lib/libcusolver.so.11

Credit to: https://github.com/tensorflow/tensorflow/issues/43947
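The symlink command above can be wrapped with a couple of sanity checks. This is a minimal sketch, assuming the CUDA 11.0 libraries sit in a single directory that you pass in (for example /usr/local/cuda-11.0/targets/x86_64-linux/lib, the path from the command above); `link_cusolver11` is a hypothetical helper name, not something CUDA or TensorFlow provides. The link only adds a second name for the same cuSOLVER 10.x file; it does not upgrade the library.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the symlink workaround with basic checks.
# $1: directory holding the CUDA 11.0 libraries.
link_cusolver11() {
  local libdir="$1"
  # Nothing to do if libcusolver.so.11 already exists.
  if [ -e "$libdir/libcusolver.so.11" ]; then
    echo "libcusolver.so.11 already present in $libdir"
    return 0
  fi
  # CUDA 11.0 ships cuSOLVER as libcusolver.so.10; refuse to guess otherwise.
  if [ ! -e "$libdir/libcusolver.so.10" ]; then
    echo "libcusolver.so.10 not found in $libdir" >&2
    return 1
  fi
  # Relative target: the new name resolves to the .so.10 file in the same directory.
  ln -s libcusolver.so.10 "$libdir/libcusolver.so.11"
}
```

Source the file and run `link_cusolver11 /usr/local/cuda-11.0/targets/x86_64-linux/lib` from a root shell (for example after `sudo -i`), since that directory is normally owned by root.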

Aleksey Vlasenko
  • If this still does not work, try additionally "export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" – mdiener Jul 02 '21 at 09:29
  • This answer is good. It should be the official answer. – Geoffrey Anderson Sep 28 '21 at 19:37
  • Weirdest stuff ever, this did not help. I also needed to then symlink it into my virtual environment `# sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.11 .venv/lib/python3.9/site-packages/tensorflow/python/libcusolver.so.11` – Ufos Dec 03 '21 at 11:42
  • Ufos, thank you, your extended method also worked for me; it took a long time before I solved this. – Yan Varakin Dec 05 '21 at 14:42
  • Thank you Ufos. I had the same problem when using TF in a virtual environment on a remote host (the trick in Aleksey's answer only worked directly on the host machine). To make it work I also had to use Ufos' suggestion. – Alessio Mora Feb 07 '22 at 09:29
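When the symlink "does not help", as in the comments above, it can be worth checking whether any directory on `LD_LIBRARY_PATH` actually provides the file. A minimal sketch; `find_on_ld_library_path` is a hypothetical helper. It covers `LD_LIBRARY_PATH` only: the dynamic loader also consults the ldconfig cache (`ldconfig -p | grep libcusolver` lists those entries) and any RUNPATH/RPATH in the binary, which this does not check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first LD_LIBRARY_PATH entry containing file $1.
# Returns 1 if no entry contains it. Does NOT consult the ldconfig cache or RPATH.
find_on_ld_library_path() {
  local name="$1" dir
  local -a dirs
  # Split the colon-separated list into an array.
  IFS=: read -r -a dirs <<< "${LD_LIBRARY_PATH:-}"
  for dir in "${dirs[@]}"; do
    [ -n "$dir" ] || continue        # skip empty entries such as "a::b"
    if [ -e "$dir/$name" ]; then
      echo "$dir/$name"
      return 0
    fi
  done
  return 1
}
```

Usage: `find_on_ld_library_path libcusolver.so.11 || echo "not on LD_LIBRARY_PATH"`.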

Can someone tell me what I am doing wrong?

Nothing.

As noted in the comments, there is no version 11 of cuSOLVER in the CUDA 11.0 release; it ships as libcusolver.so.10. There is plainly some logic in the build configuration which is automagically deriving the names of the component libraries from the major version of the toolkit it detects (the traceback shows it lives in TensorFlow's `third_party/gpus/cuda_configure.bzl`, which Bazel runs). That logic is not correct for the CUDA toolkit you have. I would be raising this as a bug with the TensorFlow developers, who maintain those build scripts. You might be able to override it explicitly in some way, but I can't tell you how.
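The failure mode described above can be illustrated with a toy check. This is NOT TensorFlow's actual `cuda_configure.bzl` code, only a hypothetical sketch (`naive_soname_check`) under the assumption that every library's suffix is taken from the toolkit's major version. Against a CUDA 11.0 layout, where cuSOLVER is `.so.10`, it produces the same "No library found under" message seen in the question.

```shell
#!/usr/bin/env bash
# Toy model of the bug: derive every library's soname suffix from the
# toolkit major version and complain about any file that is missing.
# Usage: naive_soname_check <libdir> <toolkit_version> <lib>...
naive_soname_check() {
  local libdir="$1" toolkit_version="$2" lib path missing=0
  local major="${toolkit_version%%.*}"   # "11.0" -> "11"
  shift 2
  for lib in "$@"; do
    path="$libdir/lib$lib.so.$major"     # e.g. .../libcusolver.so.11
    if [ ! -e "$path" ]; then
      echo "No library found under: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

Because cuSOLVER's major version (10) differs from the toolkit's (11) in CUDA 11.0, any check built this way fails for that one library even though the install is complete.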

talonmies

If anyone comes across this issue, the problem for me was that I was using CUDA 11.0, and more recent TensorFlow versions require CUDA 11.2.
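A quick way to compare the installed toolkit against the version a TensorFlow release expects is to parse the `nvcc --version` output shown in the question. A minimal sketch: `cuda_release` and `version_ge` are hypothetical helpers, `version_ge` relies on GNU `sort -V`, and 11.2 is simply the figure from this answer; check TensorFlow's tested build configurations for the version your release actually needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# release number, e.g. "Cuda compilation tools, release 11.0, V11.0.194" -> 11.0
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Hypothetical helper: succeed if version $1 >= version $2 (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: `have=$(nvcc --version | cuda_release); version_ge "$have" 11.2 || echo "CUDA $have is older than 11.2"`. `sort -V` orders 11.10 after 11.2, which a plain string comparison would get wrong.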

tiho