I am trying to run TensorFlow with GPU support in a Docker container on a virtual machine. I have tried many solutions found online, including:
- tried Docker images for different TensorFlow versions: 2.6, 2.4, 1.15, 1.14
- built TensorFlow 2.6 and 1.14 from source inside the container several times with different bazel flags, following this guide: https://www.tensorflow.org/install/source
- tried to make the GPU visible with the kinds of commands suggested in "TensorFlow : failed call to cuInit: CUDA_ERROR_NO_DEVICE"
- used the NVIDIA TensorFlow Docker image
None of these solutions worked for me. Here are the steps I took:
I verified that the drivers, the CUDA toolkit and cuDNN are installed inside the container using nvidia-smi and nvcc -V.
The Python version is 3.8.10
and the TensorFlow version is:
import tensorflow as tf
tf.__version__
'2.6.0'
The error appears when I call tf.config.list_physical_devices()
So the GPU is somehow not visible to TensorFlow. All TensorFlow builds return the same error:
E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
but with 1.14, for example, there is an additional message about the CPU instruction set:
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
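Since every build fails at the same cuInit call, the call can also be tried without TensorFlow to see whether the failure comes from the driver layer. This is only a minimal sketch: it assumes the driver library is exposed inside the container as libcuda.so.1 (the usual name on Linux), and probe_cuinit is a hypothetical helper name, not part of TensorFlow or CUDA.

```python
import ctypes


def probe_cuinit(lib_name="libcuda.so.1"):
    """Call the CUDA driver's cuInit(0) directly, bypassing TensorFlow.

    Returns (code, name), where code is the integer CUresult and name is
    the driver's symbolic name for it. Returns (None, message) if the
    driver library cannot be loaded at all.
    lib_name is an assumption: libcuda.so.1 is the usual Linux name.
    """
    try:
        cuda = ctypes.CDLL(lib_name)
    except OSError:
        return None, "driver library not loadable"

    # CUresult cuInit(unsigned int Flags); Flags must be 0.
    cuda.cuInit.argtypes = [ctypes.c_uint]
    cuda.cuInit.restype = ctypes.c_int
    code = cuda.cuInit(0)

    # CUresult cuGetErrorName(CUresult error, const char **pStr)
    cuda.cuGetErrorName.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    cuda.cuGetErrorName.restype = ctypes.c_int
    name = ctypes.c_char_p()
    if cuda.cuGetErrorName(code, ctypes.byref(name)) == 0 and name.value:
        return code, name.value.decode()
    return code, "unknown"


if __name__ == "__main__":
    print(probe_cuinit())
```

If this reports the same CUDA_ERROR_NOT_INITIALIZED, the problem would seem to sit below TensorFlow (driver, container runtime or VM setup) rather than in any particular TensorFlow build.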
The GPU is an A100 and the CPU is an Intel(R) Xeon(R) Gold 6226R.
What is going on here? How do I fix this?