Final Definitive answer:
Hardware:
- Ryzen 9 5950X
- 64GB of DDR4 RAM
- RTX 3060 Ti
I really wanted to work with Anaconda, as I'm very familiar with it and everything else I do works within Anaconda. On top of that, last year I got this to work within Anaconda no problem, so it had to be possible!
Problem:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import LSTM
from numpy.random import rand
X, y = rand(8000, 50, 5), rand(8000, 10)
model = keras.Sequential()
model.add(keras.Input(shape = (X.shape[1], X.shape[2])))
Up until here everything works fine.
The following line:
model.add(LSTM(units = 100))
Produces the following error:
NotImplementedError: Cannot convert a symbolic Tensor
(lstm_1/strided_slice:0) to a numpy array. This error may indicate that
you're trying to pass a Tensor to a NumPy call, which is not supported
Cause / Solution:
For a definitive answer I will have to refer you to the developers of TensorFlow, but I was able to deduce the following:
This post has the exact same problem I did; it is solved by downgrading numpy from 1.20.x to 1.19.x. The discussion on that post is an interesting read. Basically, TensorFlow versions above 2.3.x are compiled with numpy 1.19.5, while Anaconda installs numpy 1.20.x by default when using conda install tensorflow-gpu, and the two do not play nicely. Downgrading by itself is an easy enough fix.
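To spot this mismatch before building a model, a small check of the installed numpy version can help. This is only a sketch: the function names are my own, and the (1, 19) threshold is an assumption taken from the versions mentioned above (1.19.x works, 1.20.x does not), not an official compatibility rule.

```python
from importlib import metadata  # standard library, Python 3.8+


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '1.19.5'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def numpy_is_compatible(version, last_good=(1, 19)):
    """True if the version is at or below the assumed last known-good release line."""
    return parse_major_minor(version) <= last_good


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("numpy")
    if found is None:
        print("numpy is not installed in this environment")
    elif numpy_is_compatible(found):
        print(f"numpy {found}: matches the 1.19.x line or older")
    else:
        print(f"numpy {found}: consider downgrading to 1.19.x")
```

Run it from inside the environment you intend to train in, since each conda environment has its own numpy.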
If you have an NVIDIA RTX 30xx GPU, however, you are not done!
Long story short: RTX 30xx uses the Ampere architecture, which requires a newer version of CUDA, which in turn requires a newer version of TensorFlow, 2.4.0 or later to be precise. As of the time of writing, this version is not available on conda.
Therefore all of the convenience offered by conda's automatic installation of cuDNN and cudatoolkit is no longer available. Simply running pip install tensorflow==2.4.0 does not work. The worst part is that it might appear to be working until well over an hour into training something, when it suddenly stops with a seemingly random error. (Sorry, I was ready to rage-quit at this point; it was late and I didn't write down the errors. There were many, and all of them led nowhere.)
This guide explains in great detail how to install cuDNN and CUDA by hand. Before you follow this guide, go into Control Panel > Programs and Features and uninstall everything from NVIDIA except: NVIDIA graphics driver, NVIDIA GeForce Experience, NVIDIA HD audio driver, NVIDIA PhysX.
Another important note:
In step Building CUDA/cuDNN: Set 3 there is a critical typo. The guide instructs you to copy files
from:
# 1. cuDNN
\...\cudnn-11.0-windows-x64-v8.0.4.30.zip\cuda\bin
to:
# 2. NVIDIA GPU Computing Toolkit
\...\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include
This is incorrect!
It should be from:
# 1. cuDNN
\...\cudnn-11.0-windows-x64-v8.0.4.30.zip\cuda\bin
to:
# 2. NVIDIA GPU Computing Toolkit
\...\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin
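If you would rather script the copy than drag files around by hand, something like the following could work. It is a sketch under assumptions: copy_cudnn and its two path parameters are hypothetical names of mine, and the include and lib/x64 folders are the usual layout of the cuDNN 8 Windows archive rather than something stated in the guide. The point it illustrates is the corrected rule: every file goes into the folder of the same name.

```python
import shutil
from pathlib import Path

# Sub-folders of the extracted cuDNN "cuda" folder. Each one is copied into
# the toolkit folder with the SAME name (bin -> bin, never bin -> include).
# "include" and "lib/x64" are assumed from the usual cuDNN archive layout.
FOLDERS = ("bin", "include", "lib/x64")


def copy_cudnn(cudnn_root, cuda_root, folders=FOLDERS):
    """Copy cuDNN files into the CUDA toolkit tree and return the new paths.

    cudnn_root: the extracted "...\\cuda" folder from the cuDNN zip.
    cuda_root:  the toolkit folder, e.g. "...\\CUDA\\v11.0".
    """
    copied = []
    for folder in folders:
        src = Path(cudnn_root) / folder
        dst = Path(cuda_root) / folder
        if not src.is_dir():
            continue  # skip folders this archive does not contain
        dst.mkdir(parents=True, exist_ok=True)
        for item in src.iterdir():
            if item.is_file():
                shutil.copy2(item, dst / item.name)
                copied.append(dst / item.name)
    return copied
```

Writing into the toolkit folder under Program Files normally needs an elevated (administrator) prompt, and you have to fill in both paths for your own machine.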
After following this guide I restarted my PC (do not skip this) and made a new environment using Python 3.8.11:
conda create -n tf python=3.8
I installed tensorflow 2.4.0 using pip directly from the command prompt, from within my new tf environment:
pip install tensorflow==2.4.0
This also installs tensorflow's GPU capabilities, as opposed to the Anaconda version, which installs CPU only when calling conda install tensorflow. Of course, it still doesn't work: you now have numpy 1.20.3 installed (you can check with conda list numpy). Simply use conda install numpy=1.19 to downgrade it. And to top it off, on my system the example provided in the guide:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
model.compile(optimizer='Adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
history = model.fit(train_images, train_labels, batch_size=10, epochs=100)
will throw an error (at least it does for me):
NotFoundError: No algorithm worked!
[[node sequential/conv2d/Relu (defined at <ipython-input-1-bf665ec77ee4>:18) ]] [Op:__inference_train_function_580]
However, we are not interested in this example: we want to run LSTM / GRU, not bug-fix this example. Therefore we will discard it and carry on. Now we will try:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dense
from numpy.random import rand
X, y = rand(8000, 50, 5), rand(8000, 10)
model = keras.Sequential()
model.add(keras.Input(shape = (X.shape[1], X.shape[2])))
model.add(LSTM(units = 100))
model.add(Dense(units = 10))
Lo and behold, no error!
model.compile(loss = 'mse', optimizer = 'adam')
Still no error!
history = model.fit(X, y, epochs = 10)
Still no error! But is it even using the GPU? The messages in the console certainly seem to indicate so:
2021-08-19 13:04:09.234795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
Default GPU Device: /device:GPU:0
training model
2021-08-19 13:04:09.234795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.645028: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-19 13:04:10.647857: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-08-19 13:04:10.662783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.662799: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.667119: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.667133: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.669347: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.670066: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.675548: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.677202: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.677612: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.677658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.979738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-19 13:04:10.979763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-08-19 13:04:10.979770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-08-19 13:04:10.979886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 6617 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-08-19 13:04:10.980387: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:10.980542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.980555: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.980563: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.980569: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.980575: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.980580: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.980586: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.980592: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.980646: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.980676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.980693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-19 13:04:10.980698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-08-19 13:04:10.980703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-08-19 13:04:10.980744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 6617 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-08-19 13:04:10.980757: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:10.984016: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:10.984082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.984094: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.984100: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.984106: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.984112: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.984117: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.984122: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.984127: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.984132: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.984158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.984332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: NVIDIA GeForce RTX 3060 Ti computeCapability: 8.6
coreClock: 1.755GHz coreCount: 38 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-08-19 13:04:10.984344: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-08-19 13:04:10.984350: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-08-19 13:04:10.984355: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-08-19 13:04:10.984360: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-08-19 13:04:10.984365: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-08-19 13:04:10.984369: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-08-19 13:04:10.984374: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-08-19 13:04:10.984420: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-08-19 13:04:10.984445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-19 13:04:10.984470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-19 13:04:10.984475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-08-19 13:04:10.984479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-08-19 13:04:10.984533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6617 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Ti, pci bus id: 0000:0a:00.0, compute capability: 8.6)
2021-08-19 13:04:10.984546: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-08-19 13:04:11.334311: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
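A log this long is tedious to read by eye. As a rough aid, the sketch below pulls the library names out of the "Successfully opened dynamic library" lines and reports which of the ones you expect are missing. The function names are my own, and the list of required DLLs is whatever you pass in; the names in the usage lines are simply taken from the log above.

```python
import re

# Matches the dso_loader lines shown in the log above.
OPENED = re.compile(r"Successfully opened dynamic library (\S+)")


def loaded_libraries(log_text):
    """Return the set of library names the log reports as opened."""
    return set(OPENED.findall(log_text))


def missing_libraries(log_text, required):
    """Return the required libraries that never show up as opened, sorted."""
    return sorted(set(required) - loaded_libraries(log_text))


if __name__ == "__main__":
    sample = (
        "... dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll\n"
        "... dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll\n"
    )
    print(missing_libraries(sample, ["cudnn64_8.dll", "cublas64_11.dll"]))
```

An empty list means every library you asked about was loaded at least once.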
When looking at the task manager, I can see the memory being fully allocated and the 3D graph showing 99% utilization! The training time required has been quartered compared to using the CPU. All in all, great success!
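Besides the console log and Task Manager, you can also ask TensorFlow directly which GPUs it sees. A minimal sketch: visible_gpus is a helper name I made up, while tf.config.list_physical_devices is the regular TensorFlow 2.x call. The import is guarded so the script also runs (and says so) in an environment without TensorFlow.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow could not be imported in this environment")
    elif gpus:
        print("TensorFlow sees:", ", ".join(gpus))
    else:
        print("TensorFlow is installed but sees no GPU; training would run on the CPU")
```

An empty list here, despite a working driver, usually points back at the CUDA/cuDNN files or the TensorFlow version.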
I now really hope that running a Conv2D network of my own design won't result in the same error the example was having, but only time will tell, for now this is good enough for my purposes.