I am trying to run tensorflow-gpu version 2.4.0-dev20200828 (a tf-nightly build) for a convolutional neural network implementation. Some other details:
- Python 3.8.5
- Windows 10
- NVIDIA RTX 2080 with 8 GB of VRAM
- CUDA 11.1
The following snippet is what I run:
import numpy as np
import tensorflow as tf
from tensorflow import keras

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Cap TensorFlow at 1024 MB on the first GPU
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)

# VGG16 without the classifier head, applied to one random 600x600 RGB image
vgg_16 = keras.applications.VGG16(include_top=False, input_shape=(600, 600, 3))
random_image = np.random.rand(1, 600, 600, 3)
output = vgg_16(random_image)
The memory-configuration code was taken from the answers here.
The issue I am having is that my GPU has 8 GB of VRAM, and I need to be able to run the CNN with relatively large image batch sizes. The example above is executed on a single image, but surprisingly I can only increase the batch size to about 2-3 images of 600 by 600 pixels. According to the comments that accompany the memory-configuration code, it will:
Restrict TensorFlow to only allocate 1GB of memory on the first GPU
which is clearly not ideal.
If I allocate more, say 4000 MB, I get errors such as:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:325] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
If I leave the limit at 1024 MB, I get messages like:
Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.25GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
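To put the numbers in context, here is a rough back-of-the-envelope sketch of how large the VGG16 layer outputs are for a 600 by 600 input. It assumes float32 activations, 'same'-padded convolutions and 2x2 max-pooling with floor division, and it ignores the model weights, cuDNN workspace and allocator overhead, so it is only a lower-bound estimate and does not by itself explain the 3.25GiB request. The helper name vgg16_activation_bytes is hypothetical, not part of TensorFlow.

```python
# Rough size of each VGG16 (include_top=False) layer output.
# Assumptions: float32 activations, 'same'-padded 3x3 convolutions,
# 2x2 max-pooling that halves height and width with floor division.

BYTES_PER_FLOAT32 = 4

# (number of conv layers, output channels) for the five VGG16 blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_activation_bytes(height, width, batch_size=1):
    """Return (list of per-layer output sizes in bytes, their sum)."""
    sizes = []
    for n_convs, channels in VGG16_BLOCKS:
        # Each conv layer keeps the spatial size and outputs `channels` maps
        for _ in range(n_convs):
            sizes.append(batch_size * height * width * channels * BYTES_PER_FLOAT32)
        # The pooling layer at the end of the block halves height and width
        height, width = height // 2, width // 2
        sizes.append(batch_size * height * width * channels * BYTES_PER_FLOAT32)
    return sizes, sum(sizes)


sizes, total = vgg16_activation_bytes(600, 600, batch_size=1)
print(f"largest single layer output: {max(sizes) / 2**20:.1f} MiB")
print(f"sum of all layer outputs:    {total / 2**20:.1f} MiB")
```

Every size scales linearly with batch_size, so calling vgg16_activation_bytes(600, 600, batch_size=3) triples each figure. In eager inference not all of these outputs need to be alive at the same time, so the sum is an upper bound on activation storage alone rather than a prediction of actual usage.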
Any insights or resources that would help me understand this issue would be much appreciated. I am willing to switch to another version of TensorFlow, Python or CUDA if necessary, but ultimately I want a deeper understanding of what is going on.