
I am trying to train a simple convolutional network with Keras (TensorFlow 2.8.0) under Python 3.7.9 in the Spyder IDE 5.2.2. The network consists of Conv2D, MaxPooling2D, Flatten and Dense layers.

The model ran perfectly on my CPU, but training was slow, so I decided to try running it on my GPU (GeForce GTX 1050 Ti).

I installed CUDA 11.2 and added its lib, include and bin directories to my path. I installed cuDNN 8.1 and copied cudnn64_8.dll into the CUDA bin directory, cudnn.h into the CUDA include directory and cudnn.lib into the CUDA lib directory.

Once I had done the above, TensorFlow was able to use my GPU and recognizes it: when I run tf.config.list_physical_devices() I get:

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
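
For reference, the CUDA/cuDNN versions this TensorFlow build was compiled against can be double-checked from Python; the snippet below is only a sanity check using the standard tf.sysconfig API, not part of my training script:

import tensorflow as tf

# versions the installed TensorFlow binary was built against
# (the TF 2.8 wheels expect CUDA 11.2 / cuDNN 8.1)
build = tf.sysconfig.get_build_info()
print('CUDA :', build.get('cuda_version'))
print('cuDNN:', build.get('cudnn_version'))
print('Built with CUDA:', tf.test.is_built_with_cuda())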

However, when I run the model containing Conv2D layers on my GPU, it fails. This is the output I get:

2022-03-17 18:32:04.687276: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

2022-03-17 18:32:05.171943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2782 MB memory: -> device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1

Epoch 1/10

This is where the training progress bars (verbose=1) would normally appear, but nothing shows up after "Epoch 1/10". Instead, I believe the kernel restarts, because all saved variables are cleared and I have to re-import all packages.

If I train a model using only Dense layers, it does work on the GPU; I have confirmed by monitoring GPU usage that the GPU really is being used in that case. And, as mentioned, the Conv2D model works fine on my CPU.

So, in summary, the problem seems specific to Conv2D models running on my GPU. Any help in understanding this would be greatly appreciated!
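
To help anyone reproduce this, a single Conv2D forward pass on random data should be enough to trigger the crash if the problem really is Conv2D on the GPU; this is a minimal isolation sketch, not part of my script below:

import numpy as np
import tensorflow as tf

# a tiny random batch, far too small to exhaust 4 GB of GPU memory
x = np.random.rand(8, 32, 32, 3).astype('float32')

with tf.device('/GPU:0'):
    conv = tf.keras.layers.Conv2D(4, (3, 3), padding='same')
    y = conv(x)        # a single cuDNN convolution
    print(y.shape)     # expected: (8, 32, 32, 4)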

My code:

from matplotlib import pyplot
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.optimizers import SGD

def load_dataset():
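    # load the CIFAR-10 train/test split and one-hot encode the integer class labels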
    (trainX, trainy), (testX, testy) = cifar10.load_data()
    print('Train shape: X=%s, y=%s' % (trainX.shape, trainy.shape))
    print('Test shape: X=%s, y=%s' % (testX.shape, testy.shape))
    trainY = tf.keras.utils.to_categorical(trainy)
    testY = tf.keras.utils.to_categorical(testy)
    return trainX, trainY, testX, testY

def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm

def define_model():
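    # two 3x3 conv layers and 2x2 max pooling, followed by a small dense classifier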
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))  # kernel_initializer='he_uniform' not used
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

def summarize_diagnostics(history):
    # plot loss
    pyplot.subplot(211)
    pyplot.title('Cross Entropy Loss')
    pyplot.plot(history.history['loss'], color='blue', label='train')
    pyplot.plot(history.history['val_loss'], color='orange', label='test')
    # plot accuracy
    pyplot.subplot(212)
    pyplot.title('Classification Accuracy')
    pyplot.plot(history.history['accuracy'], color='blue', label='train')
    pyplot.plot(history.history['val_accuracy'], color='orange', label='test')
    pyplot.show()

def run_test_harness():
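    # load and normalize the data, build the model, then train and evaluate it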
    trainX, trainY, testX, testY = load_dataset()
    trainX, testX = prep_pixels(trainX, testX)
    model = define_model()
    history = model.fit(trainX, trainY, epochs=10, batch_size=64, validation_data=(testX, testY))
    _, acc = model.evaluate(testX, testY)
    print('> %.3f' % (acc * 100.0)) #overall validation accuracy
    summarize_diagnostics(history)

tf.config.list_physical_devices()    
run_test_harness()
  • Your GPU has a lot less memory than your CPU does. It could also be that your data transfer is timing out. There are a couple of things to try. Use `nvidia-smi` and watch the memory used. Is it running out? Reduce your cifar10 dataset. Can you train on 10-20 examples on the GPU? Reduce your network from 32 channels to a lower number like 4. Change your batch_size lower to 4. Run and see if it works. Your training results will be bad, but these tests are just to find the problem with GPU usage. Oh, try to run this from the shell, not using Spyder, which may be the cause of a timeout. – Robert Lugg Mar 17 '22 at 19:18
  • Many thanks for the reply! I tried reducing cifar10 to 30 samples, batch_size of 4, and layers to: Conv2D(4, (3, 3)). This didn't make any difference. According to nvidia-smi, when I run the code, GPU memory usage goes from ~500 MB up to 3546 MB out of 4096 max. Maybe the GPU just doesn't have enough memory for this task? (I've yet to try using shell rather than Spyder.) – user18496384 Mar 17 '22 at 19:54
  • By default TensorFlow allocates 100% of the GPU memory (sorry I forgot that). You should disable this behavior (see https://stackoverflow.com/a/55541385/2184122). Your 1050ti appears to have 4G ram which is relatively small. Sorry I can't help more. PS: Do you know about the free Google GPUs you can use: https://colab.research.google.com/ - you might have better luck there. – Robert Lugg Mar 17 '22 at 20:55 (see the memory-growth sketch after these comments)
  • Thanks again for your help! I'll check out Colab! – user18496384 Mar 17 '22 at 22:32
  • I'm having the same issue you're describing, using TensorFlow-GPU 2.3. I checked on 2 systems (RTX 3070 8Gb and RTX 3050 4Gb) and the problem persists. At this point, I think it's not a GPU RAM problem. I didn't find a solution yet. – Eugenio Anselmino Jan 20 '23 at 09:55
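
A minimal sketch of the memory-growth workaround suggested in Robert Lugg's comment above, using the standard tf.config API (it must run before any tensors are placed on the GPU):

import tensorflow as tf

# allocate GPU memory on demand instead of reserving nearly all of it up front
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)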

1 Answer


This could be because there is not enough GPU memory available on your system to run this code on the GPU.
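
If that is the case, one thing to try (a sketch using the standard tf.config API; the 3 GB limit is only a guess for a 4 GB card) is to cap TensorFlow's GPU allocation so that cuDNN keeps some free memory for its own handles and workspaces:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # limit TensorFlow to 3 GB of GPU memory (hypothetical value for a 4 GB card)
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=3072)])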

The code works fine when I replicate it in Google Colab with a GPU runtime (CPU mode also works, but takes considerably longer to run):

tf.config.list_physical_devices() 

Output:

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

To run the code:

run_test_harness()

Output:

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 2s 0us/step
170508288/170498071 [==============================] - 2s 0us/step
Train shape: X=(50000, 32, 32, 3), y=(50000, 1)
Test shape: X=(10000, 32, 32, 3), y=(10000, 1)
Epoch 1/10
782/782 [==============================] - 22s 10ms/step - loss: 1.9416 - accuracy: 0.3059 - val_loss: 1.7283 - val_accuracy: 0.3935
Epoch 2/10
782/782 [==============================] - 7s 9ms/step - loss: 1.6194 - accuracy: 0.4278 - val_loss: 1.5163 - val_accuracy: 0.4528
Epoch 3/10
782/782 [==============================] - 8s 10ms/step - loss: 1.4575 - accuracy: 0.4835 - val_loss: 1.3856 - val_accuracy: 0.5099
Epoch 4/10
782/782 [==============================] - 6s 8ms/step - loss: 1.3539 - accuracy: 0.5213 - val_loss: 1.3111 - val_accuracy: 0.5348
Epoch 5/10
782/782 [==============================] - 5s 7ms/step - loss: 1.2553 - accuracy: 0.5589 - val_loss: 1.2233 - val_accuracy: 0.5658
Epoch 6/10
782/782 [==============================] - 5s 6ms/step - loss: 1.1763 - accuracy: 0.5865 - val_loss: 1.1691 - val_accuracy: 0.5841
Epoch 7/10
782/782 [==============================] - 5s 6ms/step - loss: 1.1091 - accuracy: 0.6115 - val_loss: 1.1284 - val_accuracy: 0.6005
Epoch 8/10
782/782 [==============================] - 5s 6ms/step - loss: 1.0402 - accuracy: 0.6344 - val_loss: 1.0932 - val_accuracy: 0.6183
Epoch 9/10
782/782 [==============================] - 5s 6ms/step - loss: 0.9842 - accuracy: 0.6546 - val_loss: 1.0825 - val_accuracy: 0.6255
Epoch 10/10
782/782 [==============================] - 5s 6ms/step - loss: 0.9350 - accuracy: 0.6742 - val_loss: 1.0722 - val_accuracy: 0.6256
313/313 [==============================] - 1s 3ms/step - loss: 1.0722 - accuracy: 0.6256
> 62.560

Please check again in Google Colab with the GPU runtime selected

(Runtime - Change runtime type - Hardware accelerator - GPU - Save)

and let us know if the issue still persists.
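
Once the GPU runtime is selected, a quick generic check (nothing specific to this model) confirms that Colab is actually using the GPU:

import tensorflow as tf

# prints e.g. '/device:GPU:0' when a GPU runtime is active, and an empty string otherwise
print(tf.test.gpu_device_name())
print(tf.config.list_physical_devices('GPU'))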