
I am training a convolutional neural network on Google Colab, on both the CPU and the GPU runtime.

This is the architecture of the network:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 62, 126, 32)       896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 31, 63, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 29, 61, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 30, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 12, 28, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 6, 14, 64)         0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 4, 12, 64)         36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 2, 6, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 768)               0         
_________________________________________________________________
dropout (Dropout)            (None, 768)               0         
_________________________________________________________________
lambda (Lambda)              (None, 1, 768)            0         
_________________________________________________________________
dense (Dense)                (None, 1, 256)            196864    
_________________________________________________________________
dense_1 (Dense)              (None, 1, 8)              2056      
_________________________________________________________________
permute (Permute)            (None, 8, 1)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 8, 36)             72        
=================================================================
Total params: 264,560
Trainable params: 264,560
Non-trainable params: 0

So this is a fairly small network, but with a specific output shape of (8, 36), because I want to recognize the characters on images of license plates.

I used this code to train the network:

model.fit_generator(generator=training_generator,
                    validation_data=validation_generator,
                    steps_per_epoch = num_train_samples // 128,
                    validation_steps = num_val_samples // 128,
                    epochs = 10)

The generator resizes the images to (64, 128). This is the generator code:

import math
import numpy as np
from skimage.io import imread
from skimage.transform import resize
from tensorflow.keras.utils import Sequence

class DataGenerator(Sequence):

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

        return np.array([
            resize(imread(file_name), (64, 128))
            for file_name in batch_x]), np.array(batch_y)
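One thing worth noting about this generator: `imread` + `resize` runs for every image on every epoch, so over 10 epochs the same decode-and-resize work is repeated ten times. A hedged sketch of caching the resized arrays once as `.npy` files (the `imread`/`resize` call is replaced by a NumPy stub here so the snippet is self-contained; the file names are placeholders):

```python
import os
import numpy as np

def cache_resized(file_names, cache_dir, target_shape=(64, 128, 3)):
    """Resize each image once and store it as .npy; later epochs load the
    cached array instead of decoding and resizing the original image again.
    The imread/resize call is stubbed with np.zeros so this sketch runs
    anywhere; on Colab you would use skimage's imread/resize instead."""
    os.makedirs(cache_dir, exist_ok=True)
    cached = []
    for name in file_names:
        out = os.path.join(cache_dir, os.path.basename(name) + ".npy")
        if not os.path.exists(out):
            # Stub for: resize(imread(name), (64, 128))
            img = np.zeros(target_shape, dtype=np.float32)
            np.save(out, img)
        cached.append(out)
    return cached

def load_batch(cached_names):
    # Loading .npy is a plain disk read -- no image decode, no resize.
    return np.stack([np.load(p) for p in cached_names])
```

The cached arrays can then be returned from `__getitem__` in place of the `imread`/`resize` call.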

On the CPU, one epoch takes 70-90 minutes. On the GPU (149 W), it takes about five times as long as on the CPU.

  1. Do you know why it takes so long? Is there something wrong with the generator?
  2. Can I speed this process up somehow?
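Regarding question 1, a quick way to locate the bottleneck (my suggestion, not something from the thread) is to time a few batches from the generator in isolation; if a single batch takes on the order of a second, the input pipeline, not the compute device, dominates the epoch time. A minimal sketch that works with any indexable generator:

```python
import time

def time_batches(generator, n=5):
    """Time the first n batches of an indexable generator (e.g. a Keras
    Sequence). With steps_per_epoch = 105000 // 128 = 820, a batch time
    of ~1 s already means ~14 minutes per epoch spent on data loading."""
    timings = []
    for i in range(n):
        t0 = time.perf_counter()
        _ = generator[i]                 # the same call model.fit makes
        timings.append(time.perf_counter() - t0)
    return timings

# Usage with a trivial stand-in generator (replace with training_generator):
fake = [([0] * 128, [0] * 128) for _ in range(10)]
print(time_batches(fake, n=3))
```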

Edit: This is the link to my notebook: https://colab.research.google.com/drive/1ux9E8DhxPxtgaV60WUiYI2ew2s74Xrwh?usp=sharing

My data is stored in my Google Drive. The training data set contains 105 k images and the validation data set 76 k. All in all, I have 1.8 GB of data.

Should I maybe store the data at another place?
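If Drive I/O turns out to be the problem, one common workaround (the paths below are placeholders; adjust them to your own layout) is to copy the data once per session to the Colab VM's local disk and read from there, since per-file reads from a mounted Drive are much slower than local disk reads:

```python
import os
import shutil
import time

def copy_to_local(src, dst):
    """One-time copy of the dataset from the mounted Drive folder to the
    VM's local disk; skipped if the local copy already exists."""
    if not os.path.isdir(dst):
        t0 = time.time()
        shutil.copytree(src, dst)
        print(f"Copied {src} -> {dst} in {time.time() - t0:.1f} s")
    return dst

# Placeholder paths -- on Colab they would look something like this:
# data_dir = copy_to_local("/content/drive/MyDrive/plates", "/content/plates")
```

The generator's file names would then point at the local copy instead of the Drive mount.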

Thanks a lot!

Tobitor
  • Please share a self-contained notebook that reproduces the problem you observe. Factors important to performance, e.g., dataset location and size, are not described in the original question. If your data is in Drive, for example, it's likely you can speed things up by copying it to the local PD SSD boot disk. – Bob Smith Jun 27 '20 at 16:23
  • To use the GPU, have you installed tensorflow-gpu? – Usman Ali Jun 28 '20 at 11:09
  • No, I did not install tensorflow-gpu. Is that necessary? I thought I only had to change the runtime type to 'GPU'? – Tobitor Jun 28 '20 at 11:21

1 Answer


I think you did not enable a GPU.

Go to Edit -> Notebook Settings and choose GPU. Then click SAVE.

Oleg Ivanytskyi
  • Thank you! Is that necessary? In all the tutorials I did, I only saw the runtime type being changed to GPU; no one changed the hardware accelerator to GPU. However, I changed it to GPU as you recommended, but unfortunately it is not faster than before... – Tobitor Jun 28 '20 at 11:39
  • @Tobitor try checking whether tensorflow is running with the GPU: `from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())`. The output must show that there is a GPU available. – Oleg Ivanytskyi Jun 29 '20 at 07:25
  • @Tobitor actually, here is a question about running Keras models on GPU: https://stackoverflow.com/questions/45662253/can-i-run-keras-model-on-gpu. Maybe it'll help. – Oleg Ivanytskyi Jun 29 '20 at 07:27
  • I did this, and it is running on GPU. However, it is not faster than before... – Tobitor Jun 29 '20 at 12:44
  • @Tobitor try adding these lines: `config = tf.ConfigProto(device_count={'GPU': 1, 'CPU': 2}); sess = tf.Session(config=config); keras.backend.set_session(sess)` – Oleg Ivanytskyi Jun 29 '20 at 13:10
  • Now I get: `AttributeError: module 'tensorflow' has no attribute 'ConfigProto'`... – Tobitor Jun 29 '20 at 14:06
  • @Tobitor try using this `tf.compat.v1.ConfigProto`, because it looks like this option was removed in TensorFlow 2.0 – Oleg Ivanytskyi Jun 29 '20 at 15:35