After upgrading my notebook's operating system from Ubuntu 16.04 to 18.04, I noticed that my Keras code (using the TensorFlow backend) became incredibly slow in the conda environment where I have tensorflow-gpu installed.
Basically, some simple CNN models now take forever to train (as if they were running on the CPU), even though a quick inspection via the nvidia-smi
command shows a Python process engaged by the GPU (an Nvidia GeForce GTX 1070).
I then updated the CUDA libraries (from version 7 to version 9) and updated cuDNN accordingly to be compatible with the new CUDA version.
I also updated the tensorflow-gpu and keras packages to their latest versions, but training still runs way slower than in my previous setup.
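For what it's worth, a quick sanity check that tensorflow-gpu can actually see the card (a minimal sketch; device_lib lives in a semi-internal module, but it has been stable across TensorFlow 1.x releases):

```python
from tensorflow.python.client import device_lib

# Enumerate the devices TensorFlow can actually use; a healthy
# tensorflow-gpu install should list a /device:GPU:0 entry
# (the GTX 1070) alongside the CPU.
devices = device_lib.list_local_devices()
for d in devices:
    print(d.name, d.device_type)

gpu_visible = any(d.device_type == 'GPU' for d in devices)
print('GPU visible to TensorFlow:', gpu_visible)
```

Note that nvidia-smi showing a Python process only proves the process opened the device, not that the heavy ops were placed on it.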
To show an example, here's a fragment of the code I am running, with the model defined as follows:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(100, 60, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(512, activation='relu'))
model.add(Dense(26, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
which, in my previous setup, would start producing output after a few seconds when fit like this:
history = model.fit_generator(train_generator,
                              steps_per_epoch=260000 // 32,
                              epochs=10,
                              validation_data=test_generator,
                              validation_steps=52000 // 32,
                              verbose=2)
Epoch 1/10
166s - loss: 0.2333 - acc: 0.9291 - val_loss: 0.0073 - val_acc: 0.9982
Now each epoch takes a very long time (more than 45 minutes instead of 166 seconds!). Does anyone have an idea why this is happening? Do I need to revert to Ubuntu 16.04? I am pretty upset about this behaviour...
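For reference, the old throughput can be backed out from those numbers (a rough calculation that ignores the validation pass at the end of each epoch):

```python
# Rough throughput of the original (pre-upgrade) setup.
steps_per_epoch = 260000 // 32   # 8125 batches, as passed to fit_generator
batch_size = 32
epoch_seconds = 166              # reported wall time for epoch 1

samples_per_second = steps_per_epoch * batch_size / epoch_seconds
print(round(samples_per_second))  # -> 1566
```

So the old setup was pushing roughly 1,500 samples per second, while 45+ minutes per epoch would correspond to well under 100.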
Edit:
I ran a performance test between CPU and GPU using the models found in https://medium.com/@andriylazorenko/tensorflow-performance-test-cpu-vs-gpu-79fcd39170c, and my GPU seems to work well: over 10,000 samples processed per second on average in each epoch, versus around 400 samples per second in CPU mode. However, my Keras code still behaves strangely. This is the ETA for one epoch in the GPU environment after my Ubuntu update (I never let it finish, since it would take hours):
Epoch 1/1
6/507 [..............................] - ETA: 5:19:17 - loss: 3.2632 - acc: 0.0397
while this is the same output produced by Keras in a plain CPU environment, again with TensorFlow as the backend:
Epoch 1/1
4/507 [..............................] - ETA: 4850s - loss: 3.2671 - acc: 0.0293
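Converting the two ETAs to seconds makes the anomaly explicit: the supposedly GPU-backed run projects roughly four times slower than plain CPU.

```python
# Compare the two projected epoch times from the Keras progress bars.
gpu_eta_s = 5 * 3600 + 19 * 60 + 17   # ETA 5:19:17 in the GPU environment
cpu_eta_s = 4850                      # ETA 4850s in the CPU environment

print(gpu_eta_s)                        # -> 19157
print(round(gpu_eta_s / cpu_eta_s, 1))  # roughly 4x slower than CPU
```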
There's something wrong going on in Keras, apparently...