I'm quite new to deep learning and, to improve my knowledge, I've been reading some books and following an online video course. In this course I have to do an exercise with a convolutional neural network: I've built a CNN trained on 10,000 images of 64x64 pixels, to recognise images of cats and dogs.
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
# Initialising the CNN
classifier = Sequential()
# Step 1 - Convolution
classifier.add(Convolution2D(32, 3, 3, input_shape=(64, 64, 3), activation='relu'))
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Convolution2D(32, 3, 3, activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
# Step 3 - Flattening
classifier.add(Flatten())
# Step 4 - Full connection
classifier.add(Dense(output_dim=128, activation='relu'))
classifier.add(Dense(output_dim=1, activation='sigmoid'))
# Compiling the CNN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
        'dataset/training_set',
        target_size=(64, 64),
        batch_size=32,
        class_mode='binary')

test_set = test_datagen.flow_from_directory(
        'dataset/test_set',
        target_size=(64, 64),
        batch_size=32,
        class_mode='binary')

classifier.fit_generator(training_set,
                         steps_per_epoch=8000,
                         epochs=25,
                         validation_data=test_set,
                         validation_steps=2000)
The first time I installed Anaconda I didn't install the GPU version of TensorFlow, and when I started fitting my CNN I had to wait about 1190 seconds per epoch, with the CPU working at 70%. For reference, my computer is quite fast: an i7 6800K overclocked to 4.2 GHz, an MSI GTX 1080 video card, and 32 GB of 3333 MHz RAM. So I thought that installing the TensorFlow GPU package was almost compulsory.
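The per-epoch times above are just the figures Keras prints during fit; as a side note, a small callback (a minimal sketch using the standard keras.callbacks.Callback hooks; the class name EpochTimer is just something I made up) can record them explicitly:

import time
from keras.callbacks import Callback

class EpochTimer(Callback):
    """Print how long each epoch took, in seconds."""
    def on_epoch_begin(self, epoch, logs=None):
        self.epoch_start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        print('Epoch %d took %.1f seconds' % (epoch + 1, time.time() - self.epoch_start))

# Usage: pass callbacks=[EpochTimer()] to fit_generator.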
I read in some posts how to check whether TensorFlow is correctly configured to use the GPU, and launching:
In [1]: from tensorflow.python.client import device_lib
In [2]: print(device_lib.list_local_devices())
I get this result:
2017-10-16 10:41:25.780983: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-16 10:41:25.781067: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-16 10:41:26.635590: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:955] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8225
pciBusID 0000:03:00.0
Total memory: 8.00GiB
Free memory: 6.61GiB
2017-10-16 10:41:26.635807: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:976] DMA: 0
2017-10-16 10:41:26.636324: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:986] 0: Y
2017-10-16 10:41:26.637179: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0)
[name: "/cpu:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 16495731140373557390
, name: "/gpu:0"
device_type: "GPU"
memory_limit: 6740156088
locality {
bus_id: 1
}
incarnation: 6266244792178813148
physical_device_desc: "device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0"
]
Since gpu:0 is listed, and I read in the documentation that TensorFlow will automatically use the GPU for computation, this looks correct.
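To double-check that the GPU is actually usable, one test I can think of (a minimal sketch with the TensorFlow 1.x graph API; the matrices are just made-up values) is to pin a small computation explicitly to /gpu:0, so TensorFlow raises an error instead of silently falling back to the CPU:

import tensorflow as tf

# Pin a small computation explicitly to the GPU.
with tf.device('/gpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]], name='b')
    c = tf.matmul(a, b)

# allow_soft_placement=False forbids falling back to the CPU if /gpu:0 cannot be used.
with tf.Session(config=tf.ConfigProto(allow_soft_placement=False)) as sess:
    print(sess.run(c))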
Launching the fit method with this configuration, I have to wait about 950 seconds per epoch, which is better than 1190 seconds. However, the CPU never goes above 10% and, strangely, the GPU never goes above 10-13%. I assume there is something wrong with my configuration, because the teacher in the course, on a MacBook (I don't know the exact configuration) without the TensorFlow GPU package, takes approximately 90 seconds per epoch.
I'm not a Python or TensorFlow expert, but it really seems that something is wrong, or that there is something else I need to understand.
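One test I was planning to run (a minimal sketch, assuming TensorFlow 1.x with the standalone Keras package) is to make Keras use a session with log_device_placement=True, so the console shows which device every operation is actually placed on:

import tensorflow as tf
from keras import backend as K

# Log the device (CPU or GPU) each operation is assigned to.
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))
# Any model built and fitted after this point prints op-to-device mappings,
# which should show whether the convolutions really run on /gpu:0.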
Could someone give me some advice, something to read, or some tests to run to better understand where the bottleneck is? Thank you.