I've been playing around with creating convolutional neural networks using Keras. I've gotten some decent results, but training on my laptop can take hours, so I figured I could speed things up using a GPU instance at AWS. I spun up a g4dn.2xlarge and assumed I would see training fly through. Instead, I see steps running about as slowly as they did on my laptop.
I used the following tutorial to set up my instance and start the Jupyter notebook server: https://aws.amazon.com/getting-started/hands-on/get-started-dlami/
After opening Jupyter in my browser, I selected the conda_tensorflow2_p36 kernel. After importing the tensorflow/keras libraries, I have a line like this:
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
It confirms that there is 1 GPU available. I don't think what I have going is particularly big: my training set is 42,950 images belonging to 859 classes, and my validation set is 21,463 images belonging to the same 859 classes. So more or less 50 images per class, which I know is not a lot, but I'm getting accurate enough results on my laptop, so I'm not worried about that. Batch size is 25 for the training set and 10 for the validation set. Independent of how the convolutional neural network is architected, I would assume it would run faster on the EC2 instance than on my crappy laptop. What could I be doing wrong here?
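One check I know of to see where ops actually execute (as opposed to which devices are merely visible) is TensorFlow's per-op device placement logging. A minimal sketch, assuming TF 2.x eager mode; the matmul here is just a stand-in for a training op:

```python
import tensorflow as tf

# Log to stderr which device (CPU:0 or GPU:0) each op is placed on.
tf.debugging.set_log_device_placement(True)

# A tiny op to trigger a placement log line; during real training,
# every op in the model would be logged the same way.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [1.0, 1.0]])
c = tf.matmul(a, b)
print(c)
```

If the log shows ops landing on `/device:GPU:0`, the GPU is genuinely in use and the slowdown is more likely elsewhere (e.g. the input pipeline).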
Update 1
I'm running the notebook on an EBS volume and the data is there as well; could that be a problem?
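To sanity-check whether reading the images off the volume could be the bottleneck, a rough stdlib-only timing sketch like this could measure raw file-read throughput (the file size and count are made-up stand-ins for my JPEGs, and since the files are freshly written they'll likely be served from the page cache, so this is a best-case number):

```python
import os
import tempfile
import time

FILE_SIZE = 200 * 1024   # ~200 KB, roughly a typical JPEG
NUM_FILES = 50

with tempfile.TemporaryDirectory() as d:
    # Write dummy files standing in for training images.
    paths = []
    for i in range(NUM_FILES):
        p = os.path.join(d, f"img_{i}.bin")
        with open(p, "wb") as f:
            f.write(os.urandom(FILE_SIZE))
        paths.append(p)

    # Time reading them all back.
    start = time.perf_counter()
    total = 0
    for p in paths:
        with open(p, "rb") as f:
            total += len(f.read())
    elapsed = time.perf_counter() - start

print(f"Read {total / 1e6:.1f} MB in {elapsed:.3f}s "
      f"({total / 1e6 / elapsed:.1f} MB/s)")
```

If the throughput here is far above what a training step consumes, disk reads alone probably aren't the problem.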
Update 2
I've added the following lines to test (with the import that device_lib needs):

from tensorflow.python.client import device_lib
import keras.backend.tensorflow_backend as tfback

print(device_lib.list_local_devices())
tfback._get_available_gpus()
Output is this (it seems to say the GPU is visible):
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 15554497242449630399
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 10944104228818506418
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 6119898251220801089
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 14949928141
locality {
bus_id: 1
links {
}
}
incarnation: 6277213019374881773
physical_device_desc: "device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5"
]
['/device:GPU:0']