I'm working on a feature extractor for a transfer learning personal project, and the predict function of Keras's VGG16 model seems pretty slow (31 seconds for a batch of 4 images). I do expect it to be slow, but I'm not sure whether the prediction function is slower than it should be.
from keras.applications.vgg16 import VGG16
from keras.layers import Flatten, MaxPooling2D
from keras.models import Sequential

# DataGenerator, csv_file, img_folder and batch are defined elsewhere in my code
data = DataGenerator()
data = data.from_csv(csv_path=csv_file,
                     img_dir=img_folder,
                     batch_size=batch)

#####################################################
# Feature extractor: VGG16 conv base + extra pooling + flatten
conv_base = VGG16(include_top=False,
                  weights='imagenet',
                  input_shape=(480, 640, 3))

model = Sequential()
model.add(conv_base)
model.add(MaxPooling2D(pool_size=(3, 4)))
model.add(Flatten())
######################################################

# (in my full code this loop sits inside a generator function, hence the yield)
for inputs, y in data:
    feature_batch = model.predict(inputs)
    yield feature_batch, y
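
For reference, a quick shape check on the extractor above (the pooling uses Keras's default strides, which equal the pool size):

print(conv_base.output_shape)  # (None, 15, 20, 512) for (480, 640, 3) inputs
print(model.output_shape)      # (None, 12800) after (3, 4) max pooling and Flatten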
So, my hunch is that it is slow for these reasons:
- my input data is a bit large (loading in (480, 640, 3) images; see the timing sketch after this list)
- I am running on a weak CPU (an Intel Core m3-6Y30 @ 0.90 GHz)
- I have a flatten operation at the end of the feature extractor.
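
To poke at the first hunch, this is a minimal sketch of the comparison I have in mind: the same network at two input resolutions, fed dummy data, timing a single predict call after a warm-up (numbers will obviously vary on my machine):

import time
import numpy as np
from keras.applications.vgg16 import VGG16

for shape in [(480, 640, 3), (224, 224, 3)]:
    net = VGG16(include_top=False, weights='imagenet', input_shape=shape)
    dummy = np.random.rand(1, *shape).astype('float32')
    net.predict(dummy)               # warm-up call
    start = time.time()
    net.predict(dummy)
    print(shape, '->', time.time() - start, 'seconds per image')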
Things I've tried:
- Other Stack Overflow posts suggested adding a max-pooling layer to reduce the feature size / remove the extraneous zeros. I used what I think is a pretty large max-pool window (thus reducing the feature size significantly), but my prediction time increased.
- Batch processing doesn't improve the time, which is probably obvious given my m3 CPU. A batch size of 1 image takes 8 seconds; a batch size of 4 takes 32 (timing sketch below).
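
For reference, a minimal sketch of how I'm measuring those times (dummy batches here just to isolate model.predict; in my real runs the batches come from DataGenerator):

import time
import numpy as np

for batch_size in (1, 4):
    batch_imgs = np.random.rand(batch_size, 480, 640, 3).astype('float32')
    start = time.time()
    model.predict(batch_imgs)        # model is the Sequential extractor defined above
    print(batch_size, 'image(s) ->', time.time() - start, 'seconds')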
Are there any ideas on how to speed up the prediction function? I need to run this over at least 10,000 images, and due to the nature of the project I would like to retain as much of the raw data as possible before it goes into the model (I will be comparing it with other feature extraction models).
All my image files are saved locally, but I can try to set up a cloud machine and move my code over there to run with GPU support.
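
If I do go the cloud route, I assume I can confirm the backend actually sees the GPU with something like this (TF 2.x API; older versions have tf.test.is_gpu_available()):

import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))   # an empty list means CPU-only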
Is the issue simply that I am running the VGG16 model on a dinky CPU?
Guidance would be much appreciated.