I am using a pretrained VGG16 model to classify ~100,000 images on Google Colab's TPU. I am unsure what values to choose for the various parameters to optimize the runtime of Keras' model.predict_generator and flow_from_dataframe. There does not seem to be much documentation on this. I have tried varying the queue size and number of workers on test runs with 'only' ~10,000 images, but I did not observe any significant change in runtime.
A code snippet is provided below. The pandas dataframe contains the file locations of all images on a mounted drive, along with ground-truth labels. The model is an existing pretrained VGG16 network. (I want to compare its performance to my own network later.) Any insight into best practices on which settings to use when (batch size, queue size, number of workers) would be very welcome!
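For context, df looks roughly like this (the paths are hypothetical examples; the real dataframe has ~100,000 rows of absolute paths on the mounted drive):

```python
import pandas as pd

# Hypothetical layout of df: absolute image paths plus ground-truth labels
df = pd.DataFrame({
    "Photo": ["/content/drive/MyDrive/images/img_000001.jpg",
              "/content/drive/MyDrive/images/img_000002.jpg"],
    "Class": ["cat", "dog"],
})
print(df.head())
```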
from keras.preprocessing.image import ImageDataGenerator

# Generator yielding batches of resized RGB images from the file paths in df
trdata = ImageDataGenerator()
data = trdata.flow_from_dataframe(dataframe=df, directory=None,
    x_col="Photo", y_col="Class", class_mode='categorical', color_mode="rgb",
    batch_size=32, target_size=(224, 224), shuffle=False)
# Note: workers > 1 uses threads unless use_multiprocessing=True is also set
predictions = model.predict_generator(data, max_queue_size=64, workers=32, verbose=1)