
I am using a pretrained VGG16 model to classify ~100 000 images on Google Colab's TPU. I am unsure what values to choose for the different parameters to optimize the runtime of Keras' model.predict_generator and flow_from_dataframe. There does not seem to be a lot of documentation. I have tried varying the queue sizes and number of workers on test runs with 'only' ~10 000 images, but I did not observe any significant changes in the runtime.

A code snippet is provided below. The pandas dataframe contains the file locations of all images on a mounted drive, along with the ground-truth labels. The model is an existing VGG16 pretrained network. (I want to compare the performance of this network to my own network later.) Some insight into best practices for choosing these settings (batch size, queue size, workers) would be very welcome!

trdata = ImageDataGenerator()
data = trdata.flow_from_dataframe(
    dataframe=df, directory=None,
    x_col="Photo", y_col="Class", class_mode='categorical', color_mode="rgb",
    batch_size=32, target_size=(224, 224), shuffle=False)
predictions = model.predict_generator(data, max_queue_size=64, workers=32, verbose=1)
NLH
  • What exactly are you trying to achieve? Which parameters are you unsure about? – Aniket Bote Sep 02 '20 at 01:49
  • I am using the neural network to classify the images. predictions is a matrix with the probabilities for each class. (I later use this to compute top-k accuracy based on the ground-truth labels.) Right now the code runs, but it is very slow. I understand that it will remain a computationally heavy task, but I would like to make optimal use of the hardware and minimize the runtime. To do this there are probably smart ways to choose the batch_size, the max_queue_size, and the number of workers. (And perhaps other parameters I have not considered?) I would like some advice on how to choose these. – NLH Sep 02 '20 at 11:04

1 Answer


batch_size: During training, model accuracy depends on batch_size, so you should select the batch_size that gives the best results for your specific data. As far as performance is concerned, a higher batch_size will consume more memory and provide only minor speed boosts.
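To make the overhead trade-off concrete, here is a toy cost model (a sketch with made-up per-batch and per-image costs, not a Keras benchmark): each batch carries a fixed launch cost, so larger batches amortize it over more images.

```python
import math

# Hypothetical costs, for illustration only.
PER_BATCH_OVERHEAD = 5.0   # ms: fixed cost paid once per batch
PER_IMAGE_COST = 1.0       # ms: cost per image, independent of batching

def estimated_runtime_ms(n_images, batch_size):
    # Total time = (number of batches) * overhead + per-image work.
    n_batches = math.ceil(n_images / batch_size)
    return n_batches * PER_BATCH_OVERHEAD + n_images * PER_IMAGE_COST

small = estimated_runtime_ms(100_000, 32)    # 3125 batches of overhead
large = estimated_runtime_ms(100_000, 256)   # only 391 batches of overhead
print(small, large)
```

Under this model the per-image work is fixed, so the only savings from a bigger batch_size is fewer overhead payments; the gain flattens out quickly, which matches the "minor speed boosts" above.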

max_queue_size:
max_queue_size is the maximum size of the queue used to cache batches from the generator.
The CPU keeps creating batches until the queue reaches max_queue_size or the generator is exhausted. You want batches ready for your GPU to "take" so that the GPU doesn't have to wait for the CPU.

Workers:
This is the number of threads generating batches in parallel. Batches are computed in parallel on the CPU and passed on the fly to the GPU for neural-network computations. If you see that your GPU is waiting for batches, try increasing the number of workers and perhaps also the queue size.
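The interplay between workers and max_queue_size can be sketched as a plain producer/consumer pipeline (pure Python, no Keras; the constants are made up): worker threads play the role of the CPU preparing batches, a bounded queue plays the role of max_queue_size, and a single consumer plays the role of the GPU.

```python
import queue
import threading
import time

MAX_QUEUE_SIZE = 4   # analogous to max_queue_size
N_WORKERS = 2        # analogous to workers
N_BATCHES = 8

batch_queue = queue.Queue(maxsize=MAX_QUEUE_SIZE)
batch_ids = iter(range(N_BATCHES))
lock = threading.Lock()

def worker():
    """Prepare batches ("CPU") and push them into the bounded queue."""
    while True:
        with lock:                 # hand out batch indices safely
            i = next(batch_ids, None)
        if i is None:
            return                 # generator exhausted
        time.sleep(0.01)           # simulate image loading/decoding
        batch_queue.put(i)         # blocks when the queue is full

threads = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
for t in threads:
    t.start()

# The "GPU" consumes batches as soon as they are ready.
results = [batch_queue.get() for _ in range(N_BATCHES)]
for t in threads:
    t.join()

print(sorted(results))  # all batches arrive, possibly out of order
```

If the consumer is fast and the workers are slow, the queue runs empty and the consumer stalls; adding workers (or raising the queue bound so bursts are absorbed) fixes that, which is exactly the tuning advice above.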

You can also refer to this.

If you want to check the performance of your GPU, you can use the TensorBoard Profiler. See here for a demo example.

Aniket Bote
  • Did this answer your question? – Aniket Bote Sep 02 '20 at 22:32
  • Thank you! I was mostly looking for orders of magnitude for the numbers to pick, but I guess that is a very tough one since it depends on so many things. The variable names are already pretty clear, but your explanation does give some extra clarity/confirmation (and some good pointers for when I am training a neural network myself later). The TensorBoard Profiler seems like a very useful tool I had not found yet, thank you for the link, I'll be sure to check it out! – NLH Sep 03 '20 at 11:47