
I'm working on a feature extractor for a transfer-learning personal project, and the predict function of Keras's VGG16 model seems pretty slow (31 seconds for a batch of 4 images). I do expect it to be slow, but I'm not sure whether the prediction function is slower than it should be.

from keras.applications import VGG16
from keras.models import Sequential
from keras.layers import MaxPooling2D, Flatten

# DataGenerator is my own CSV-driven batch loader
data = DataGenerator()
data = data.from_csv(csv_path=csv_file,
                     img_dir=img_folder,
                     batch_size=batch)

#####################################################
conv_base = VGG16(include_top=False,
                  weights='imagenet',
                  input_shape=(480, 640, 3))

model = Sequential()
model.add(conv_base)
model.add(MaxPooling2D(pool_size=(3, 4)))
model.add(Flatten())
######################################################

def feature_generator(data):
    for inputs, y in data:
        feature_batch = model.predict(inputs)
        yield feature_batch, y

So, my hunch is that it is slow for these reasons:

  • my input data is a bit large (loading in (480, 640, 3) size images)
  • I am running on a weak CPU (an Intel Core m3-6Y30 @ 0.90 GHz)
  • I have a flatten operation at the end of the feature extractor.

Things I've tried:

  • Other Stack Overflow posts suggested adding a max-pooling layer to reduce the feature size / remove the extraneous zeros. I used what I think is a pretty large max-pool window (thus reducing the feature size significantly), but my prediction time increased.
  • Batch processing doesn't improve the time, which is probably obvious given my m3 CPU. A batch size of 1 image takes 8 seconds; a batch size of 4 takes 32.

Are there any ideas on how to speed up the prediction function? I need to run this over at least 10,000 images, and due to the nature of the project I would like to retain as much of the raw data as possible before it goes into the model (I will be comparing it with other feature-extraction models).

All my image files are saved locally, but I can try to set up a cloud machine and move my code over there to run with GPU support.

Is the issue simply I am running the VGG16 model on a dinky CPU?

Guidance would be much appreciated.

Joshua Zastrow

2 Answers


There are several issues with your setup. The main one is, of course, the really slow machine, but since you cannot change that here, I will offer some advice on how you could speed up your computations:

  1. VGG16 is a relatively old architecture. The main issue here is that the so-called volume of its tensors (area of the feature maps times the number of feature maps) decreases really slowly. I would advise you to use more modern architectures such as ResNet50 or Inception v3, as they have a so-called stem that shrinks the internal tensors very quickly, and your speed should benefit from that. There is also a really light architecture called MobileNet, which seems perfect for your task (see the sketch after this list).

  2. Downsample your images: at (480, 640), your input has roughly 6 times the area of the default (224, 224) VGG input, which makes all the computations roughly 6 times slower. You could downsample the images first and then run the feature extractor (again, see the sketch below).
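A minimal sketch combining both points, assuming a TF2-style Keras install and that the question's DataGenerator yields (inputs, y) batches of (480, 640, 3) images as numpy arrays; MobileNet and the resize call here stand in for whichever architecture and downsampling step you choose:

import tensorflow as tf

# Lighter architecture at the default 224x224 input size
conv_base = tf.keras.applications.MobileNet(include_top=False,
                                            weights='imagenet',
                                            input_shape=(224, 224, 3))

model = tf.keras.Sequential([
    conv_base,
    tf.keras.layers.Flatten(),
])

def extract_features(data):
    for inputs, y in data:
        # Downsample before the forward pass; this alone cuts the
        # per-image computation roughly 6x. (For best accuracy, also
        # apply MobileNet's matching preprocess_input.)
        small = tf.image.resize(inputs, (224, 224)).numpy()
        yield model.predict(small), y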

Marcin Możejko
  • This is useful advice and does help, so thanks a lot! I also realized that when I ran my model from the console (instead of a Python notebook) I got a warning about setting up SSE instructions on my machine to speed up computation for TensorFlow running on a CPU. Here is a related Stack Overflow link for anyone else interested: https://stackoverflow.com/questions/43134753/tensorflow-wasnt-compiled-to-use-sse-etc-instructions-but-these-are-availab#43135194 – Joshua Zastrow Oct 29 '17 at 13:09

VGG16 is a very big model. The same accuracy can be reached with modern, smaller models such as MobileNetV3 or EfficientNet.

However, if you have to use your model, you could try OpenVINO. OpenVINO is optimized for Intel hardware, but it should work with any CPU. It optimizes your model by converting it to the Intermediate Representation (IR), performing graph pruning, and fusing some operations into others while preserving accuracy. It then uses vectorization at runtime.

Here are some performance benchmarks for various models and CPUs. Your processor (m3-6Y30) is 6th generation, so it should be supported.

It's rather straightforward to convert a Keras model to OpenVINO unless you have fancy custom layers. The full tutorial on how to do it can be found here. Some snippets are below.

Install OpenVINO

The easiest way to do it is using pip. Alternatively, you can use this tool to find the best way in your case.

pip install openvino-dev[tensorflow2]
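To quickly confirm that the runtime is importable after installation (a sanity check, assuming the default package layout):

python -c "from openvino.runtime import get_version; print(get_version())"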

Save your model as SavedModel

OpenVINO cannot convert an HDF5 model directly, so you have to save it in the SavedModel format first.

import tensorflow as tf
from custom_layer import CustomLayer  # only needed if the model uses custom layers

# Reload the HDF5 model, then re-save it in the SavedModel format
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')
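If the model has no custom layers (like the VGG16 feature extractor in the question), the custom_objects argument can simply be dropped; a minimal sketch:

import tensorflow as tf

# No custom layers, so no custom_objects are needed
model = tf.keras.models.load_model('model.h5')
tf.saved_model.save(model, 'model')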

Use Model Optimizer to convert SavedModel model

The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (just change data_type, as shown after the command below). Run in the command line:

mo --saved_model_dir "model" --input_shape "[1, 224, 224, 3]" --data_type FP32 --output_dir "model_ir"
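For example, an FP16 conversion of the question's (480, 640, 3) model would change only the shape and the data type (the shape here is taken from the question; adjust it to your own input):

mo --saved_model_dir "model" --input_shape "[1, 480, 640, 3]" --data_type FP16 --output_dir "model_ir_fp16"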

Run the inference

The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (meaning the graphics integrated into your CPU, such as Intel HD Graphics). If you don't know what the best choice for you is, just use AUTO (see the one-liner after the code below).

from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get the output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image; input_image must match the shape
# given to the Model Optimizer, e.g. a float32 array of [1, 224, 224, 3]
result = compiled_model_ir([input_image])[output_layer_ir]
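And if you'd rather let OpenVINO pick the device, as mentioned above, only the device name changes:

# Let the runtime choose the best available device automatically
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO")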

Disclaimer: I work on OpenVINO.

dragon7
  • Please don't put OpenVINO install instructions into answers to all 4-year-old questions about NN performance if they weren't asking about OpenVINO specifically. Thanks for the disclaimer, though. And otherwise no objections, at least not from me, to plugging something if it's relevant to the question. Just... link to the guide, I guess. – maxy Aug 03 '22 at 15:47
  • The recommendation of the Stack Overflow team is to answer old questions as well (see also https://meta.stackexchange.com/questions/23996/does-it-make-sense-to-answer-old-questions). I'd like to inform people about the possibility of improving the performance. How can they ask about OpenVINO if they don't know about it? The link itself is not the right answer because it may 404 in the future, so I wanted to include some code so that it remains. – dragon7 Aug 04 '22 at 09:38
  • At a glance all your recent answers seem to be a plug for OpenVINO, so I'm allowed to frown upon this according to https://meta.stackexchange.com/questions/57497/limits-for-self-promotion-in-answers/59302#59302 ;-) – maxy Aug 04 '22 at 18:33
  • That's my role: to inform people about OpenVINO. I'm not trying to sell it, because it's free and open-source. I'll look over these rules. Thanks! – dragon7 Aug 05 '22 at 10:24