
I have successfully trained (using Inception V3 weights as initialization) the Attention OCR model described here: https://github.com/tensorflow/models/tree/master/attention_ocr and frozen the resulting checkpoint files into a graph. How can this network be implemented using the C++ API on iOS?

Thank you in advance.

Michael Royzen

1 Answer


As suggested by others, you can use some existing iOS demos (1, 2) as a starting point, but pay close attention to the following details:

  1. Make sure you use the right tools to "freeze" the model. The SavedModel format is the universal serialization format for TensorFlow models.
  2. A model export script can, and usually does, perform some kind of input normalization. Note that the Model.create_base function expects a tf.float32 tensor of shape [batch_size, height, width, channels] with values normalized to [-1.25, 1.25]. If you do image normalization as part of the TensorFlow computation graph, make sure images are passed in unnormalized, and vice versa.
  3. To get names of input/output tensors you can simply print them, e.g. somewhere in your export script:

    data_images = tf.placeholder(dtype=tf.float32, shape=[batch_size, height, width, channels], name='normalized_input_images')
    endpoints = model.create_base(data_images, labels_one_hot=None)
    print(data_images, endpoints.predicted_chars, endpoints.predicted_scores)
    
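For step 1, a minimal freezing sketch using the TF 1.x API (era-appropriate for the linked repo). The checkpoint path and output node names are placeholders you must adapt; the output node names should match the tensor names printed by your export script:

```python
def freeze_checkpoint(checkpoint_path, output_node_names, output_pb_path):
    """Convert a trained checkpoint into a single frozen GraphDef file.

    checkpoint_path: e.g. '/tmp/attention_ocr/model.ckpt-100000' (assumed path)
    output_node_names: list of output node names, e.g. the names printed
        for endpoints.predicted_chars / endpoints.predicted_scores.
    output_pb_path: where to write the frozen .pb file.
    """
    import tensorflow as tf  # TF 1.x
    from tensorflow.python.framework import graph_util

    with tf.Session(graph=tf.Graph()) as sess:
        # Rebuild the graph from the .meta file and restore the weights.
        saver = tf.train.import_meta_graph(checkpoint_path + '.meta')
        saver.restore(sess, checkpoint_path)
        # Bake variables into constants so the graph is self-contained
        # and loadable from C++ without checkpoint files.
        frozen = graph_util.convert_variables_to_constants(
            sess, sess.graph.as_graph_def(), output_node_names)
        with tf.gfile.GFile(output_pb_path, 'wb') as f:
            f.write(frozen.SerializeToString())
```

The resulting .pb file is what you load on the iOS side with the C++ API.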
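For step 2, if normalization is not baked into the exported graph, you have to do it client-side before feeding pixels in. One plausible linear mapping from uint8 pixels to the [-1.25, 1.25] range is sketched below; the exact formula is an assumption, so check it against your export script:

```python
import numpy as np

def normalize_image(img_uint8):
    # Assumed mapping: scale [0, 255] to [0, 1], center, then stretch to
    # [-1.25, 1.25] via (x/255 - 0.5) * 2.5, matching the range that
    # Model.create_base expects according to the answer above.
    img = img_uint8.astype(np.float32) / 255.0
    return (img - 0.5) * 2.5
```

If the normalization is already part of the frozen graph, skip this and feed raw pixel values instead.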
Alexander Gorban
  • Thank you for your response, Alexander. I am, however, slightly confused as to how I should feed the input image into the network. I see that the 'split' node accepts a tensor of size 32x150x600x3, which is then split into four tensors and fed into Inception feature extractors. Assuming I have a single input image, which node should I use as my input? Additionally, how would I get around a batch size of 32 with a single input image? – Michael Royzen Jul 16 '17 at 02:53
  • If you have a single view, specify num_views=1 as the argument for the model constructor. You can then still use the code snippet from the answer and feed your image into the data_images tensor (the print statement will show you its name). Note, however, that you need to use the same number of views that was used for training. If you trained with 4 views and need to test with a single one, you will need to pad your single 1500x150 view with random noise, similar to how it was done for the training data. – Alexander Gorban Jul 17 '17 at 18:52