Description of TF Lite's Toco converter args for quantization aware training

Question

These days I am trying to track down an error concerning the deployment of a TF model with TPU support.

I can get a model without TPU support running, but as soon as I enable quantization, I get lost.

I am in the following situation:

Created a model and trained it
Created an eval graph of the model
Froze the model and saved the result as protocol buffer
Successfully converted and deployed it without TPU support

For the last point, I used the TFLiteConverter's Python API. The script that produces a functional tflite model is

import tensorflow as tf

graph_def_file = 'frozen_model.pb'
inputs = ['dense_input']
outputs = ['dense/BiasAdd']

converter = tf.lite.TFLiteConverter.from_frozen_graph(graph_def_file, inputs, outputs)
converter.inference_type = tf.lite.constants.FLOAT
input_arrays = converter.get_input_arrays()

converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]

tflite_model = converter.convert()

open('model.tflite', 'wb').write(tflite_model)

This tells me that my approach seems to be ok up to this point. Now, if I want to utilize the Coral TPU stick, I have to quantize my model (I took that into account during training). All I have to do is to modify my converter script. I figured that I have to change it to

import tensorflow as tf

graph_def_file = 'frozen_model.pb'
inputs = ['dense_input']
outputs = ['dense/BiasAdd']

converter = tf.lite.TFLiteConverter.from_frozen_graph(graph_def_file, inputs, outputs)
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8      ## Indicates TPU compatibility
input_arrays = converter.get_input_arrays()

converter.quantized_input_stats = {input_arrays[0]: (0., 1.)}     ## mean, std_dev
converter.default_ranges_stats = (-128, 127)                      ## min, max values for quantization (?)
converter.allow_custom_ops = True                                 ## not sure if this is needed

## REMOVED THE OPTIMIZATIONS ALTOGETHER TO MAKE IT WORK

tflite_model = converter.convert()

open('model.tflite', 'wb').write(tflite_model)

This tflite model produces results when loaded with the Python API of the interpreter, but I am not able to understand their meaning. Also, there is no (or if there is, it is hidden well) documentation on how to choose mean, std_dev and the min/max ranges. Also, after compiling this with the edgetpu_compiler and deploying it (loading it with the C++ API), I receive an error:

INFO: Initialized TensorFlow Lite runtime.
ERROR: Failed to prepare for TPU. generic::failed_precondition: Custom op already assigned to a different TPU.
ERROR: Node number 0 (edgetpu-custom-op) failed to prepare.

Segmentation fault

I suppose I missed a flag or something during the conversion process. But as the documentation is also lacking here, I can't say for sure.

In short:

What do the params mean, std_dev, min/max do and how do they interact?
What am I doing wrong during the conversion?

I am grateful for any help or guidance!

EDIT: I have opened a github issue with the full test code. Feel free to play around with this.

Will probably explain them later, but in my experience, post-quantization is not really good and can only be used to see the model's performance after quantization. To squeeze the most out of the quantization routine, you need to perform **quantization-aware training**. — Chan Kha Vu, Jul 23 '19 at 00:38
@FalconUA: I thought that I performed quantization-aware training (see the github link). If you decide to write an answer, maybe you could explain the main differences between post-training quantization and quantization-aware training, as I am new to this issue. That would be great! — DocDriven, Jul 23 '19 at 08:40
See https://medium.com/tensorflow/tensorflow-model-optimization-toolkit-post-training-integer-quantization-b4964a1ea9ba — Alex Cohn, Jul 24 '19 at 13:45
This example might help: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tutorials/post_training_integer_quant.ipynb — mdaoust, Jul 25 '19 at 13:18
see https://stackoverflow.com/a/58096430/834565 for explanation of mean and stddev — MohamedEzz, Nov 09 '19 at 23:55

mdaoust · Answer 1 · 2019-12-24T13:52:39.503

You should never need to manually set the quantization stats.

Have you tried the post-training-quantization tutorials?

https://www.tensorflow.org/lite/performance/post_training_integer_quant

Basically they set the quantization options:

converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

Then they pass a "representative dataset" to the converter, so that the converter can run the model a few batches to gather the necessary statistics:

def representative_data_gen():
  for input_value in mnist_ds.take(100):
    yield [input_value]

converter.representative_dataset = representative_data_gen

While there are options for quantized training, it's always easier to to do post-training quantization.

Description of TF Lite's Toco converter args for quantization aware training

1 Answers1