I quantized a model trained with the Object Detection API, and it worked successfully when I followed the Google documentation. To convert the frozen graph to a quantized UINT8 tflite file, I used the command
tflite_convert \
--output_file="${OUTPUT_DIR}/output_tflite_graph.tflite" \
--graph_def_file="${OUTPUT_DIR}/tflite_graph.pb" \
--inference_type=QUANTIZED_UINT8 \
--input_arrays="${INPUT_TENSORS}" \
--output_arrays="${OUTPUT_TENSORS}" \
--mean_values=128 \
--std_dev_values=128 \
--input_shapes=1,300,300,3 \
--change_concat_input_ranges=false \
--allow_nudging_weights_to_use_fast_gemm_kernel=true \
--allow_custom_ops
For quantization, I set mean_values to 128 and std_dev_values to 128. According to another question here, Understanding tf.contrib.lite.TFLiteConverter quantization parameters, this corresponds to an input image value range of [-1, 1].
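Just to check my understanding of the convention: I am assuming the converter maps quantized values to real values as real_value = (quantized_value - mean_values) / std_dev_values (my own reading, not quoted from the docs), so a quick sanity check with my settings would be:

mean_values = 128.0
std_dev_values = 128.0
for q in (0, 128, 255):
    # maps the uint8 endpoints to the real-valued range the graph should expect
    print(q, (q - mean_values) / std_dev_values)
# 0 -1.0, 128 0.0, 255 0.9921875

which is why I expected the network to want inputs in roughly [-1, 1].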
However, during inference the model only works when I feed input image values in the range [0, 255]! I am now confused about what I should set mean_values and std_dev_values to in other circumstances.
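To illustrate the two ways I have tried to feed the model, here is a simplified sketch (not the exact code in my repo; the model path and the load_image_as_uint8 helper are placeholders):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="output_tflite_graph.tflite")  # placeholder path
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
scale, zero_point = input_detail["quantization"]  # quantization params stored in the model

image = load_image_as_uint8()  # hypothetical helper, returns a (1, 300, 300, 3) uint8 array

# Option A: feed raw [0, 255] pixels directly -- this is the only thing that works for me.
interpreter.set_tensor(input_detail["index"], image)

# Option B: what I expected given mean/std = 128/128 -- normalize to [-1, 1] first,
# then quantize with the tensor's own scale / zero_point before feeding it.
real = image.astype(np.float32) / 128.0 - 1.0
# quantized = np.clip(np.round(real / scale + zero_point), 0, 255).astype(np.uint8)
# interpreter.set_tensor(input_detail["index"], quantized)

interpreter.invoke()

Option A gives correct detections; anything I derive from the [-1, 1] reading does not, which is the conflict I am asking about.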
Here is my project and inference code: https://github.com/zye1996/edgetpu_ssd_lpr
I suspect the Object Detection API has a preprocessing step that scales the input data, but I cannot find references and do not know what causes the conflict. Please help!