
I've been trying to use the latest MobileNet, MobileNet_v3, to run object detection. You can find Google's pre-trained models for this, such as the one I'm trying to use ("ssd_mobilenet_v3_large_coco"), here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

I don't know how these new models take image data as input, and I can't find any in-depth documentation about this online. The Java code below summarizes how I'm attempting to feed the model (specifically the .tflite model, using TensorFlow Lite) image data, based on the limited information I could gather online. However, the model only returns prediction confidences on the order of 10^-20, so it never actually recognizes anything. I figure from this that I must be doing something wrong.

//Note that the model takes a 320 x 320 image
private static final int INPUT_SIZE = 320;

//Get image data as packed ARGB integer values
int[] intValues = new int[INPUT_SIZE * INPUT_SIZE];
Bitmap croppedBitmap = Bitmap.createBitmap(INPUT_SIZE, INPUT_SIZE, Config.ARGB_8888);
//... (the camera frame is drawn into croppedBitmap here) ...
croppedBitmap.getPixels(intValues, 0, INPUT_SIZE, 0, 0, INPUT_SIZE, INPUT_SIZE);

//Create a ByteBuffer as input for running ssd_mobilenet_v3:
//1 byte per channel, 3 channels, since the model is quantized
ByteBuffer imgData = ByteBuffer.allocateDirect(INPUT_SIZE * INPUT_SIZE * 3);
imgData.order(ByteOrder.nativeOrder());

//Fill the ByteBuffer
//Note that & 0xFF keeps just the last 8 bits, which extracts the R, G and B channels here
imgData.rewind();
for (int i = 0; i < INPUT_SIZE; ++i) {
  for (int j = 0; j < INPUT_SIZE; ++j) {
    int pixelValue = intValues[i * INPUT_SIZE + j];
    // Quantized model: feed the raw 0-255 channel bytes, no normalization
    imgData.put((byte) ((pixelValue >> 16) & 0xFF)); // R
    imgData.put((byte) ((pixelValue >> 8) & 0xFF));  // G
    imgData.put((byte) (pixelValue & 0xFF));         // B
  }
}

//Set up output buffers: 2034 anchor boxes, 91 COCO classes
float[][][] output0 = new float[1][2034][91];     // class scores
float[][][][] output1 = new float[1][2034][1][4]; // box encodings

//Create the input array and output map, then run the model
Object[] inputArray = {imgData};
Map<Integer, Object> outputMap = new HashMap<>();
outputMap.put(0, output0);
outputMap.put(1, output1);
tfLite.runForMultipleInputsOutputs(inputArray, outputMap);

//Examine confidences to see if any significant detections were made
for (int i = 0; i < 2034; i++) {
  for (int j = 0; j < 91; j++) {
    System.out.println(output0[0][i][j]);
  }
}
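
Rather than dumping all 2034 x 91 values, a quicker sanity check is to track only the highest class score. Here is a minimal sketch of that, assuming the [1][2034][91] score layout above and the common convention that class index 0 is the background class:

//Minimal sketch: find the single highest class score across all boxes
float bestScore = 0f;
int bestBox = -1;
int bestClass = -1;
for (int i = 0; i < 2034; i++) {
  //start at j = 1, assuming class 0 is the background class
  for (int j = 1; j < 91; j++) {
    if (output0[0][i][j] > bestScore) {
      bestScore = output0[0][i][j];
      bestBox = i;
      bestClass = j;
    }
  }
}
System.out.println("Best score: " + bestScore + " (box " + bestBox + ", class " + bestClass + ")");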

1 Answer


I've figured out how to get this to sort of work with a little extra effort.

You have to download the pre-trained models and re-create the .tflite files yourself to get them to work with the provided Android code. The following guide, written by the TensorFlow team, shows how to recreate the .tflite files so that they have the same input/output format the Android object detection code expects:

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md

This way you have to change almost none of the Android code provided for object detection. The only thing you need to specify manually (both when creating the .tflite file and in the Android object detection code) is the resolution of the object detection model.

So, for mobilenet_v3 with a resolution of 320x320, when converting the model to a .tflite file, use the flag "--input_shapes=1,320,320,3". Then, in the Android code, set the variable "TF_OD_API_INPUT_SIZE = 320". Those are the only changes you should need to make.
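
For reference, the relevant constants in the example app's DetectorActivity end up looking roughly like this (a sketch; the model and label file names are assumptions and should match whatever you bundle in your app's assets):

//Sketch of the detector configuration in the example app's DetectorActivity
//TF_OD_API_INPUT_SIZE must match the --input_shapes used during conversion
private static final int TF_OD_API_INPUT_SIZE = 320;
private static final boolean TF_OD_API_IS_QUANTIZED = true;
//File names below are assumptions; use the names of your own asset files
private static final String TF_OD_API_MODEL_FILE = "detect.tflite";
private static final String TF_OD_API_LABELS_FILE = "file:///android_asset/labelmap.txt";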

This should work with any of the SSD models (and only the SSD models), but I have currently only tested it with mobilenet_v2, because it was easier to get working and the differences between v2 and v3 are negligible.

Any idea why the .tflite model from the zoo doesn't seem to work? I have the same issue of only getting 10^-15 scores, but the input should also be a quantized 320x320 RGB buffer, right? – 000000000000000000000 Apr 08 '20 at 09:53