
I am brand new to TensorFlow and am trying to incorporate it into my project. I am using a Raspberry Pi 4 with a camera to detect changes at my bird feeder, take a picture, and then identify the bird with TensorFlow. I am using the Birds_V1 model to accomplish this.

I am curious as to how to interpret the input and output. According to the overview the input is:

expected to be 3-channel RGB color images of size 224 x 224, scaled to [0, 1].

I am confused by what is meant by "scaled to [0, 1]". Furthermore, the output is:

image_classifier: A probability vector of dimension 965, corresponding to a background class and 964 bird species in the labelmap.

I am completely lost by what is meant here.

Lastly, I ran interpreter.get_input_details() and interpreter.get_output_details() to see what it would output to me.

I got this printed back, with the top being the input and bottom the output:

[{'name': 'module/hub_input/images_uint8', 'index': 170, 'shape': array([  1, 224, 224,   3]), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.0078125, 128), 'quantization_parameters': {'scales': array([0.0078125], dtype=float32), 'zero_points': array([128]), 'quantized_dimension': 0}}]

[{'name': 'module/prediction', 'index': 171, 'shape': array([  1, 965]), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.00390625, 0), 'quantization_parameters': {'scales': array([0.00390625], dtype=float32), 'zero_points': array([0]), 'quantized_dimension': 0}}]

I don't know how to interpret this, and was wondering whether I should be going off of this instead, or ignoring it if it's not important.

I appreciate any clarification on any part of this, as well as any sources you think would be useful. I have looked for documentation to help, but haven't gotten anywhere yet.

Caleb Renfroe
  • Hi, think of scaling as a mathematical operation that brings the values into the range [0,1]. One example is MinMaxScaler (subtract the minimum from a value and divide by the difference between the maximum and the minimum). Each model output is a probability vector of size 965, giving the probability that an image belongs to each of the 965 classes. You then pick the class with the highest probability as the predicted bird species. For example, in a 5-species model, output1 = [0.1,0.1,0.6, 0.1, 0.1] corresponding to species [A,B,C,D,E] respectively would mean the input image is classified as species C (highest value 0.6). – smile Jul 06 '20 at 18:44
  • @smile Hi, thanks for commenting. So if I'm understanding what you're saying, I need to use some function like MinMaxScaler to scale the probability value for each of the potential 965 labels to be some number between 0 and 1, with all of the probabilities totaling up to 1? – Caleb Renfroe Jul 06 '20 at 19:18
  • You scale the input (the images). The probability vector is what the model returns. You do not need to scale that. The probability vector tells you, in probabilistic terms, which species the input belongs to. I hope this helps. – smile Jul 06 '20 at 19:48
  • Ya, that makes sense. Thank you, I appreciate your help. I at least understand that part of my question now! – Caleb Renfroe Jul 06 '20 at 21:36
  • As for the last part of your question: first, you are using a pretrained model from TensorFlow Hub https://blog.tensorflow.org/2018/03/introducing-tensorflow-hub-library.html . On the TensorFlow Hub page, you can browse the available modules (yours is one of the classification examples). Your example is also a TFLite pretrained module, so https://www.tensorflow.org/lite would be particularly useful. https://www.tensorflow.org/lite/models/image_classification/overview contains image classification details for TFLite. The links contain lots of info. Explore them. Best regards – smile Jul 07 '20 at 03:26
  • @smile Hi again. I have been looking at the tensorflow.org/lite docs. I'll give the 2nd link you sent a try as well, as it seems very helpful for my topic. Thanks for the help again! – Caleb Renfroe Jul 07 '20 at 19:17
  • I am glad it helped. Best regards ! – smile Jul 07 '20 at 19:21

1 Answer


Most of these questions were answered by @smile in the comments, but the clarifications are collected here (in the answer section) for the benefit of the community.

Question: I am confused by what is meant by scaled to [0,1]

Answer: An image is usually just a NumPy array (in your case of shape 224, 224, 3) whose values range from 0 to 255. We normalize the pixel values by dividing each one by 255, so that every pixel value lies in the range [0, 1]. Without normalization, a model generally takes much longer to converge during training, and at inference time the input has to be scaled the same way it was when the model was trained.

For more information on normalization, please refer to this Stack Overflow answer and this Stack Exchange answer.
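As a minimal sketch of the division by 255 described above, assuming the picture is already loaded as a uint8 NumPy array (the helper name `scale_to_unit_range` and the random stand-in image are illustrative assumptions, not part of the Birds_V1 model):

```python
import numpy as np


def scale_to_unit_range(image_uint8):
    """Map 8-bit pixel values (0..255) to float32 values in [0, 1]."""
    return image_uint8.astype(np.float32) / 255.0


# Hypothetical stand-in for a real 224 x 224 RGB photo.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

scaled = scale_to_unit_range(image)
print(scaled.shape, scaled.dtype, float(scaled.min()), float(scaled.max()))
```

Whether this step is needed depends on the input type the model file reports: a float input would expect values like these, while a uint8 input takes the raw 0 to 255 pixels.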

Question: A probability vector of dimension 965, corresponding to a background class and 964 bird species in the label map. I am completely lost by what is meant here.

Answer: In the final layer of a classification CNN, a softmax activation function is typically used, with the number of units equal to the number of classes (965 in your case). The output of that layer is therefore 965 probabilities that sum to 1. The class with the highest probability is the predicted class for that image.

For example, in a 5-species model, output = [0.1,0.1,0.6, 0.1, 0.1] corresponds to species [A,B,C,D,E] respectively. This would mean the input image is classified as species C (highest value 0.6). (This is @smile's example, reused here because it explains the idea well.)
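The 5-species example can be sketched in a few lines of NumPy. The labels A to E are the placeholders from the example; for the real model, the 965 labels would come from the labelmap instead.

```python
import numpy as np

# Placeholder labels and probabilities taken from the 5-species example.
labels = ["A", "B", "C", "D", "E"]
probabilities = np.array([0.1, 0.1, 0.6, 0.1, 0.1])

# The predicted class is the index holding the largest probability.
best_index = int(np.argmax(probabilities))
best_label = labels[best_index]
best_probability = float(probabilities[best_index])

print(best_label, best_probability)
```

The same `np.argmax` lookup applies unchanged to a vector of length 965.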

Question: I don't know how to interpret interpreter.get_input_details and interpreter.get_output_details

Answer: According to its source code, get_input_details returns a list with the details of each input tensor (here, the input image tensor). Similarly, get_output_details returns a list with the details of each output tensor (here, the prediction tensor). Information about the Interpreter API can be found in this TensorFlow documentation.

So, in the input details printed in the question,

[{'name': 'module/hub_input/images_uint8', 'index': 170,
  'shape': array([  1, 224, 224,   3]), 'dtype': <class 'numpy.uint8'>,
  'quantization': (0.0078125, 128),
  'quantization_parameters': {'scales': array([0.0078125], dtype=float32),
                              'zero_points': array([128]),
                              'quantized_dimension': 0}}]

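Here, name is the name of the input tensor in the graph, shape is its shape (a batch of 1 image of 224 x 224 pixels with 3 channels), dtype is the element type, and quantization holds the (scale, zero point) pair that maps the stored integers to real values via real_value = scale * (quantized_value - zero_point). The purpose of quantization is to reduce the size of the model, since mobile and embedded devices have limited memory. More information about quantization can be found in this TensorFlow documentation.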
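To make the quantization fields concrete, here is a small sketch that applies the usual TFLite mapping real_value = scale * (quantized_value - zero_point) to the (scale, zero point) pairs printed in the question. The helper name `dequantize` and the three raw scores are illustrative assumptions, not output from the actual model.

```python
import numpy as np


def dequantize(quantized, scale, zero_point):
    """Convert stored integers to real values: scale * (q - zero_point)."""
    return scale * (np.asarray(quantized, dtype=np.float32) - zero_point)


# (scale, zero_point) pairs copied from the 'quantization' fields in the question.
INPUT_QUANT = (0.0078125, 128)
OUTPUT_QUANT = (0.00390625, 0)

# Input side: uint8 values 0, 128, and 255 map to -1.0, 0.0, and just under 1.0.
input_range = dequantize([0, 128, 255], *INPUT_QUANT)

# Output side: hypothetical raw uint8 scores for three classes.
raw_scores = np.array([26, 154, 76], dtype=np.uint8)
probabilities = dequantize(raw_scores, *OUTPUT_QUANT)

print(input_range)
print(probabilities, int(np.argmax(probabilities)))
```

With a real interpreter, the raw scores would presumably come from interpreter.get_tensor(output_details[0]['index'])[0] after calling interpreter.invoke(). Since the output scale is 1/256 and the zero point is 0, each uint8 score divided by 256 is the class probability. Because the input tensor reports dtype uint8, this model appears to take raw 0 to 255 pixel values directly, with the scaling folded into the quantization parameters.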