
I am new to Machine Learning, so please bear that in mind before answering. I came across a challenge while trying to train a neural network in Workbench using CNTK with a ResNet model. I followed this tutorial provided by Azure: [1] https://learn.microsoft.com/en-us/azure/machine-learning/desktop-workbench/scenario-image-classification-using-cntk

My first dataset was a subset of ImageNet consisting of 900 images with 4 different classes: car, bus, van, and truck. Afterwards I used a subset of the dataset provided at the link below. [2] http://podoce.dinf.usherbrooke.ca/challenge/dataset/

I used 9,000 images from that dataset, divided equally into the same four classes as with ImageNet, and started training my network.

The classifier I used for this was the DNN classifier with the following configuration:

    rf_pretrainedModelFilename = "ResNet_50.model"
    rf_inputResoluton = 224
    rf_dropoutRate    = 0.5
    rf_mbSize         = 10
    rf_maxEpochs      = 30
    rf_maxTrainImages = float('inf')
    rf_lrPerMb        = [0.01] * 10 + [0.001] * 10 + [0.0001]
    rf_momentumPerMb  = 0.9
    rf_l2RegWeight    = 0.0005
    rf_boFreezeWeights      = False
    rf_boBalanceTrainingSet = False
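
To make the learning-rate entry concrete, `rf_lrPerMb` is a per-epoch schedule expressed as a plain Python list. As far as I understand (treat this as an assumption about CNTK's behavior), the final entry is reused for any epochs beyond the length of the list:

```python
# Per-epoch learning-rate schedule from the configuration above:
# 0.01 for the first 10 epochs, 0.001 for the next 10, then 0.0001.
# (Assumption: CNTK repeats the final entry for later epochs.)
lr_per_mb = [0.01] * 10 + [0.001] * 10 + [0.0001]

print(len(lr_per_mb))   # → 21
print(lr_per_mb[0])     # → 0.01   (epoch 1)
print(lr_per_mb[10])    # → 0.001  (epoch 11)
print(lr_per_mb[-1])    # → 0.0001 (epoch 21 onward)
```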

After training the model I got an overall accuracy of 96.80%, with all classes having an accuracy > 92%. All well and good, but when I tested various other test images, my confidence score was 12.9895 at its highest. I got a JSON object returned like this: Image classified as 'Bus' with confidence score 12.9895.

    {"score": "12.9895", "Id2Labels": "{0: 'Bus', 1: 'Truck', 2: 'Car', 3: 'Van'}",
     "label": "Bus", "executionTimeMs": "128.749",
     "allScores": "[ 12.98949814   3.51014233  -6.96435881  -6.89878178]"}
  • Does the value 12.9895 mean a 12.9895% probability of the image being a bus? And why is it not returned as a value between 0 and 1? Please correct me if I am wrong, as I do get confused by the various terms used in Machine Learning for the same thing.
  • Why are the negative values there? I thought the activation function took care of negative values.
  • Should I use an even larger dataset, or maybe better image quality, to improve my score?
  • Any other suggestions for how I can improve my score?

The scores were low on both datasets mentioned (the subset from ImageNet and MIO). A humble thank you for taking the time to answer these questions.

Charles Xu
ISeeSharp

2 Answers


Scoring is also called prediction, and is the process of generating values based on a trained machine learning model, given some new input data. The values or scores that are created can represent predictions of future values, but they might also represent a likely category or outcome. The meaning of the score depends on the type of data you provide, and the type of model that you created.

The score isn't returned as a value between 0 and 1 because, per the data you have provided, the model reports 12.9895 as its raw confidence that this is a bus. So you would have to write your own code to map the values onto the 0–1 range.

Read more about Score here.

For activation, you should use the ReLU activation function. The Rectified Linear Unit (ReLU) sets negative values to 0 and leaves positive values unchanged.
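
A minimal sketch of that behavior, using NumPy just to illustrate (the negative entry is zeroed out, the positive ones pass through):

```python
import numpy as np

def relu(x):
    # ReLU: negative values become 0, positive values pass through unchanged
    return np.maximum(0, x)

out = relu(np.array([-6.96, 3.51, 12.99]))
print(out)  # the -6.96 entry becomes 0.0
```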

Here is one example of the implementation.

For the image of the bus, you can try different images of buses or other vehicles; the better the quality, the higher the score you generate, but that will depend on the logic: what defines it as a bus or a truck in your code.

Have you tried this?

Ashutosh
  • Hi. It's just that when you run **showResults.py** in their Notebook server, you get scores between 0 and 1 for the same network. [link] https://learn.microsoft.com/en-us/azure/machine-learning/desktop-workbench/scenario-image-classification-using-cntk After deploying it to the web service, the scores change to allow higher values. It's just confusing. The Score model seems to only apply to Azure ML Studio. About the definition of the bus or truck in the code: the tutorial only describes the configuration of the hyperparameters. – ISeeSharp Jul 31 '18 at 08:13

I think there's a miscommunication about what the allScores array means. The array contains the raw outputs of the final classification layer of the network, but they do not represent probabilities and can take on any real value, even negative ones as you saw. To convert these values to probabilities, you'd apply the softmax function to the array, which would give you the following values:

[9.99923588e-01, 7.64073070e-05, 2.15832503e-09, 2.30460548e-09]

The Id2Labels field tells you that the 0th index/first element in this array corresponds to the "bus" class. Therefore, your model is predicting that this image is a bus with >99.99% probability. The next most probable label is "Truck" (second element of the array).
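For reference, that conversion can be reproduced in a few lines of NumPy (a sketch; the raw scores are taken from the `allScores` field of your response):

```python
import numpy as np

# Raw outputs of the final classification layer, from "allScores"
raw_scores = np.array([12.98949814, 3.51014233, -6.96435881, -6.89878178])

# Softmax: exponentiate (shifting by the max for numerical stability)
# and normalize so the result sums to 1
exps = np.exp(raw_scores - raw_scores.max())
probabilities = exps / exps.sum()

print(probabilities.argmax())  # → 0, i.e. the 'Bus' class per Id2Labels
print(probabilities[0])        # ≈ 0.999924
```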

It would have been handier if deploymain.py had been written to apply softmax to the raw scores before returning the results, though I suppose there's nothing technically incorrect with the current implementation.

mewahl
  • Hey, sorry for the long-awaited answer. I'm not quite sure how a score of 12.98949814 for bus can be converted to 9.99923588e-01? I still get the label truck for car images, so something more is going on than just adding softmax. What is the difference between probabilities and prediction scores? Thank you kindly for your time answering – ISeeSharp Aug 25 '18 at 16:52
  • I figured it out. They even used softmax by default in showResult.py. The funny thing is they didn't use it for the score array, and you can hardly find any documentation about the issue – ISeeSharp Aug 25 '18 at 20:40